
    The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants

    Reasoning is a crucial part of natural language argumentation. To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. Because arguments are highly contextualized, warrants are usually presupposed and left implicit. Comprehension therefore requires not only language understanding and logical skills but also common sense. In this paper we develop a methodology for systematically reconstructing warrants. We operationalize it in a scalable crowdsourcing process, resulting in a freely licensed dataset with warrants for 2k authentic arguments from news comments. On this basis, we present a new and challenging task, the argument reasoning comprehension task: given an argument with a claim and a premise, the goal is to choose the correct implicit warrant from two options. Both warrants are plausible and lexically close, but lead to contradicting claims. A solution to this task would be a substantial step towards automatic warrant reconstruction. However, experiments with several neural attention and language models reveal that current approaches do not suffice. Comment: Accepted as NAACL 2018 Long Paper; see details on the front page.
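A minimal sketch of the two-choice setup described above. The instance below is invented for illustration and is NOT drawn from the actual dataset; real instances are constructed so that shallow lexical cues like this overlap baseline do not separate the two warrants, which is precisely why the task is hard.

```python
def overlap(warrant: str, context: str) -> int:
    """Crude lexical-overlap score between a warrant and claim+premise."""
    return len(set(warrant.lower().split()) & set(context.lower().split()))

def predict(instance: dict) -> int:
    """Pick the warrant with the higher overlap score (a weak baseline)."""
    context = instance["premise"] + " " + instance["claim"]
    return max((0, 1), key=lambda i: overlap(instance["warrants"][i], context))

# A hypothetical instance, for illustration only.
instance = {
    "premise": "The city removed parking spaces downtown.",
    "claim": "Traffic congestion will decrease.",
    "warrants": [
        "Fewer parking spaces discourage people from driving downtown.",
        "Fewer parking spaces make drivers circle around looking for a spot.",
    ],
    "label": 0,  # index of the correct warrant
}
```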

    The Illinois Studies in Inquiry Training: A Critical Review


    Generalized Hidden Filter Markov Models Applied to Speaker Recognition

    Classification of time series has wide Air Force, DoD and commercial interest, from automatic target recognition systems on munitions to recognition of speakers in diverse environments. The ability to effectively model the temporal information contained in a sequence is of paramount importance. Toward this goal, this research develops theoretical extensions to a class of stochastic models and demonstrates their effectiveness on the problem of text-independent (language-constrained) speaker recognition. Specifically, within the hidden Markov model architecture, additional constraints are implemented which better incorporate observation correlations and context, where standard approaches fail. Two methods of modeling correlations are developed, and their mathematical properties of convergence and reestimation are analyzed. The methods differ in whether they model correlations present in the time samples or those present in the processed features, such as Mel-frequency cepstral coefficients. The system models speaker-dependent phonemes, making use of word dictionary grammars, and recognition is based on normalized log-likelihood Viterbi decoding. Both closed-set identification and speaker verification using cohorts are performed on the YOHO database. YOHO is the only large-scale, multiple-session, high-quality speech database for speaker authentication and contains over one hundred speakers stating combination locks. Equal error rates of 0.21% for males and 0.31% for females are demonstrated. A critical error analysis using a hypothesis-test formulation provides the maximum number of errors observable while still meeting the goal error rates of 1% False Reject and 0.1% False Accept. Our system achieves this goal.
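The Viterbi decoding mentioned above can be sketched as the textbook log-domain recursion, returning both the best state path and a length-normalised log-likelihood score. This is a generic sketch with a toy two-state model; it does not include the correlation extensions developed in this research.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Log-domain Viterbi decoding for a discrete-output HMM."""
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            back[-1][s] = prev
            V[t][s] = V[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):  # backtrack through the stored pointers
        path.append(ptr[path[-1]])
    path.reverse()
    # Length-normalised log-likelihood, as used for scoring above.
    return path, V[-1][last] / len(obs)

# Toy HMM: state A mostly emits "x", state B mostly emits "y".
states = ("A", "B")
lg = math.log
log_start = {"A": lg(0.5), "B": lg(0.5)}
log_trans = {"A": {"A": lg(0.8), "B": lg(0.2)},
             "B": {"A": lg(0.2), "B": lg(0.8)}}
log_emit = {"A": {"x": lg(0.9), "y": lg(0.1)},
            "B": {"x": lg(0.1), "y": lg(0.9)}}
path, score = viterbi(["x", "x", "y"], states, log_start, log_trans, log_emit)
```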

    Defining and Assessing Critical Thinking: toward an automatic analysis of HiEd students’ written texts

    The main goal of this PhD thesis is to test, through two empirical studies, the reliability of a method aimed at automatically assessing Critical Thinking (CT) manifestations in Higher Education students’ written texts. The empirical studies were based on a critical literature review aimed at proposing a new classification for systematising different CT definitions and their related theoretical approaches. The review also investigates the relationship between the different adopted CT definitions and CT assessment methods. It highlights the need to focus on open-ended measures for CT assessment and to develop automatic tools based on Natural Language Processing (NLP) techniques to overcome the current limitations of open-ended measures, such as scoring reliability and cost. Based on a rubric developed and implemented by the Center for Museum Studies – Roma Tre University (CDM) research group for the evaluation and analysis of CT levels within open-ended answers (Poce, 2017), an NLP prototype for the automatic measurement of CT indicators was designed. The first empirical study, carried out on a group of 66 university teachers, showed satisfactory reliability levels for the CT evaluation rubric, while the evaluation carried out by the prototype was not yet sufficiently reliable. The results were used to understand how and under what conditions the model works better. The second empirical investigation was aimed at understanding which NLP features are most strongly associated with six CT sub-dimensions, as assessed by human raters in essays written in Italian.
The study used a corpus of 103 pre-post essays by students who attended a Master’s Degree module in “Experimental Education and School Assessment”. Within the module, we proposed two activities to stimulate students’ CT: Open Educational Resources (OER) assessment (mandatory and online) and OER design (optional and blended). The essays were assessed both by expert evaluators, considering six CT sub-dimensions, and by an algorithm that automatically calculates different kinds of NLP features. The study shows a positive internal reliability and a medium-to-high inter-coder agreement in the expert evaluation. Students’ CT levels improved significantly in the post-test. Three NLP indicators correlate significantly with the CT total score: corpus length, syntax complexity, and an adapted term frequency–inverse document frequency (tf-idf) weight. The results collected during this PhD have both theoretical and practical implications for CT research and assessment. From a theoretical perspective, this thesis shows unexplored similarities among different CT traditions, perspectives, and study methods. These similarities could be exploited to open up an interdisciplinary dialogue among experts and build a shared understanding of CT. Automatic assessment methods can enhance the use of open-ended measures for CT assessment, especially in online teaching. Indeed, they can support teachers and researchers in dealing with the growing amount of linguistic data produced within educational platforms (e.g. Learning Management Systems). To this end, it is pivotal to develop automatic methods for the evaluation of large amounts of data that would be impossible to analyse manually, providing teachers and evaluators with support for monitoring and assessing the competencies students demonstrate online.
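The three correlated indicators can be sketched as follows. The abstract does not specify the thesis’ exact operationalisations, so the syntax-complexity proxy (mean sentence length) and the classic tf-idf formula below are assumptions for illustration.

```python
import math

def corpus_length(tokens):
    """Indicator 1: simple token count of an essay."""
    return len(tokens)

def mean_sentence_length(sentences):
    """Indicator 2: a crude proxy for syntactic complexity
    (each sentence is a list of tokens)."""
    return sum(len(s) for s in sentences) / len(sentences)

def tf_idf(term, doc, docs):
    """Indicator 3: classic tf-idf weight of a term in one essay
    relative to the whole essay corpus. Assumes the term occurs
    in at least one document."""
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in docs)
    return tf * math.log(len(docs) / df)
```

For example, a term appearing in half the documents and twice in a four-token essay gets weight `0.5 * log(2)`.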

    Contribution to Data-Driven Failure Prognostics

    This Habilitation (HDR) manuscript presents, in the first part, a synthesis of my teaching and research work carried out at the National Institute of Mechanics and Microtechnologies (ENSMM) and at the FEMTO-ST Institute. This work falls within the topic of Prognostics and Health Management (PHM) and concerns the development of an integrated data-driven failure prognostics approach.
The proposed approach relies on the acquisition of data representative of system degradation, the extraction of relevant features and construction of health indicators, degradation modeling, health assessment, and Remaining Useful Life (RUL) prediction. It uses two families of tools: on the one hand, probabilistic/stochastic tools, such as dynamic Bayesian networks; on the other, nonlinear regression models, such as support vector regression and Gaussian process regression. The second part of the manuscript presents a research project on the PHM of complex systems and MEMS (Micro-Electro-Mechanical Systems), oriented towards a hybrid prognostics approach combining model-based and data-driven approaches.
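As a deliberately simple illustration of the data-driven RUL idea, the sketch below fits a straight line to a degrading health indicator and extrapolates to a failure threshold. This linear-trend stand-in is an assumption for illustration, not the dynamic Bayesian networks or support vector regression models developed in the manuscript.

```python
def estimate_rul(times, indicator, failure_threshold):
    """Fit indicator ~ a*t + b by ordinary least squares, then return
    the time remaining until the fitted line crosses the threshold."""
    n = len(times)
    mt = sum(times) / n
    mi = sum(indicator) / n
    slope = (sum((t - mt) * (h - mi) for t, h in zip(times, indicator))
             / sum((t - mt) ** 2 for t in times))
    intercept = mi - slope * mt
    t_failure = (failure_threshold - intercept) / slope
    return t_failure - times[-1]  # time left after the last observation

# Health indicator degrading from 10 towards a failure threshold of 2.
rul = estimate_rul([0, 1, 2, 3], [10.0, 9.0, 8.0, 7.0], 2.0)
```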

    Cultural Consultations in Criminal Forensic Psychology: A Thematic Analysis of the Literature

    The importance of culture as a reference point in clinical practices such as forensic psychology has been considerably valued yet poorly understood, especially in an age where precision and sophistication outlast cultural authenticity and patient-clinician relationship. This paper looks at the gaps and inconsistencies that exist in current forensic psychology research. The topic is introduced by delving into the understanding of the phenomenon of culture and its influences on our everyday conditioning. Aspects such as language, biological development, traditions, rituals, and narratives are emphasized as potent tools that drive individuals to create and mold culture according to needs and requirements of the moment. These elements are then used for signifying the inherent ways in which culture can result in both despair as well as positive enforcement, thereby being a powerful element of consideration in forensic assessment practice. The essential concept explored in this paper involves the clinicians’ perspectives on the meaning of cultural values, norms and beliefs that shape the behavior of the patient. Through this exploration I attempted to understand how the clinical practice of forensic psychology can be made more authentic and less cold and calculated by consideration of cultural malleability. By using thematic analysis, I reviewed a large collection of the relevant literature in an attempt to understand the core concepts that drive clinicians in their cultural considerations. I emphasized attention to the malleable nature of culture and the intricate ways in which culture is related to biological, psychological, anthropological, and legal aspects of forensic psychology. 
The conclusions of the paper include specific considerations for creating a well-structured cultural consultation model, which emphasizes attention to aspects such as the clinical approach, the patient’s family of origin and current community, as well as the biological and psychological conditions of the patient and the patient’s cultural perspective on those conditions.

    Better predictions when models are wrong or underspecified

    Many statistical methods rely on models of reality in order to learn from data and to make predictions about future data. By necessity, these models usually do not match reality exactly, but are either wrong (none of the hypotheses in the model provides an accurate description of reality) or underspecified (the hypotheses in the model describe only part of the data). In this thesis, we discuss three scenarios involving models that are wrong or underspecified. In each case, we find that standard statistical methods may fail, sometimes dramatically, and we present alternative methods that continue to perform well even if the models are wrong or underspecified. The first two scenarios involve regression problems and investigate AIC (Akaike's Information Criterion) and Bayesian statistics. The third scenario has the famous Monty Hall problem as a special case, and considers the question of how we can update our belief about an unknown outcome given new evidence when the precise relation between outcome and evidence is unknown.
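The Monty Hall connection can be made concrete with a small simulation. Note this sketches only the standard protocol (the host always opens a non-chosen, non-prize door); the thesis addresses the harder setting where that protocol, i.e. the relation between outcome and evidence, is unknown.

```python
import random

def monty_hall_trial(switch, rng):
    """One round of the standard Monty Hall game; returns True on a win."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither picked nor hiding the prize.
    opened = rng.choice([d for d in doors if d not in (pick, prize)])
    if switch:
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == prize

def win_rate(switch, trials=20000, seed=1):
    rng = random.Random(seed)
    return sum(monty_hall_trial(switch, rng) for _ in range(trials)) / trials
```

Under the standard protocol, switching wins about 2/3 of the time and staying about 1/3; once the host may behave differently, those numbers no longer follow.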

    Privacy-Preserving Statistical Analysis Based on Secure Multi-Party Computation

    The electronic version of this dissertation does not include the publications. In a modern society, from the moment a person is born, a digital record is created. From there on, the person’s behaviour is constantly tracked and data are collected about the different aspects of his or her life. Whether one is swiping a customer loyalty card in a store, going to the doctor, doing taxes or simply moving around with a mobile phone in one’s pocket, sensitive data are being gathered and stored by governments and companies. Sometimes we give our permission for this kind of surveillance in exchange for some benefit; for instance, we may get a discount by using a customer loyalty card. Other times we face a difficult choice: either we cannot make phone calls or our movements are tracked based on cellular data. The government tracks information about our health, education and income to cure us, educate us and collect taxes. We hope that the data are used in a meaningful way, but we also have an expectation of privacy. This work focuses on how to perform statistical analyses in a way that preserves the privacy of the individual. To achieve this goal, we use secure multi-party computation. This cryptographic technique allows data to be analysed without ever seeing the individual values. Even though secure multi-party computation is a time-consuming process, we show that it is feasible even for large-scale databases. We have developed ways of using the most popular statistical analysis methods with secure multi-party computation. We introduce a privacy-preserving statistical analysis tool called Rmind that contains all of our resulting implementations. Rmind is similar to the tools that statistical analysts are used to, allowing them to carry out studies without having to know the details of the underlying cryptographic protocols.
The methods described in the thesis are used in practice to prepare a statistical study that joins two Estonian national databases, to find out whether Estonian students who work during their university studies are less likely to graduate in nominal time than their peers who focus on their studies.
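As a minimal illustration of the cryptographic idea (not Rmind’s actual protocols, which are considerably more involved), additive secret sharing lets parties compute a sum, and hence a mean, without any single party ever seeing an individual value; the modulus and party count below are arbitrary choices for the sketch.

```python
import random

MODULUS = 2**61 - 1  # an arbitrary public modulus for this sketch

def share(value, n_parties, rng):
    """Split `value` into n additive shares modulo MODULUS; any strict
    subset of shares is uniformly random and reveals nothing."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

def private_sum(values, n_parties, rng):
    """Each party locally adds the shares it holds; only the final
    total is reconstructed, never an individual input."""
    per_value = [share(v, n_parties, rng) for v in values]
    party_totals = [sum(col) % MODULUS for col in zip(*per_value)]
    return reconstruct(party_totals)
```

Dividing the reconstructed total by the (public) record count then yields a mean without exposing any single record.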