
    Identifying power relationships in dialogues

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 175-179).
    Understanding power relationships is an important step towards building computers that can understand human social relationships. Power relationships can arise from differences in the roles of the speakers, as between bosses and employees. Power can also affect the manner of communication between social equals, as between friends or acquaintances. There are numerous potential uses for an automatic system that can understand power relationships, including the analysis of the organizational structure of formal and ad hoc groups, the profiling of influential individuals within a group, and the identification of aggressive or power-inappropriate language in email or other Internet media. In this thesis, we explore the problem of engineering effective power identification systems. We show methods for constructing an effective ground truth corpus for analyzing power, and we focus on three areas of modeling that help improve the prediction of power relationships:
    1) Utterance-level language cues: patterns of language use can help distinguish the speech of leaders from that of followers. We show a set of effective syntactic/semantic features that best capture these linguistic manifestations of power.
    2) Dialog-level interactions: the manner of interaction between speakers can inform us about the underlying power dynamics. We use Hidden Markov Models to organize and model the information from these interaction-based cues.
    3) Social conventions: speaker behavior is influenced by background knowledge, in particular by conventional rules of communication. We use a generative hierarchical Bayesian framework to model dialogs as mental processes, then extend these models with components that encode basic social conventions such as politeness.
    We apply our integrated system, PRISM, to the Nixon Watergate Transcripts to demonstrate that it performs robustly on real-world data.
    by Yuan Kui Shen. Ph.D.
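
    The abstract names Hidden Markov Models as the tool for dialog-level cues but does not describe their structure. The following is a minimal sketch under stated assumptions, not PRISM itself: it scores a sequence of dialog acts with the standard forward algorithm and picks whichever of two hand-set models explains the dialog better. The dialog-act labels, the `HIERARCHICAL` and `PEER` models, every probability, and the helper names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical HMM representation: (start probs, transition probs, emission probs).
Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]], Dict[str, Dict[str, float]]]

ACTS = ["command", "question", "acknowledge"]  # hypothetical dialog-act labels

# Hypothetical model 1: turns alternate between a superior who gives commands
# and a subordinate who mostly acknowledges them.
HIERARCHICAL: Model = (
    {"superior": 0.5, "subordinate": 0.5},
    {"superior": {"superior": 0.2, "subordinate": 0.8},
     "subordinate": {"superior": 0.8, "subordinate": 0.2}},
    {"superior": {"command": 0.7, "question": 0.2, "acknowledge": 0.1},
     "subordinate": {"command": 0.05, "question": 0.15, "acknowledge": 0.8}},
)

# Hypothetical model 2: social equals who mostly ask and answer questions.
PEER: Model = (
    {"peer": 1.0},
    {"peer": {"peer": 1.0}},
    {"peer": {"command": 0.2, "question": 0.5, "acknowledge": 0.3}},
)


def forward_likelihood(
    observations: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> float:
    """P(observations | model), summed over all hidden-state paths (forward algorithm)."""
    # alpha[s] = P(first t observations, hidden state at step t == s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in start}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans[prev][s] for prev in start) * emit[s][obs]
            for s in start
        }
    return sum(alpha.values())


def more_likely_model(observations: Sequence[str], models: Dict[str, Model]) -> str:
    """Name of the model under which the observed dialog is most probable."""
    return max(models, key=lambda name: forward_likelihood(observations, *models[name]))


MODELS = {"hierarchical": HIERARCHICAL, "peer": PEER}
print(more_likely_model(["command", "acknowledge", "command", "acknowledge"], MODELS))  # hierarchical
print(more_likely_model(["question", "question", "question"], MODELS))  # peer
```

    Comparing per-model likelihoods is one common way to use HMMs as classifiers. Realistic parameters would be estimated from annotated dialogs (for example with Baum-Welch), and long dialogs would need scaled or log-space arithmetic to avoid numerical underflow.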

    An NLP Analysis of Health Advice Giving in the Medical Research Literature

    Health advice – clinical and policy recommendations – plays a vital role in guiding medical practice and public health policy. Whether authors should give health advice in medical research publications is a controversial issue. Proponents of actionable research advocate for more efficient and effective transmission of scientific evidence into practice. Opponents are concerned about the quality of health advice in individual research papers, especially advice drawn from observational studies. Arguments both for and against giving advice in individual studies indicate a strong need to identify and access health advice, whether for practical use or for quality evaluation. However, current information services do not support the direct retrieval of health advice, and, unlike many other natural language processing (NLP) targets, health advice has not been computationally modeled as a language construct. A new information service for directly accessing health advice should be able to reduce information barriers and provide external assessment in science communication. This dissertation built an annotated corpus of scientific claims that distinguishes health advice according to its occurrence and strength, and developed NLP-based prediction models to identify health advice in the PubMed literature. Using the annotated corpus and prediction models, the study answered research questions regarding the practice of advice giving in medical research literature. To test and demonstrate its potential use, the prediction model was used to retrieve health advice regarding the use of hydroxychloroquine (HCQ) as a treatment for COVID-19 from LitCovid, a large COVID-19 research literature database curated by the National Institutes of Health. An evaluation of sentences extracted from both abstracts and discussions showed that BERT-based pre-trained language models performed well at detecting health advice.
    The health advice prediction model may be combined with existing health information service systems to provide more convenient navigation of a large volume of health literature. Findings from the study also show that researchers are careful not to give advice solely in abstracts, and that they tend to give weaker and less specific advice in abstracts than in discussions. In addition, the study found that health advice has appeared consistently in the abstracts of observational studies over the past 25 years. In the sample, 41.2% of the studies offered health advice in their conclusions, which is lower than earlier estimates based on manual analyses of much smaller samples. In the abstracts of observational studies, journals with a lower impact are more likely to give health advice than those with a higher impact, suggesting that journals play a significant role as gatekeepers of science communication. For the natural language processing, information science, and public health communities, this work advances knowledge of the automated recognition of health advice in scientific literature. The corpus and code developed for the study have been made publicly available to facilitate future work in health advice retrieval and analysis. Furthermore, the study discusses the ways in which researchers give health advice in medical research articles, knowledge of which could be an essential step towards curbing potential exaggeration in current global science communication. It also contributes to ongoing discussions of the integrity of scientific output. The study calls for caution in giving advice in medical research literature, especially in abstracts alone. It also calls for open access to medical research publications, so that health researchers and practitioners can fully review the advice in scientific outputs and its implications.
    Journal editors and reviewers need more evaluative strategies that can improve the overall quality of health advice in research articles, given their gatekeeping role in science communication.

    ONE FUNERAL AT A TIME! Why we hold on to old ideas

    This article discusses how easily we lock ourselves into established theories, which are often promoted within our formal education. Such theories can blind us to new developments and, by narrowing our thinking, influence our professional practice and progress. Chomsky’s theory of Universal Grammar is an example: this paradigm has dominated teaching and learning for over 50 years. It has led to a focus on the form of language rather than its content and use, restricting learning approaches and contributing to lower standards in UK education compared with other countries. The Organisation for Economic Co-operation and Development (2016 and 2017) cites the limited value placed on language as a reason for the UK being near the bottom of the global league tables. Language development is presented here within the physical, mental, emotional and social aspects of communication. Competence across these areas opens the mind to empathy, new experiences, continuous learning, humour, teamwork and cultural awareness. Together, these elements distinguish us from robots and are vital for our future: as machine technology takes over routine work, new job opportunities will depend on better interaction between people.

    Detecting Covert Networks in Multilingual Groups: Evidence within a Virtual World

    This paper introduces an approach for examining and organizing unstructured text to identify relationships between networks of individuals. The approach uses discourse analysis to identify information providers and recipients and determines the structure of covert organizations irrespective of the language used in conversations between members. It then applies social network analytics to determine the arrangement of a covert organization without any a priori knowledge of the network structure. The approach is tested and validated using communication data collected in a virtual world setting. Our analysis indicates that the proposed framework successfully detected the covert structure of three information networks, and their cliques, within an online gaming community during a simulation of a large-scale event.
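
    The abstract does not say which graph algorithm recovers the cliques. A minimal sketch, assuming discourse analysis has already produced (provider, recipient) pairs: build an undirected communication graph and enumerate its maximal cliques with the Bron-Kerbosch algorithm with pivoting, one standard technique swapped in here for illustration rather than the paper's confirmed method. The player names and the helpers `build_graph` and `maximal_cliques` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def build_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from (provider, recipient) pairs; direction is dropped."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for provider, recipient in pairs:
        if provider != recipient:
            adj[provider].add(recipient)
            adj[recipient].add(provider)
    return dict(adj)


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates that extend it, x: already-explored vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbors in p to skip redundant branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


# Hypothetical (provider, recipient) pairs produced by discourse analysis.
MESSAGES = [
    ("ana", "ben"), ("ben", "cai"), ("cai", "ana"),
    ("cai", "dev"), ("dev", "eli"),
]
for clique in sorted(sorted(c) for c in maximal_cliques(build_graph(MESSAGES))):
    print(clique)
# ['ana', 'ben', 'cai']
# ['cai', 'dev']
# ['dev', 'eli']
```

    Dropping edge direction treats any exchange as a tie between two members; a variant that keeps direction could also rank likely information providers by out-degree.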

    Enhanced lexicon based models for extracting question-answer pairs from web forum

    A Web forum is an online community that brings together people in different geographical locations. Members of the forum exchange ideas and expertise, and as a result a huge amount of content on different topics is generated daily. This human-generated content can be mined for question-answer (Q&A) pairs. One of the major challenges in mining Q&A pairs from web forums is establishing a good relationship between a question and its candidate answers, a problem compounded by the noisy nature of forum content. Unfortunately, existing methods for mining knowledge from web forums ignore the effect of noise on the mining tools, which makes lexical content less effective. This study proposes lexicon-based models that can automatically mine question-answer pairs from web forums with higher accuracy. The first phase of the research produced a question mining model, implemented using features generated from unigrams, bigrams, forum metadata and simple rules. These features were screened using both chi-square and wrapper techniques, and the wrapper-selected features were used by Multinomial Naïve Bayes to build the final model. The second phase produced a normalized lexical model for answer mining, implemented using 13 lexical features that cut across four quality dimensions. The performance of these features was enhanced by noise normalization, a process that fixes orthographic, phonetic and acronym noise. The third phase produced a hybridized model of lexical and non-lexical features. On the three data sets used, the average performance of the question mining model, the normalized lexical model and the hybridized model for answer mining was 90.3%, 97.5% and 99.5%, respectively. These models outperformed all previous work in the domain.
    The first major contribution of the study is an improved question mining model characterized by higher accuracy, better specificity, lower complexity and the ability to achieve good accuracy across different forum genres. The second contribution is a normalized lexicon-based model that can establish a good relationship between a question and its corresponding answer. The third contribution is a hybridized model for mining web forum answers that integrates lexical features, which guarantee relevance, with non-lexical features, which guarantee quality. The fourth contribution is a novel integration of the question and answer mining models to automatically generate question-answer pairs from web forums.
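
    The question mining phase feeds unigram and bigram features into Multinomial Naïve Bayes. The sketch below shows that classifier family with additive (Laplace) smoothing on a hypothetical six-post training set. It assumes whitespace tokenization and omits the study's forum metadata, simple rules, and chi-square/wrapper feature selection, so it illustrates the model rather than reproducing the study's pipeline; the class, helper, and data names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def features(post: str) -> List[str]:
    """Unigram and bigram tokens (whitespace tokenization is an assumption)."""
    tokens = post.lower().split()
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.classes: List[str] = []
        self.doc_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = {}
        self.vocab: Set[str] = set()

    def fit(self, posts: List[str], labels: List[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for post, label in zip(posts, labels):
            self.token_counts[label].update(features(post))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def log_posterior(self, post: str, label: str) -> float:
        """log P(label) + sum of log P(token | label), up to a shared constant."""
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in features(post):
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, post: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(post, c))


# Hypothetical training posts labeled "question" or "other".
POSTS = [
    "how do i reset my router",
    "does anyone know how to fix this error",
    "what driver should i install",
    "thanks that worked for me",
    "you need to update the driver",
    "restart the router and it works",
]
LABELS = ["question"] * 3 + ["other"] * 3

clf = MultinomialNaiveBayes().fit(POSTS, LABELS)
print(clf.predict("how do i update the driver"))  # question
print(clf.predict("restart the router"))  # other
```

    Summing log probabilities instead of multiplying raw probabilities avoids underflow on long posts. In the study's setup, feature selection would prune the feature list before training.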

    Proceeding 2nd International Seminar on Linguistics

    Automated model-based spreadsheet debugging

    Spreadsheets are interactive data organization and calculation programs developed in spreadsheet environments such as Microsoft Excel or LibreOffice Calc. They are probably the most successful example of end-user developed software and are used in almost all branches and at all levels of companies. Although spreadsheets often support important decision-making processes, they are, like all software, prone to error, and in several cases faults in spreadsheets have caused severe financial losses. Spreadsheet developers are usually not trained in software development practices. Because they are unfamiliar with quality control methods such as systematic testing or debugging, the spreadsheet environment itself has to support them in searching for faults in their calculations, in order to ensure correctness and better overall quality of the developed spreadsheets. This thesis by publication introduces several approaches to locate faults in spreadsheets. The presented approaches are based on the principles of Model-Based Diagnosis (MBD), a technique for finding the possible reasons why a system does not behave as expected. This thesis combines several new algorithmic enhancements of the general MBD approach that allow spreadsheet users to debug their spreadsheets and to efficiently find the reason for observed unexpected output values. To ensure seamless integration into an environment that spreadsheet developers already know well, the presented approaches are implemented as an extension for Microsoft Excel. The first part of the thesis outlines the different algorithmic approaches introduced in this thesis and summarizes the improvements achieved over the general MBD approach. The second part, the appendix, presents a selection of the author's publications.
    These publications comprise (a) a survey of research in the area of spreadsheet quality assurance, (b) a work describing how to adapt the general MBD approach to spreadsheets, (c) two new algorithmic improvements of the general technique that speed up the calculation of the possible reasons for an observed fault, (d) a new concept and algorithm to efficiently determine questions that a user can be asked during debugging in order to reduce the number of possible reasons for the observed unexpected output values, and (e) a new method to find faults in a set of spreadsheets, together with a new corpus of real-world spreadsheets containing faults that can be used to evaluate the proposed debugging approaches.
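
    In the classic hitting-set formulation of model-based diagnosis (due to Reiter), each conflict set is a group of cells that cannot all be correct given the observed outputs, and a diagnosis is a minimal set of cells that intersects every conflict. The sketch below enumerates minimal diagnoses by brute force for a hypothetical three-formula sheet. The conflict sets are written by hand, whereas a real debugger would derive them (for example with a constraint solver), and none of the thesis's algorithmic speed-ups are reproduced; the function and cell names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def minimal_diagnoses(conflicts: List[Set[str]]) -> List[FrozenSet[str]]:
    """Return every subset-minimal set of cells that intersects all conflict sets.

    Brute force in order of increasing size: fine for a few cells, exponential in general.
    """
    if not conflicts:
        return [frozenset()]  # nothing is inconsistent, so no cell needs to be faulty
    cells = sorted(set().union(*conflicts))
    diagnoses: List[FrozenSet[str]] = []
    for size in range(1, len(cells) + 1):
        for combo in combinations(cells, size):
            candidate = frozenset(combo)
            if any(d <= candidate for d in diagnoses):
                continue  # contains a smaller diagnosis, so it is not minimal
            if all(candidate & conflict for conflict in conflicts):
                diagnoses.append(candidate)
    return diagnoses


# Hypothetical sheet: B1 = A1 + A2, B2 = B1 * 2, B3 = B1 + A3.
# The user trusts A1..A3 but flags B2 and B3 as wrong, giving two conflicts:
# B2 can only be wrong if B1 or B2 is faulty; B3 only if B1 or B3 is faulty.
CONFLICTS = [{"B1", "B2"}, {"B1", "B3"}]
for diagnosis in minimal_diagnoses(CONFLICTS):
    print(sorted(diagnosis))
# ['B1']
# ['B2', 'B3']
```

    Either the single shared cell B1 is faulty, or B2 and B3 are both faulty; asking the user about the value of B1 would separate the two, which is the kind of question selection item (d) refers to.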

    An evolving approach to learning in problem solving and program development : the distributed learning model

    Technological advances are paving the way for improvements in many sectors of society. The US education system needs to transform existing pedagogical methods to make the most of new technologies. Traditional education has primarily been teacher-driven, lecture-based and delivered in one location. Advances in technology are challenging existing paradigms by developing tools and educational environments that reach diverse learning styles and surpass the boundaries of current teaching methods. Distributed learning is an emerging paradigm that promises to contribute significantly to learning and improve overall academic success. This research first explores various systems that provide different modes of learning. The problem domain of this research is the difficulty novice programmers face when learning to program. This paper proposes how distributed learning can be used in a teaching environment to enrich learning, and what its impact on this problem domain would be.

    "Shouldn't I use a polar question?" Proper Question Forms Disentangling Inconsistencies in Dialogue Systems

    This work describes a specific class of clarification requests used to negotiate pieces of information that are part of the common ground, for argumentation strategies in human-machine interaction. Two studies were carried out to demonstrate the adequacy of a specific form of polar question in a specific pragmatic situation, where a presupposition is contradicted by new evidence. The first study shows the appropriateness of the negative form; the second also demonstrates how using this form in that pragmatic situation can affect the principle of robustness, in terms of observability and recoverability, which is important in human-machine interaction applications. Given these results, dialogue systems with such capabilities are a desirable goal, as they are expected to improve usability and naturalness in conversation. For this reason, I present here a system capable of detecting conflicts and of using argumentation strategies to signal them consistently with previous observations.