13 research outputs found

    Feedback in Conversation as Incremental Semantic Update

    Eshghi is supported by the EPSRC BABBLE project (grant number EP/M01553X/1) and Hough by the DUEL project funded by the ANR (grant number ANR-13-FRAL-0001) and the DFG (grant number SCHL 845/5-1). We thank them for their financial support. Purver is partially supported by ConCreTe: the project ConCreTe acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET grant number 61173

    Challenging Neural Dialogue Models with Natural Data: Memory Networks Fail on Incremental Phenomena

    Natural, spontaneous dialogue proceeds incrementally on a word-by-word basis, and it contains many sorts of disfluency, such as mid-utterance/sentence hesitations, interruptions, and self-corrections. But training data for machine learning approaches to dialogue processing are often either cleaned up or wholly synthetic in order to avoid such phenomena. The question then arises of how well systems trained on such clean data generalise to real spontaneous dialogue, or indeed whether they are trainable at all on naturally occurring dialogue data. To answer this question, we created a new corpus, bAbI+, by systematically adding natural spontaneous incremental dialogue phenomena, such as restarts and self-corrections, to Facebook AI Research's bAbI dialogue dataset. We then explore the performance of a state-of-the-art retrieval model, MemN2N, on this more natural dataset. Results show that the semantic accuracy of the MemN2N model drops drastically, and that although it is in principle able to learn to process the constructions in bAbI+, it needs an impractical amount of training data to do so. Finally, we show that an incremental semantic parser, DyLan, achieves 100% semantic accuracy on both bAbI and bAbI+, highlighting the generalisation properties of linguistically informed dialogue models.
    Comment: 9 pages, 3 figures, 2 tables. Accepted as a full paper for SemDial 201
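    The bAbI+ corpus was built with a purpose-built generation tool; the following is only an illustrative sketch of the kind of self-correction the abstract describes. The function name, example utterance, and editing term are invented for illustration and are not taken from the paper.

    ```python
    # Illustrative sketch (not the actual bAbI+ generation tool): inject a
    # self-correction of the kind added in bAbI+, e.g.
    # "a cheap restaurant" -> "a expensive uhm sorry cheap restaurant".
    def inject_self_correction(utterance, target, replacement, editing_term="uhm sorry"):
        # Produce reparandum (wrong word) + editing term + repair (intended word),
        # the canonical shape of a spoken self-correction.
        out = []
        for w in utterance.split():
            if w == target:
                out.extend([replacement, editing_term, w])
            else:
                out.append(w)
        return " ".join(out)

    print(inject_self_correction("i want a cheap restaurant", "cheap", "expensive"))
    # -> i want a expensive uhm sorry cheap restaurant
    ```

    A system that only ever saw clean bAbI dialogues must now recover the intended word ("cheap") from the disfluent surface string, which is exactly where the abstract reports MemN2N degrading.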

    Completability vs (In)completeness

    In everyday conversation, no notion of “complete sentence” is required for syntactic licensing. However, so-called “fragmentary”, “incomplete”, and abandoned utterances are problematic for standard formalisms. When contextualised, such data show that (a) non-sentential utterances are adequate to underpin agent coordination, while (b) all linguistic dependencies can be systematically distributed across participants and turns. Standard models have problems accounting for such data because their notions of ‘constituency’ and ‘syntactic domain’ are independent of performance considerations. Concomitantly, we argue that no notion of “full proposition” or encoded speech act is necessary for successful interaction: strings, contents, and joint actions emerge in conversation without any single participant having envisaged in advance the outcome of their own or their interlocutors’ actions. Nonetheless, morphosyntactic and semantic licensing mechanisms need to apply incrementally and subsententially. We argue that, while a representational level of abstract syntax, divorced from conceptual structure and physical action, impedes natural accounts of subsentential coordination phenomena, a view of grammar as a “skill” employing domain-general mechanisms, rather than fixed form-meaning mappings, is needed instead. We provide a sketch of a predictive and incremental architecture (Dynamic Syntax) within which underspecification and time-relative update of meanings and utterances constitute the sole concept of “syntax”.

    Incremental Composition in Distributional Semantics

    Despite the incremental nature of Dynamic Syntax (DS), its semantic grounding remains that of predicate logic, itself grounded in set theory, and so it is poorly suited to expressing the rampantly context-relative nature of word meaning, and related phenomena such as the incremental judgements of similarity needed for modelling disambiguation. Here, we show how DS can be assigned a compositional distributional semantics which enables such judgements and makes it possible to incrementally disambiguate language constructs using vector space semantics. Building on a proposal in our previous work, we implement and evaluate our model on real data, showing that it outperforms a commonly used additive baseline. In conclusion, we argue that these results set the ground for an account of the non-determinism of lexical content, in which the nature of word meaning is its dependence on surrounding context for its construal.
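    The "additive baseline" the abstract compares against can be sketched as follows: compose a context vector by summing word vectors, then pick the sense whose vector is most cosine-similar to it. The toy 3-dimensional vectors and sense labels below are invented for illustration; real models use high-dimensional distributional vectors.

    ```python
    import math

    # Minimal sketch of an additive compositional baseline with cosine-based
    # sense selection. All vectors here are toy values, not real embeddings.
    def add_compose(vectors):
        # Additive composition: element-wise sum of the word vectors.
        return [sum(dims) for dims in zip(*vectors)]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Two hypothetical sense vectors for the ambiguous word "bank".
    senses = {"finance": [0.9, 0.1, 0.0], "river": [0.1, 0.9, 0.0]}
    # Context words, e.g. "deposit money" (toy vectors).
    context = add_compose([[0.8, 0.2, 0.1], [0.7, 0.1, 0.2]])
    best = max(senses, key=lambda s: cosine(context, senses[s]))
    print(best)  # -> finance
    ```

    An incremental variant, as in the paper, would update the context vector word by word, re-ranking the candidate senses after each update rather than waiting for the full phrase.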

    Discovering the Arrow of Time in Machine Learning

    From MDPI via Jisc Publications Router. History: accepted 2021-10-13, pub-electronic 2021-10-22. Publication status: Published. Funder: Economic and Social Research Council; Grant(s): ES/P008437/1.
    Machine learning (ML) is increasingly useful as data grow in volume and accessibility. ML can perform tasks (e.g., categorisation, decision making, anomaly detection) through experience and without explicit instruction, even when the data are too vast, complex, highly variable, or error-ridden to be analysed in other ways. ML is therefore well suited to natural language, images, and other complex, messy data available in large and growing volumes. Selecting an ML model for a task depends on many factors, as models vary in the supervision they need, the error levels they tolerate, and their ability to account for order or temporal context, among other things. Importantly, ML methods for tasks that use explicitly ordered or time-dependent data struggle with errors or data asymmetry. Most data are (implicitly) ordered or time-dependent, potentially allowing a hidden ‘arrow of time’ to affect ML performance on non-temporal tasks. This research explores the interaction of ML and implicit order by using two ML models to automatically classify (a non-temporal task) tweets (temporal data) under conditions that balance the volume and complexity of the data. Results show that performance was affected, suggesting that researchers should carefully consider time when matching ML models to tasks, even when time is only implicitly included.
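    The evaluation concern behind the abstract's "hidden arrow of time" can be sketched with two splitting strategies: a random split discards the temporal ordering (and can leak future context into training), while a chronological split preserves it. The tweet list below is a placeholder, not the study's data.

    ```python
    import random

    # Placeholder data, assumed to be in posting order (oldest first).
    tweets = [f"tweet {i}" for i in range(100)]

    def chronological_split(items, train_frac=0.8):
        # Train on the past, test on the future: the temporal order survives.
        cut = int(len(items) * train_frac)
        return items[:cut], items[cut:]

    def random_split(items, train_frac=0.8, seed=0):
        # Shuffling discards the implicit ordering entirely.
        shuffled = items[:]
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        return shuffled[:cut], shuffled[cut:]

    train_c, test_c = chronological_split(tweets)
    train_r, test_r = random_split(tweets)
    print(test_c[0])  # -> tweet 80  (the chronological test set holds only the newest items)
    ```

    If a classifier scores noticeably better under the random split than the chronological one, time-dependent signal is leaking into the supposedly non-temporal task, which is the effect the paper probes.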

    An enhanced sequential exception technique for semantic-based text anomaly detection

    The detection of semantic-based text anomalies is an interesting research area which has gained considerable attention from the data mining community. Text anomaly detection identifies information that deviates from the general information contained in documents. Text data are characterized by problems related to ambiguity, high dimensionality, sparsity, and text representation. If these challenges are not properly resolved, identifying semantic-based text anomalies will be less accurate. This study proposes an Enhanced Sequential Exception Technique (ESET) to detect semantic-based text anomalies by achieving five objectives: (1) to modify the Sequential Exception Technique (SET) to process unstructured text; (2) to optimize Cosine Similarity for identifying similar and dissimilar text data; (3) to hybridize the modified SET with Latent Semantic Analysis (LSA); (4) to integrate the Lesk and Selectional Preference algorithms for disambiguating senses and identifying the canonical form of text; and (5) to represent semantic-based text anomalies using First Order Logic (FOL) and Concept Network Graphs (CNG). ESET performs text anomaly detection by employing optimized Cosine Similarity, hybridizing LSA with the modified SET, and integrating it with Word Sense Disambiguation algorithms, specifically Lesk and Selectional Preference. FOL and CNG are then used to represent the detected anomalies. To demonstrate the feasibility of the technique, four selected datasets, namely NIPS, ENRON, Daily Koss blog, and 20Newsgroups, were experimented on. The experimental evaluation revealed that ESET significantly improved the accuracy of detecting semantic-based text anomalies in documents. Compared with existing measures, ESET outperformed the benchmark methods, with improved F1-scores on all datasets: NIPS 0.75, ENRON 0.82, Daily Koss blog 0.93, and 20Newsgroups 0.97.
    The results generated by ESET have proven to be significant and support a growing notion of semantic-based text anomaly that is increasingly evident in the existing literature. Practically, this study contributes to topic modelling and concept coherence for the purposes of visualizing information, knowledge sharing, and optimized decision making.
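    One ingredient of the pipeline above, cosine similarity for separating deviating from general text, can be sketched very roughly as follows: score each document against the collection's mean term-frequency vector and flag the least similar one. This is not ESET itself; the LSA, Lesk, Selectional Preference, FOL, and CNG components are omitted, and the documents are invented.

    ```python
    import math
    from collections import Counter

    # Rough sketch: cosine similarity of term-frequency vectors against the
    # collection centroid; the lowest-scoring document is the candidate anomaly.
    def tf_vector(text, vocab):
        counts = Counter(text.lower().split())
        return [counts[w] for w in vocab]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    docs = [
        "stock market prices rise",
        "market prices fall on stock news",
        "recipe for chocolate cake with sugar",  # the deviating document
    ]
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vecs = [tf_vector(d, vocab) for d in docs]
    centroid = [sum(col) / len(docs) for col in zip(*vecs)]
    scores = [cosine(v, centroid) for v in vecs]
    anomaly = docs[scores.index(min(scores))]
    print(anomaly)  # -> recipe for chocolate cake with sugar
    ```

    Raw term frequency is brittle for exactly the reasons the abstract lists (ambiguity, sparsity, high dimensionality), which is why ESET layers LSA and sense disambiguation on top of the similarity measure.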

    Self-initiated self-repairs found in the conversation of video podcast K-Pop Daebak Show program

    Everyone wants to speak smoothly in real situations, yet speakers still produce a variety of speech errors when they fail to deliver their speech properly due to nervousness, panic, or tiredness, or because they forget what they were going to say while speaking. When a speech error occurs in an utterance, the speaker initiates self-repair by correcting, clarifying, or simply searching for the correct word to repair the error. The notion of self-initiated self-repair thus arises, as evidenced by the dialogue between the presenter and the guest star of K-Pop Daebak Show episode 72. Methodologically, a descriptive qualitative approach is used, since the primary goal of this study is to provide in-depth data analysis. This study examines self-initiated self-repair with the aim of identifying its different types in a podcast conversation between an American accent and a British accent, carried by two Korean-Americans, using Levelt's theory as reviewed by Carroll (2008). It then applies the theory of ten operations in self-initiated self-repair proposed by Schegloff (2012) to recognise how self-initiated self-repair strategies are used in conversation. The findings reveal that the speakers employed three types of self-initiated self-repair: instant repair, anticipatory retracing, and fresh start, with anticipatory retracing used most frequently when repairing word errors. Moreover, the study identified seven of the ten self-initiated self-repair strategies: replacing, inserting, deleting, aborting, searching, sequence hopping, and reordering, with replacing the most commonly employed strategy in conversation. Furthermore, positive feedback was consistently detected after the speaker's turn had ended, indicating that the presence of a feedback aspect aids this research in finding in-depth self-initiated self-repair strategies that occur during talk.