477 research outputs found

    Data sensitivity detection in chat interactions for privacy protection

    In recent years, there has been exponential growth in the use of virtual spaces, including dialogue systems, that handle personal information. The concept of personal privacy is debated and controversial in the literature, whereas in the technological field it directly influences the perceived reliability of an information system (privacy ‘as trust’). This work aims to protect the right to privacy of personal data (GDPR, 2018) and avoid the loss of sensitive content by exploring the sensitive information detection (SID) task. It is grounded in the following research questions: (RQ1) What does sensitive data mean? How to define a personal sensitive information domain? (RQ2) How to create a state-of-the-art model for SID? (RQ3) How to evaluate the model? RQ1 theoretically investigates the concepts of privacy and the ontological state-of-the-art representation of personal information. The Data Privacy Vocabulary (DPV) is the taxonomic resource taken as an authoritative reference for the definition of the knowledge domain. Concerning RQ2, we investigate two approaches to classifying sensitive data: the first, bottom-up, explores automatic learning methods based on transformer networks; the second, top-down, proposes logical-symbolic methods with the construction of privaframe, a knowledge graph of compositional frames representing personal data categories. Both approaches are tested. For the evaluation (RQ3), we create SPeDaC, a sentence-level labeled resource. It can be used as a benchmark or as training data for the SID task, filling the gap left by the lack of a shared resource in this field. While the approach based on artificial neural networks confirms the validity of the direction adopted in the most recent studies on SID, the logical-symbolic approach emerges as the preferred way to classify fine-grained personal data categories, thanks to the semantically grounded, tailored modeling it allows.
At the same time, the results highlight the strong potential of hybrid architectures in solving automatic tasks

    Temporality and modality in entailment graph induction

    The ability to draw inferences is core to semantics and the field of Natural Language Processing. Answering a seemingly simple question like ‘Did Arsenal play Manchester yesterday’ from textual evidence that says ‘Arsenal won against Manchester yesterday’ requires modeling the inference that ‘winning’ entails ‘playing’. One way of modeling this type of lexical semantics is with Entailment Graphs, collections of meaning postulates that can be learned in an unsupervised way from large text corpora. In this work, we explore the role that temporality and linguistic modality can play in inducing Entailment Graphs. We identify inferences that were previously not supported by Entailment Graphs (such as that ‘visiting’ entails an ‘arrival’ before the visit) and inferences that were likely to be learned incorrectly (such as that ‘winning’ entails ‘losing’). Temporality is shown to be useful in alleviating these challenges, in the Entailment Graph representation as well as the learning algorithm. An exploration of linguistic modality in the training data shows, counterintuitively, that there is valuable signal in modalized predications. We develop three datasets for evaluating a system’s capability of modeling these inferences, which were previously underrepresented in entailment rule evaluations. Finally, in support of the work on modality, we release a relation extraction system that is capable of annotating linguistic modality, together with a comprehensive modality lexicon

    Information Technology and Lawyers. Advanced Technology in the Legal Domain, from Challenges to Daily Routine


    Coevolutionary Dynamism of Man-Environment-Organism

    In our co-evolutionary concept, we reconsider the human-environment unity framed in the M-E-O (Man-Environment-Organism) model, adapting Latour’s ANT theory, where the subject of human evolution is seen in unity with its (his/her/their) “Umwelt,” creating particular social, memetic, and technospherial environmental extensions and hybrids exposed to mutual selective forces. We analyze this issue in the context of coevolutionary mechanisms influencing genetic and memetic selection. Linguistic samples, the sociocultural aspects of reproduction, or sociocultural answers to the challenge of pandemics, prove the coevolutionary significance of the human ecological approach. The competitive M-E-O complexes are actors and subjects of the selective dynamism of human evolution. The M-E-O model offers a hermeneutic framework to understand the selective evolutionary dynamism of today’s techno-civilizational changes, as an accelerated evolutionary process

    Inference of natural language predicates in the open domain

    Inference of predicates in natural language is a common task for humans in everyday scenarios, and thus for natural language processing by machines, such as in question answering. The question Did Arsenal beat Man United? can be affirmed by a text Arsenal obliterated Man United on Saturday if an inference is drawn that the text predicate obliterate entails beat in the question. In a world of vast and varied text resources, automatic language inference is necessary for bridging this gap between records and queries. A promising model of such inference between predicates is an Entailment Graph (EG), a structure of meaning postulates such as x obliterates y entails x defeats y. EGs are constructed using unsupervised distributional methods over a large corpus, learning representations of natural language predicates contained within. Entailment is directional, and correctly, EGs fail to confirm the opposite, that x defeats y entails x obliterates y; these distinctions are important for language understanding applications. In an EG, postulates are typically defined for a predicate argument pair (x, y) over a fixed vocabulary of such binary valence predicates, which relate two arguments. However, EG meaning postulates are limited in terms of their predicates in two ways. First, using the conventional approach, entailments may only be learned for predicates of the same valence, typically binary to binary entailment, ignoring entailments between valencies and their applications. For example, the binary relation Arsenal defeats Man United leads to an inference in humans that Arsenal is the winner, a unary relation applying to the subject Arsenal. Yet using conventional means, it is not possible to learn these in EGs. Second, only a limited vocabulary of predicates may be learned in training. This is because of the natural Zipfian frequency distribution of predicates in text corpora, which includes an unbounded long tail of rarely-mentioned predicates like obliterate. 
This distribution simultaneously makes it impractical to learn entailments for every predicate in a language by reading corpora, and also very likely that many of these unlearned predicates may be involved in real queries. This thesis explores inference in the open domain of natural language predicates beyond a fixed vocabulary of binary predicates. First, Entailment Graph valency is addressed. The distributional learning method is refined to enable learning entailments between predicates of different valencies. This improves recall in question answering by leveraging all available predicates in the reference text to answer questions. Second, the problem of overall predicate sparsity in EGs is explored, in which Language Model encoding is applied unsupervised with an EG. This provides a means of approximating missing premise predicates at test-time, which improves both recall and precision. However, while approximating missing hypothesis predicates is shown to be possible in principle, it remains a challenge. Finally, a behavioral study is presented on Large Language Models (containing one billion parameters or more) which investigates their ability to perform language inference involving fully open-domain premise and hypothesis predicates. While superficially performant, this class of model is found to merely approximate language inference, utilizing unsound methods to mimic reasoning including memorized training data and proxies learned from corpus distributions, which have no direct relationship with meaning

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    An ebd-enabled design knowledge acquisition framework

    Having enough knowledge and keeping it up to date enables designers to execute design assignments effectively and gives them a competitive advantage in the design profession. Knowledge elicitation or acquisition is a crucial component of system design, particularly for tasks requiring transdisciplinary or multidisciplinary cooperation. In system design, extracting domain-specific information is exceedingly difficult for designers. This thesis presents three works that attempt to bridge the gap between designers and domain expertise. First, a systematic literature review on data-driven demand elicitation is given using the Environment-based Design (EBD) approach. This review addresses two research objectives: (i) to investigate the present state of computer-aided requirement knowledge elicitation in the domains of engineering; (ii) to integrate EBD methodology into the conventional literature review framework by providing a well-structured research question generation methodology. The second study describes a data-driven interview transcript analysis strategy that employs EBD environment analysis, unsupervised machine learning, and a range of natural language processing (NLP) approaches to assist designers and qualitative researchers in extracting needs when domain expertise is lacking. The second research proposes a transfer-learning-based qualitative text analysis framework that aids researchers in extracting valuable knowledge from interview data for healthcare promotion decision-making. The third work is an EBD-enabled design lexical knowledge acquisition framework that automatically constructs a semantic network, RomNet, from an extensive collection of abstracts from engineering publications. Applying RomNet can improve the quality of design information retrieval and the communication between the parties involved in a design project.
To conclude, this thesis integrates artificial intelligence techniques, such as Natural Language Processing (NLP) methods, Machine Learning techniques, and rule-based systems, to build a knowledge acquisition framework that supports manual, semi-automatic, and automatic extraction of design knowledge from different types of textual data sources

    Computational Intelligence and Human-Computer Interaction: Modern Methods and Applications

    The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations