7 research outputs found

    Semantic analysis for paraphrase identification using semantic role labeling

    Get PDF
    Reuse of documents has been prominently appeared during the course of digitalization of information contents owing to the wide-spread of internet and smartphones in various complex forms such as inserting words, omitting and substituting, changing word order, and etc. Especially, when a word in document is substituted with a similar word, it would be an issue not to consider it as a subject of measurement for the existing morphological similarity measurement method. In order to resolve this kind of problem, various researches have been conducted on the similarity measurement considering semantic information. This study is to propose a measurement method on semantic similarity being characterized as semantic role information in sentences acquired by semantic role labeling. To assess the performance of this proposed method, it was compared with the method of substring similarity being utilized for similarity measurement for existing documents. As a result, we could identify that the proposed method performed similar with the conventional method for the plagiarized documents which were rarely modified whereas it had improved results for paraphrasing sentences which were changed in structure

    Una aplicación de minería de datos para el análisis de la propiedad de terminación de SRTs

    Full text link
    [ES] La minería de datos juega a día de hoy un papel importante en muchos campos de la ciencia debido a la gran cantidad de información con la que se trabaja diariamente y la que se puede deducir de la misma. Las posibilidades que ofrece la minería de datos son muy diversas, siendo además su ámbito de aplicación muy extenso. Uno de los posibles campos y del que versa este trabajo, es la predicción de propiedades de sistemas software. Para ser mas concretos, en este proyecto hemos planteado la posibilidad de predecir la propiedad de terminación de los sistemas de reescritura de términos. Para ello se ha hecho uso de la base de datos de resultados de la Termination Competition, competición que se celebra anualmente desde 2006 y que trata sobre la demostración de la terminación de sistemas de reescritura. En esta base de datos se almacenan en forma de registros los resultados obtenidos por herramientas de terminación que participan en la competición, registros tales como el resultado de la demostración, tiempo empleado para obtener la respuesta y la traza de la demostración.[EN] Nowadays data mining plays an important role in many science fields due to the huge volume of information that we handle on a daily basis as well as the information which can be obtained from it. Possibilities offered by data mining are very varied and its scope of application is very large. One of the possible fields we handle with in this project is the prediction of software system features. Specifically in this project we have researched the possibility of foreseeing the termination feature of term rewriting systems. To achieve this goal the database containing the results of the Termination Competition taking place annually since 2006 has been used. Such database stores the results obtained by termination tools taking part in the competition, such as the result of the demonstration, time employed to obtain the result and the trace of such a demonstration.Fabregat Marcos, H. (2015). Una aplicación de minería de datos para el análisis de la propiedad de terminación de SRTs. http://hdl.handle.net/10251/54056.TFG

    Multi-Objective Learning for Multi-Modal Natural Language Generation

    Get PDF
    One of the important goals of Artificial Intelligence (AI) is to mimic the ability of humans to leverage the knowledge or skill from previously learned tasks to quickly learn a new task. For example, humans can reapply the learned skill of balancing the bicycle for learning to ride a motorbike. In a similar context, the field of Natural Language Processing (NLP) has several tasks including machine translation, textual summarization, image/video captioning, sentiment analysis, dialog systems, natural language inference, question answering, etc. While these different NLP tasks are often trained separately, leveraging the knowledge or skill from related tasks via joint training or training one task after another task in a sequential fashion, can have potential advantages. To this end, this dissertation explores various NLP tasks (especially multi-modal text generation and pair-wise classification tasks covering both natural language generation (NLG) and natural language understanding (NLU)) leveraging information from the related auxiliary tasks in an effective way via novel multi-objective learning strategies. These proposed novel learning strategies can be broadly classified into three paradigms: multi-task learning, multi-reward reinforcement learning, and continual learning. In multi-task learning, we mainly focus on intuitively finding what related auxiliary tasks can benefit the multi-modal video caption generation task and textual summarization task. We explore effective ways of sharing the parameters across these related tasks via joint training. In multi-reward reinforcement learning, we teach various skills to multi-modal text generation models in the form of rewards. For example, we try to teach the entailment skill to the video captioning model with entailment rewards. Further, we propose novel and effective ways of inducing multiple skills by `dynamically' choosing the auxiliary tasks (in MTL) or rewards (in RL) during the training in an automatic way using multi-armed bandits based approaches. Finally, in continual learning, we explore sharing of information across various tasks in a sequential way, where the model continually evolves during the sequential training without losing the performance on previously learned tasks. This kind of sharing allows the later tasks to benefit from previously trained tasks and vice-versa in some cases. For this, we propose a novel method that continually changes the model architecture to accommodate new tasks while retaining performance on old tasks. We empirically evaluate our method on three natural language inference tasks.Doctor of Philosoph

    Artificial Intelligence-enabled Automation for Compliance Checking against GDPR

    Get PDF
    Requirements engineering (RE) is concerned with eliciting legal requirements from applicable regulations to enable developing legally compliant software. Current software systems rely heavily on data, some of which can be confidential, personal, or sensitive. To address the growing concerns about data protection and privacy, the general data protection regulation (GDPR) has been introduced in the European Union (EU). Organizations, whether based in the EU or not, must comply with GDPR as long as they collect or process personal data of EU residents. Breaching GDPR can be charged with large fines reaching up to up to billions of euros. Privacy policies (PPs) and data processing agreements (DPAs) are documents regulated by GDPR to ensure, among other things, secure collection and processing of personal data. Such regulated documents can be used to elicit legal requirements that are inline with the organizations’ data protection policies. As a prerequisite to elicit a complete set of legal requirements, however, these documents must be compliant with GDPR. Checking the compliance of regulated documents entirely manually is a laborious and error-prone task. As we elaborate below, this dissertation investigates utilizing artificial intelligence (AI) technologies to provide automated support for compliance checking against GDPR. • AI-enabled Automation for Compliance Checking of PPs: PPs are technical documents stating the multiple privacy-related requirements that a system should satisfy in order to help individuals make informed decisions about sharing their personal data. We devise an automated solution that leverages natural language processing (NLP) and machine learning (ML), two sub-fields of AI, for checking the compliance of PPs against the applicable provisions in GDPR. Specifically, we create a comprehensive conceptual model capturing all information types pertinent to PPs and we further define a set of compliance criteria for the automated compliance checking of PPs. • NLP-based Automation for Compliance Checking of DPAs: DPAs are legally binding agreements between different organizations involved in the collection and processing of personal data to ensure that personal data remains protected. Using NLP semantic analysis technologies, we develop an automated solution that checks at phrasal-level the compliance of DPAs against GDPR. Our solution is able to provide not only a compliance assessment, but also detailed recommendations about avoiding GDPR violations. • ML-enabled Automation for Compliance Checking of DPAs: To understand how different representations of GDPR requirements and different enabling technologies fare against one another, we develop an automated solution that utilizes a combination of conceptual modeling and ML. We further empirically compare the resulting solution with our previously proposed solution, which uses natural language to represent GDPR requirements and leverages rules alongside NLP semantic analysis for the automated support
    corecore