
    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires; however, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for those dialogue systems and then presenting the evaluation methods for that class.

    Understanding and exploiting user intent in community question answering

    A number of Community Question Answering (CQA) services have emerged and proliferated in the last decade. Typical examples include Yahoo! Answers, WikiAnswers, and also domain-specific forums like StackOverflow. These services help users obtain information from a community - a user can post his or her questions, which may then be answered by other users. Such a paradigm of information seeking is particularly appealing when the question cannot be answered directly by Web search engines due to the unavailability of relevant online content. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently. In this thesis, we analyse the intent of each question in CQA by classifying it into five dimensions, namely: subjectivity, locality, navigationality, procedurality, and causality. By making use of advanced machine learning techniques, such as Co-Training and PU-Learning, we are able to attain consistent and significant classification improvements over the state-of-the-art in this area. In addition to the textual features, a variety of metadata features (such as the category to which the question was posted) are used to model a user's intent, which in turn helps the CQA service to perform better in finding similar questions, identifying relevant answers, and recommending the most relevant answerers. We validate the usefulness of user intent in two different CQA tasks. Our first application is question retrieval, where we present a hybrid approach which blends several language modelling techniques, namely the classic (query-likelihood) language model, the state-of-the-art translation-based language model, and our proposed intent-based language model. Our second application is answer validation, where we present a two-stage model which first ranks similar questions by using our proposed hybrid approach, and then validates whether the answer of the top candidate can serve as an answer to a new question by leveraging sentiment analysis, query quality assessment, and search list validation.
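    A minimal sketch of how such a hybrid retrieval score could be blended, assuming a simple linear interpolation of the three components described above; the interpolation weights, the toy translation table, and the intent_score() stub are hypothetical placeholders rather than the thesis's actual formulation:

        # Hybrid question-retrieval scorer: query-likelihood LM + translation-based LM
        # + intent-based component, combined by linear interpolation (illustrative only).
        import math
        from collections import Counter

        def query_likelihood(query_terms, doc_terms, collection_freqs, collection_len, mu=2000.0):
            """Dirichlet-smoothed query-likelihood log score of a candidate question."""
            doc_counts, doc_len = Counter(doc_terms), len(doc_terms)
            score = 0.0
            for t in query_terms:
                p_coll = collection_freqs.get(t, 0.5) / collection_len
                score += math.log((doc_counts[t] + mu * p_coll) / (doc_len + mu))
            return score

        def translation_score(query_terms, doc_terms, trans_prob):
            """Toy translation-based score: p(query term | doc) via word-to-word probabilities."""
            score = 0.0
            for q in query_terms:
                p = sum(trans_prob.get((q, d), 0.0) for d in doc_terms) / max(len(doc_terms), 1)
                score += math.log(p + 1e-9)
            return score

        def intent_score(query_intent, doc_intent):
            """Hypothetical intent component: agreement across the five intent dimensions."""
            matches = sum(1 for dim in query_intent if query_intent[dim] == doc_intent.get(dim))
            return matches / max(len(query_intent), 1)

        def hybrid_score(query_terms, doc_terms, query_intent, doc_intent,
                         collection_freqs, collection_len, trans_prob,
                         w_ql=0.5, w_tr=0.3, w_int=0.2):
            # Assumed weights; in practice these would be tuned on held-out data.
            return (w_ql * query_likelihood(query_terms, doc_terms, collection_freqs, collection_len)
                    + w_tr * translation_score(query_terms, doc_terms, trans_prob)
                    + w_int * intent_score(query_intent, doc_intent))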

    A questionnaire integration system based on question classification and short text semantic textual similarity

    Fall 2018. Includes bibliographical references. Semantic integration from heterogeneous sources involves a series of NLP tasks. Existing research has focused mainly on measuring similarity between two paired sentences. However, when searching for possibly identical texts between two datasets, the sentences are not paired. To avoid pair-wise comparison, this thesis proposes a semantic similarity measuring system equipped with a precategorization module. It applies a hybrid question classification module, which subdivides all texts into coarse categories. The sentences are then paired within these subcategories. The core task is to detect whether two sentences are identical in meaning, which relates to the semantic textual similarity task in NLP. We built a short-text semantic textual similarity measuring module. It combines conventional NLP techniques, including both semantic and syntactic features, with a Recurrent Convolutional Neural Network to form an ensemble model. We also conducted a set of empirical evaluations. The results show that our system possesses a degree of generalization ability and performs well on heterogeneous sources.
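    A minimal sketch of the precategorization idea described above, assuming a stub coarse classifier and a stub similarity scorer in place of the thesis's hybrid question classification module and RCNN-based STS ensemble:

        # Precategorize sentences into coarse buckets, then score similarity only within
        # a bucket, avoiding full pair-wise comparison across the two datasets.
        def coarse_category(text):
            # Hypothetical rule-based stand-in for the hybrid question classification module.
            head = text.lower().split()[0] if text.split() else ""
            return {"what": "DESC", "who": "HUM", "where": "LOC", "when": "NUM"}.get(head, "OTHER")

        def jaccard_similarity(a, b):
            # Placeholder for the RCNN/feature-ensemble STS module.
            sa, sb = set(a.lower().split()), set(b.lower().split())
            return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

        def find_matches(dataset_a, dataset_b, threshold=0.6):
            """Pair sentences only within the same coarse category, then score each pair."""
            buckets = {}
            for s in dataset_b:
                buckets.setdefault(coarse_category(s), []).append(s)
            matches = []
            for s in dataset_a:
                for t in buckets.get(coarse_category(s), []):
                    sim = jaccard_similarity(s, t)
                    if sim >= threshold:
                        matches.append((s, t, sim))
            return matches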

    Biomedical semantic question and answering system

    Master's thesis, Informatics, Universidade de Lisboa, Faculdade de Ciências, 2017. Question Answering systems are excellent tools for obtaining simple answers, in several formats, in an equally simple way, and are of great utility in the Information Retrieval area, for answering questions from online communities, and also for research and information-prospecting purposes. The healthcare field has benefited greatly from these advances, aided by the progress of technology and of the tools derived from it, resulting in the ongoing computerization of these areas. These systems have great potential, since they access large sets of structured and unstructured data, such as the Web or the large information repositories derived from it, in order to obtain their answers, and, in the case of community question answering, online forums of questions and answers organized in threads by topic. Unstructured data poses a greater challenge, although structured data somewhat limits the range of transformations that can be applied to it. Making such datasets publicly available in digital form gives greater freedom to the public, and more specifically to researchers in the areas involved with these data, allowing them to be easily shared among the various interested parties. In general, such systems are not available for public reuse: they are limited to the research field, serve to prove concepts of specific algorithms, are difficult for a wider audience to reuse, or are difficult to maintain, since they can quickly become outdated, mainly in the technologies used, which may lose support. The goal of this thesis is to develop a system that addresses some of these gaps, promoting modularity between modules, a balance between implementation effort and ease of use, and sub-module performance, with as few prerequisites as possible, resulting in a baseline QA system adapted to a knowledge domain. Such a system is built from individually proven subsystems. This thesis describes several types of systems, such as information-retrieval-based and knowledge-based ones, focusing on two specific systems in this area, YodaQA and OAQA. It also presents several useful tools employed in many of these systems that rely on text classification techniques, ranging from natural language processing, tokenization and part-of-speech tagging to machine learning with supervised and unsupervised algorithms, textual similarity (pattern matching) and semantic similarity. In general, these techniques make it possible to obtain additional information about given text excerpts. Several tools that implement the described techniques are also covered, including annotation tools, semantic similarity tools and tools for organizing, sorting and searching large amounts of information in a scalable way, which are useful in this type of application. Some of the main datasets are also described and discussed.
The framework developed resulted in two systems with a modular pipeline architecture, composed of distinct modules according to the task at hand. Each module has well-defined input parameters and return values. The first system takes as input a set of threads of questions with answers given as comments and returns, for each set of ten comments on a question, an ordering together with a value reflecting how useful each comment is as an answer. This system was named MoRS and served as the modular proof of concept for the final system to be developed. The second system takes as input questions from the biomedical domain, restricted to four question types, and returns the corresponding answers, accompanied by the metadata used in the analysis of each question. Several variations of this system were built in order to assess whether the development choices were sound, always using the same framework (MoQA) and culminating in the system named MoQABio. The main modules that make up these systems include, in order of use, a module for entity recognition (including biomedical entities), using one of the tools surveyed in the related-work chapter. There is also a module called Combiner, in which each document retrieved from the output of the previous module is assigned the results of several metrics; these are used in the next module to train, via machine learning algorithms, a recognition model based on these cases. Once this model is trained, it can be used as a classifier of good and bad articles. The models were mostly generated with Support Vector Machines, with the option of using a Multi-layer Perceptron instead. Metadata is then extracted from the approved articles to build the rest of the answer, which includes the concepts, references to the documents and the main sentences of those documents. In the Combiner module of the final system, the evaluations range from the aforementioned pattern matching, with measures such as the number of entities shared between the question and the article, to semantic similarity using metrics provided by the authors of the Sematch library, including similarity between DBpedia concepts and entities and other standard semantic similarity measures such as Resnik or Wu-Palmer. Other metrics include the length of the article, a similarity metric between two sentences and the article's time in milliseconds. Although two systems were developed, it is the variations derived from MoQA that require datasets from several sources as prerequisites, among them the training and test question files and the PubMed repository, which contains a vast number of scientific articles in the biomedical area from which all the information used for the answers is extracted. In addition to these local sources, there is OPENphacts, an external source that provides information about the biomedical expressions detected in the first module. Once the systems descended from MoQA are ready, users can interact with them through a web application: by entering the type of answer they want and the question they want answered, the question is passed through the system and the answer, with its metadata, is returned to the web application. By inspecting the metadata, the original information can be accessed.
WS4A, developed by the ULisboa team, participated in BioASQ 2016; MoRS, developed by the author, participated in SemEval 2017 Task 3; and finally MoQA, by the same author as MoRS, had its performance evaluated using the same data and metrics as WS4A. While BioASQ assessed the performance of a Question Answering system in the biomedical area, SemEval assessed a system for ranking comments with respect to a given question, with the submitted systems officially evaluated using measures such as precision, recall and F-measure. In order to compare the impact of the features and tools used in each of the machine learning models built, the models were compared with one another, as was the percentage improvement between the systems developed over time. Beyond the official evaluations, local evaluations allowed further exploration of the systems' progression over time, including the three systems developed from MoQA. This work presents a system that, despite using state-of-the-art techniques with some adaptations, achieved a relevant performance improvement over its predecessor and results comparable to the best of the competition year whose data it used, and thus has great potential to achieve better results. Its contributions date back to February 2016, with WS4A [86], which participated in BioASQ 2016, followed by MoRS [85], which in turn participated in SemEval 2017, and ending with MoQA, with major improvements and available to the public at https://github.com/lasigeBioTM/MoQA. As future work, suggestions include improving the robustness of the system, further exploiting the metadata to better direct the search for answers, adding and exploring new model features, and constantly renewing the tools used. Incorporating new metrics provided by Sematch and improving the formulation of the queries made to the system are also measures to keep in mind, given that performance must be weighed against the response time to a question.
Question Answering systems have been of great use and interest in our times. They are great tools for acquiring simple answers in a simple way, being of great utility in the area of information retrieval and also for community question answering. Such systems have great potential, since they access large sets of data, for example from the Web, to acquire their answers and, in the case of community question answering, forums. Such systems are often not available for public reuse because they are limited to research purposes or are proof-of-concept systems for specific algorithms, with researchers repeatedly re-implementing the same or very similar modules, thus not providing the larger public with a tool that could serve their purposes. When such systems are made available, they are cumbersome to install and configure, which involves reading the documentation and depends on the researchers' programming ability. In this thesis, the two best systems available in this space, YodaQA and OAQA, are described. A description of their main modules is given, along with some sub-problems and hypothetical solutions. Several other systems and algorithms (e.g. for learning and ranking) are also described.
This work presents a modular system, MoQA (available at https://github.com/lasigeBioTM/MoQA), that solves some of these problems by providing a framework that comes with a baseline QA system for general-purpose local inquiry, but which is highly modular, built with individually proven subsystems and using known tools such as Sematch, machine learning algorithms, and Stanford Named Entity Recognition. It is a descendant of WS4A [86] and MoRS [85], which took part in BioASQ 2016 (with recognition) and SemEval 2017, respectively. Its purpose is to achieve performance as high as possible while keeping the prerequisites low and preserving the ability to edit and replace modules according to the users' wishes and research purposes, while providing an easy platform through which the final user can use the framework. MoQA had three variants, which were compared with each other; MoQABio obtained the best results among them by using different tools from the other variants and focusing on biomedical domain knowledge.
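A structural sketch of such a modular pipeline (entity recognition, document retrieval, a Combiner that attaches per-document metrics, a relevance classifier, and answer assembly), assuming stub implementations for every module; this is not the actual MoQA code, whose real modules use biomedical NER, Sematch metrics, and a trained SVM or MLP:

        # Pipeline sketch: each stage has explicit inputs/outputs, mirroring the modular
        # design described above. All function bodies are hypothetical stand-ins.
        from dataclasses import dataclass, field

        @dataclass
        class Candidate:
            doc_id: str
            text: str
            features: dict = field(default_factory=dict)
            relevant: bool = False

        def recognize_entities(question):
            # Stand-in for the entity-recognition module (e.g. biomedical NER).
            return [tok for tok in question.split() if tok[0].isupper()]

        def retrieve_documents(entities, corpus):
            # Stand-in retrieval: keep documents mentioning any recognized entity.
            return [Candidate(doc_id, text) for doc_id, text in corpus.items()
                    if any(e.lower() in text.lower() for e in entities)]

        def combiner(question, candidates):
            # "Combiner" module: attach per-document metrics (entity overlap, length, ...).
            q_terms = set(question.lower().split())
            for c in candidates:
                d_terms = set(c.text.lower().split())
                c.features = {"overlap": len(q_terms & d_terms), "length": len(d_terms)}
            return candidates

        def classify(candidates, min_overlap=2):
            # Stand-in for the trained classifier of "good" vs "bad" articles.
            for c in candidates:
                c.relevant = c.features["overlap"] >= min_overlap
            return [c for c in candidates if c.relevant]

        def answer(question, corpus):
            entities = recognize_entities(question)
            ranked = classify(combiner(question, retrieve_documents(entities, corpus)))
            return {"question": question, "entities": entities,
                    "documents": [c.doc_id for c in ranked]}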

    Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering

    Recently, the development of large language models (LLMs) has attracted wide attention in academia and industry. Deploying LLMs in real scenarios is one of the key directions in the current Internet industry. In this paper, we present a novel pipeline for applying LLMs to domain-specific question answering (QA) that incorporates domain knowledge graphs (KGs), addressing an important direction of LLM application. As a real-world application, the content generated by LLMs should be user-friendly in order to serve customers. Additionally, the model needs to use domain knowledge properly to generate reliable answers. These two issues are the two major difficulties in LLM application, as vanilla fine-tuning cannot adequately address them. We argue that both requirements can be unified as a model preference problem that requires alignment with humans for practical application. Thus, we introduce Knowledgeable Preference AlignmenT (KnowPAT), which constructs two kinds of preference sets, a style preference set and a knowledge preference set, to tackle the two issues. In addition, we design a new alignment objective to align the LLM preference with human preference, aiming to train a better LLM for real-scenario domain-specific QA that generates reliable and user-friendly answers. Adequate experiments and comprehensive comparisons with 15 baseline methods demonstrate that KnowPAT is a superior pipeline for real-scenario domain-specific QA with LLMs. Our code is open-source at https://github.com/zjukg/KnowPAT.
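    A hedged sketch of what a preference-alignment objective of this kind can look like, assuming candidate responses are ordered from most to least preferred and scored by the model; this margin-ranking formulation is illustrative only and is not the actual KnowPAT objective (see the repository above for that):

        # Penalize any pair of responses whose model scores disagree with the preference order.
        import torch

        def preference_alignment_loss(scores: torch.Tensor, margin: float = 0.1) -> torch.Tensor:
            """scores: shape (k,), ordered so scores[0] belongs to the most-preferred response."""
            losses = []
            k = scores.shape[0]
            for i in range(k):
                for j in range(i + 1, k):
                    # Preferred response i should outscore less-preferred response j by a margin.
                    losses.append(torch.relu(margin - (scores[i] - scores[j])))
            return torch.stack(losses).mean()

        # Example: three candidate answers (most preferred first), scored by a model;
        # the loss is zero once the ordering respects the margin.
        scores = torch.tensor([-1.0, -1.5, -3.0], requires_grad=True)
        loss = preference_alignment_loss(scores)
        loss.backward()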

    Identification of Online Users' Social Status via Mining User-Generated Data

    With the burst of available online user-generated data, identifying online users' social status by mining user-generated data can play a significant role in commercial applications, research, and policy-making across many domains. Social status refers to the position of a person in relation to others within a society, which is an abstract concept; its actual definition depends on the specific indicator being measured. For example, opinion leadership measures individual social status in terms of influence and expertise in an online society, while socioeconomic status characterizes a person's real-life social status based on social and economic factors. Compared with traditional survey methods, which are time-consuming, expensive and sometimes infeasible, some efforts have been made to identify particular aspects of users' social status from particular kinds of user-generated data using classic machine learning methods. However, each such identification task raises its own challenges, and the classic machine learning methods used in existing work fail to address them, which leads to low identification accuracy. Given the importance of improving identification accuracy, this thesis studies three specific cases of online and offline social status identification. For each case, it proposes a novel and effective identification method that addresses the specific challenges in order to improve accuracy. The first work aims at identifying users' online social status in terms of topic-sensitive influence and knowledge authority in social community question answering sites, namely identifying topical opinion leaders who are both influential and knowledgeable. A social community question answering (SCQA) site is an innovative community question answering platform that not only offers traditional question answering (QA) services but also integrates an online social network where users can follow each other. Identifying topical opinion leaders in SCQA has become an important research area due to the significant role of topical opinion leaders. However, most previous related work either focuses on using knowledge expertise to find experts for improving the quality of answers, or aims at measuring user influence to identify influential users. In order to identify the true topical opinion leaders, we propose a topical opinion leader identification framework called QALeaderRank which takes into account both topic-sensitive influence and topical knowledge expertise. In the proposed framework, to measure the topic-sensitive influence of each user, we design a novel influence measure algorithm that exploits both the social and QA features of SCQA, taking into account social network structure, topical similarity and knowledge authority. In addition, we propose three topic-relevant metrics to infer the topical expertise of each user. Extensive experiments along with an online user study show that the proposed QALeaderRank achieves significant improvements over state-of-the-art methods. Furthermore, we analyze how users' topic interests change over time and experimentally examine the predictability of user topic interest. The second work focuses on predicting individual socioeconomic status from mobile phone data. Socioeconomic status (SES) is an important and widely studied social and economic construct, and assessing individual SES can assist related organizations in making a variety of policy decisions.
The traditional approach suffers from the extremely high cost of collecting large-scale SES-related survey data. With the ubiquity of smartphones, mobile phone data has become a novel data source for predicting individual SES at low cost. However, the task of predicting individual SES from mobile phone data also poses new challenges, including sparse individual records, scarce explicit relationships and limited labeled samples, which were not addressed by prior work restricted to regional or household-oriented SES prediction. To address these issues, we propose a semi-supervised Hypergraph-based Factor Graph Model (HyperFGM) for individual SES prediction. HyperFGM is able to efficiently capture the associations between SES and individual mobile phone records to handle the sparsity of individual records. For the scarce explicit relationships, HyperFGM models implicit high-order relationships among users on the hypergraph structure. In addition, HyperFGM exploits the limited labeled data and the unlabeled data in a semi-supervised way. Experimental results show that HyperFGM greatly outperforms the baseline methods on individual SES prediction using a set of anonymized real mobile phone data. The third work predicts social media users' socioeconomic status from their social media content, which is useful for related organizations and companies in a range of applications, such as economic and social policy-making. Previous work leverages manually defined textual features and platform-based user-level attributes from social media content and feeds them into a machine-learning-based classifier for SES prediction. However, it ignores some important information in social media content, namely the order and the hierarchical structure of social media text as well as the relationships among user-level attributes. To this end, we propose a novel coupled social media content representation model for individual SES prediction, which not only utilizes a hierarchical neural network to incorporate the order and the hierarchical structure of social media text but also employs a coupled attribute representation method to take into account intra-coupled and inter-coupled interaction relationships among user-level attributes. The experimental results show that the proposed model significantly outperforms other state-of-the-art models on a real dataset, which validates the effectiveness and robustness of the proposed model.
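As an illustration of the first case above, a minimal sketch of blending a topic-sensitive, PageRank-style influence score over the follower graph with a topical expertise score; the graph construction, damping factor, and combination weight alpha are assumed placeholders, not the actual QALeaderRank algorithm:

        # Topic-sensitive influence via personalized PageRank, blended with expertise.
        def topic_sensitive_influence(followers, topic_affinity, damping=0.85, iters=50):
            """followers: dict user -> list of users they follow (an edge u -> v passes score to v).
            topic_affinity: dict user -> weight in [0, 1] for the topic of interest."""
            users = list(followers)
            total_affinity = sum(topic_affinity.get(u, 0.0) for u in users) or 1.0
            teleport = {u: topic_affinity.get(u, 0.0) / total_affinity for u in users}
            rank = {u: 1.0 / len(users) for u in users}
            for _ in range(iters):
                nxt = {u: (1 - damping) * teleport[u] for u in users}
                for u in users:
                    out = followers[u]
                    if not out:
                        continue
                    share = damping * rank[u] / len(out)
                    for v in out:
                        nxt[v] = nxt.get(v, 0.0) + share
                rank = nxt
            return rank

        def rank_topical_opinion_leaders(followers, topic_affinity, expertise, alpha=0.5):
            """Blend influence and expertise; alpha is a hypothetical trade-off weight."""
            influence = topic_sensitive_influence(followers, topic_affinity)
            return sorted(followers, key=lambda u: alpha * influence[u]
                          + (1 - alpha) * expertise.get(u, 0.0), reverse=True)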

    Quality-aware architectural model transformations in adaptive mashups user interfaces

    The final publication is available at IOS Press through http://dx.doi.org/10.3233/FI-2016-0000. Mashup user interfaces provide their functionality through the combination of different services. The integration of such services can be achieved by using reusable and third-party components. Furthermore, these interfaces must be adapted to user preferences, context changes, user interactions and component availability. Model transformation is a useful mechanism to address this adaptation, but normally these operations focus only on the functional requirements. In this sense, quality attributes should be included in the adaptation process to obtain the best adapted mashup user interface. This paper proposes a generic quality-aware transformation process to support the adaptation of software architectures. The transformation process has been applied in ENIA, a geographic information system, by constructing a specific quality model for the adaptation of mashup user interfaces. This model is taken into account when evaluating the different transformation alternatives and choosing the one that maximizes the quality assessments. The approach has been validated on a set of adaptation scenarios that are intended to maximize different quality factors and therefore apply distinct combinations of metrics.
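    A minimal sketch of the selection step described above, assuming the quality model reduces to a weighted sum of normalized metric values; the metric names, weights, and alternatives below are made up for illustration and are not the ENIA quality model:

        # Score each transformation alternative against a weighted quality model
        # and pick the one with the highest aggregate assessment.
        def quality_assessment(metrics, weights):
            """Weighted sum of normalized quality-metric values (all assumed in [0, 1])."""
            return sum(weights[name] * metrics.get(name, 0.0) for name in weights)

        def choose_transformation(alternatives, weights):
            """alternatives: dict name -> metric values of the resulting architecture."""
            return max(alternatives, key=lambda name: quality_assessment(alternatives[name], weights))

        # Example with hypothetical alternatives and a quality model favoring usability.
        weights = {"usability": 0.5, "performance": 0.3, "availability": 0.2}
        alternatives = {
            "replace_map_component": {"usability": 0.9, "performance": 0.6, "availability": 0.8},
            "rearrange_layout":      {"usability": 0.7, "performance": 0.8, "availability": 0.9},
        }
        best = choose_transformation(alternatives, weights)  # -> "replace_map_component"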