1,297 research outputs found

    ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese

    Semantic Textual Similarity (STS) aims at computing the proximity of meaning conveyed by two sentences. In 2016, the ASSIN shared task targeted STS in Portuguese and released training and test collections. This paper describes the development of ASAPP, a system that participated in ASSIN, has been improved since then, and now achieves the best results in this task. ASAPP learns an STS function from a broad range of lexical, syntactic, semantic and distributional features. This paper describes the features used in the current version of ASAPP, and how they are exploited in a regression algorithm to achieve the best published results for ASSIN to date, in both European and Brazilian Portuguese.
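    The abstract does not list ASAPP's actual features, so the following is only a minimal sketch of what two simple lexical features for a sentence pair might look like before they are passed to a regression algorithm. The function names (`tokens`, `jaccard`, `length_ratio`, `features`) and the whitespace tokenization are assumptions made for illustration, not part of ASAPP.

```python
def tokens(sentence):
    """Lowercased whitespace tokens as a set (an assumed, simplistic tokenizer)."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Token-set overlap: |A ∩ B| / |A ∪ B|; two empty sentences count as identical."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def length_ratio(a, b):
    """Ratio of the shorter to the longer sentence length, measured in tokens."""
    la, lb = len(a.split()), len(b.split())
    if max(la, lb) == 0:
        return 1.0
    return min(la, lb) / max(la, lb)


def features(a, b):
    """Hypothetical feature vector for one sentence pair."""
    return [jaccard(a, b), length_ratio(a, b)]
```

    A regression model would then be fitted on such vectors against the gold similarity scores of a training collection. For the pair "o gato dorme" and "o gato come", this sketch yields a Jaccard overlap of 0.5 and a length ratio of 1.0.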

    Semantic Classification of Scientific Sentence Pair Using Recurrent Neural Network

    One development in Natural Language Processing is the semantic classification of sentences and documents. The challenge is to find relationships between words and between documents through a computational model. Advances in machine learning make it possible to try out various approaches to classification. This paper proposes the semantic classification of sentence pairs using Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM). Each pair of sentences is turned into vectors using Word2Vec. Experiments were carried out using CBOW and Skip-Gram to find the best combination. The results show that word embedding using CBOW performs better than Skip-Gram, although the difference is still only around 5%. CBOW is slightly slower in the first iterations but is stable towards convergence. Classification covers six classes, namely Equivalent, Similar, Specific, No Alignment, Related, and Opposite. Because the data set is unbalanced, the model was retrained after removing some class members from the data set, which gave an accuracy of 73% on non-training data. The results showed that the Adam model converged faster at the start of training than the SGD model, and that the AdaDelta model that was built gave a better accuracy of 75% with an F1-score of 67%.

    Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

    Much research effort is devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available, as they are for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, it is still challenging to obtain competitive performance. Cross-lingual SRL is one promising way to address the problem, and it has made great advances with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on Universal Proposition Bank show that the translation-based method is highly effective, and that the automatic pseudo datasets can improve target-language SRL performance significantly. Comment: Accepted at ACL 202

    Using case-based reasoning to support alternative dispute resolution

    Springer - Series Advances in Intelligent and Soft Computing, vol. 79. Recent trends in communication technologies led to a shift in the already traditional Alternative Dispute Resolution paradigm, giving birth to Online Dispute Resolution. In this new paradigm, technologies are used to deliver better, faster and cheaper alternatives to litigation in court. However, the role that technology plays can be enhanced even further through the use of artefacts from the field of Artificial Intelligence. In this paper we present UMCourt, an Online Dispute Resolution tool that borrows concepts from the fields of Law and Artificial Intelligence. The system keeps the parties informed about the possible consequences of their litigation if their problems are to be settled in court. Moreover, it uses a Case-based Reasoning algorithm that searches for solutions to the litigation by considering similar known past cases, as a way to enhance the negotiation process. When parties have access to all this information and are aware of the consequences of their choices, they can make better decisions that encompass all the important aspects of a litigation process. The work described in this paper is included in TIARAC - Telematics and Artificial Intelligence in Alternative Conflict Resolution Project (PTDC/JUR/71354/2006), which is a research project supported by FCT (Science & Technology Foundation), Portugal.

    Detecção de Paráfrases na Língua Portuguesa usando Sentence Embeddings

    Paraphrase detection (or identification) is the task of determining whether two or more sentences of arbitrary length have the same meaning. Methods for solving this task have potential applications in Natural Language Processing systems. This work investigates the combination of different sentence representation methods from vector space language models with linear classifiers for the problem of paraphrase detection in Portuguese. The results obtained in this work fall short of those obtained for the related task of textual entailment detection in the ASSIN evaluation for Portuguese. However, this work investigates the application of vector representations of sentences to paraphrase detection, and other features commonly exploited in systems of this kind can trivially be incorporated into our method to improve performance.

    Automatization of incident resolution

    Incident management is a key IT Service Management subprocess in every organization, as a way to deal with the current volume of tickets created every year. Currently, the resolution process is still extremely labor intensive. A large number of incidents do not stem from a new, never-before-seen problem: they have already been solved in the past, and their respective resolutions have been stored in an Incident Ticket System. Automation of repeatable tasks in IT is an important element of service management and can have a considerable impact on an organization. Using a large real-world database of incident tickets, this dissertation explores a method to automatically propose a suitable resolution for a new ticket using the resolution texts of previous tickets. At its core, the method uses machine learning, natural language parsing, information retrieval and mining. The proposed method explores machine learning models such as SVM, Logistic Regression, several neural network architectures and more to predict an incident resolution category for a new ticket, together with a module that automatically retrieves resolution action phrases from tickets using part-of-speech pattern matching. In the experiments performed, 31% to 41% of the tickets from a test set were considered solved by the proposed method, which, considering the yearly volume of tickets, represents a significant amount of manpower and resources that could be saved.

    Computational approaches to semantic change (Volume 6)

    Semantic change — how the meanings of words change over time — has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th century, ushering in a new methodological turn in the study of language change. Compared to changes in sound and grammar, semantic change is the least understood. Ever since, the study of semantic change has progressed steadily, accumulating a vast store of knowledge for over a century, encompassing many languages and language families. Historical linguists also early on realized the potential of computers as research tools, with papers at the very first international conferences in computational linguistics in the 1960s. Such computational studies still tended to be small-scale, method-oriented, and qualitative. However, recent years have witnessed a sea-change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capability and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans

    Learning to Rank Academic Experts in the DBLP Dataset

    Expert finding is an information retrieval task concerned with the search for the most knowledgeable people with respect to a specific topic, where the search is based on documents that describe people's activities. The task involves taking a user query as input and returning a list of people sorted by their level of expertise with respect to that query. Despite recent interest in the area, the current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. More specifically, this article explores the use of supervised learning-to-rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, which are representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were also explored for the rank aggregation framework. Experiments performed on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches. Comment: Expert Systems, 2013. arXiv admin note: text overlap with arXiv:1302.041
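    The abstract does not say which data fusion techniques were used, so the following is only a minimal sketch of one classic rank aggregation method, the Borda count, applied to rankings produced by several expertise estimators. The function name `borda_aggregate` and the example expert names are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several ranked lists (best first) into one list using the Borda count.

    With n distinct candidates overall, a candidate at 0-based position p in a
    ranking earns n - p points from it; a candidate absent from a ranking earns
    nothing from it. Ties are broken alphabetically so the output is deterministic.
    """
    candidates = set().union(*rankings)
    n = len(candidates)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Three hypothetical estimators (text, citation graph, profile) rank three experts.
by_text = ["ana", "bruno", "carla"]
by_citations = ["bruno", "ana", "carla"]
by_profile = ["ana", "carla", "bruno"]
combined = borda_aggregate([by_text, by_citations, by_profile])
```

    Here "ana" collects 8 points, "bruno" 6 and "carla" 4, so the combined ranking is ana, bruno, carla. Unlike supervised learning to rank, this kind of aggregation needs no training data, which is one reason it serves as a natural point of comparison.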