7 research outputs found

    Lie-o-matic: using natural language processing to detect contradictory statements

    The Information Age brought the digitization of data and, consequently, a faster and wider flow and production of information. People such as journalists struggle to cope with the increasing volume of data being released and to monitor and verify the information being spread, which may be corrupted (containing lies, inconsistencies, contradictions, etc.). Considering this problem and the constant evolution of Natural Language Processing (NLP) and machine learning techniques, we are interested in taking advantage of these recent developments to tackle the specific NLP task of detecting contradictions in text. This dissertation investigates the effect of various datasets, from different domains and tasks (such as contradictions in a different context, or arguments of support and attack), on the learning performance of a supervised classification model for detecting contradictions. We therefore address the problem as a binary classification task, fine-tuning a sentence-pair classifier built on top of a pre-trained BERT model and then using it to predict whether two texts are contradictory. The literature on contradiction detection has focused mostly on distinguishing antonyms and contrasting words. To the best of our knowledge, no systematic investigation has considered transfer learning for the task of contradiction detection. To illustrate the idea, contradictions in the political domain were used as a case study. Since we are testing transfer learning, we conducted experiments using, as the source task domain, data collected from four publicly available corpora: MultiNLI, US2016, Argumentative Microtext, and Argument Annotated Essays. For the target domain, we built two datasets containing pairs of contradictions from two different sources: an online article exposing Donald Trump's contradictory claims, and the MultiNLI corpus (only instances of the government genre). To evaluate the experiments, we measure classification performance mainly through analysis of ROC and Precision-Recall curves. The findings answer our research question: other datasets can indeed be used to boost an inference model's learning performance on a target task, although the results are not significant enough to firmly establish the consistency and reliability of our findings. The findings offer insights into what kind of relationship between documents one should prioritize when resorting to transfer learning for contradiction detection. We conclude that the task type, the context, and the language patterns (linguistic markers characteristic of a person's speech) have a greater impact and can therefore be helpful when different datasets are similar in these three factors. Nonetheless, we faced some limitations in our research, such as the lack of robustness of the test dataset built from Donald Trump's contradictions, because no professional annotators were used for that task, and the already very good classification results obtained when using only the target domain for training and testing, which left little margin for improvement.
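    As a rough illustration of the evaluation named in the abstract above, the following Python sketch computes ROC points, Precision-Recall points, and the area under the ROC curve for a binary contradiction classifier. The function names and the toy scores are hypothetical and are not taken from the dissertation, which does not describe its evaluation code.

```python
def _counts(scores, labels, threshold):
    """Count true and false positives when predicting 1 for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp, fp


def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, sweeping the threshold over every distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp, fp = _counts(scores, labels, threshold)
        points.append((fp / neg, tp / pos))
    return points


def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs for every distinct score threshold."""
    pos = sum(labels)
    if pos == 0:
        raise ValueError("need at least one positive label")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp, fp = _counts(scores, labels, threshold)
        # tp + fp >= 1 here because at least one score equals the threshold.
        points.append((tp / pos, tp / (tp + fp)))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores for six sentence pairs (1 = contradictory).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
```

    In this sketch a sentence pair is predicted as contradictory when its score is at or above the threshold, so each distinct score yields one point on each curve.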

    Livro de atas do XVI Congresso da Associação Portuguesa de Investigação Operacional

    Fundação para a Ciência e Tecnologia - FC

    Deep Vacuity : detecção e classificação automática de padrões com risco de conluio em dados públicos de licitações de obras

    Dissertation (Master's), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2021. Identifying fraud and collusion in public works bids is an expensive manual task that depends on both professional experience and in-depth technical and legal knowledge. Public databases, combined with bidding and contract data previously analyzed by highly trained criminal forensic experts, form the database that can be analyzed to identify irregularities. This work proposes a methodology for the automatic detection and classification of collusion patterns in public bidding texts, using data available in the main official public repositories and applying pattern recognition techniques. In an initial approach, a total of 15,132,968 publications from Section 3 of the Diário Oficial da União, in text format, and 1,907 documents indicating risk in the bidding process, used as reference indicators of collusion activity (provided by a partner institution), were successfully obtained to form the work's database. Classic linear models, deep neural networks, bottleneck, Bi-LSTM, and multichannel models were tested with TF-IDF and DOC2VEC text vectorization and with structured data extracted from the text. The best F1-score, 93.4%, was obtained with a passive-aggressive model, while the bottleneck model obtained 93.0% with better precision.
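    The abstract above names TF-IDF vectorization and a passive-aggressive linear model but gives no implementation details. The sketch below is a minimal pure-Python version of that combination under stated assumptions: a log-based IDF with +1 smoothing, the update rule commonly called PA-I, labels in {+1, -1}, and invented toy notices. None of the names, parameters, or data come from the dissertation.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Inverse document frequency per term (assumed form: log(N / df) + 1)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def dot(w, x):
    """Dot product between a sparse weight dict and a sparse feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_passive_aggressive(vectors, labels, epochs=5, C=1.0):
    """PA-I: update only on hinge loss, with a step size capped by C."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            loss = max(0.0, 1.0 - y * dot(w, x))
            norm_sq = sum(v * v for v in x.values())
            if loss > 0.0 and norm_sq > 0.0:
                tau = min(C, loss / norm_sq)
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + tau * y * v
    return w


def predict(w, x):
    """Return +1 (risk pattern) or -1 (no risk pattern)."""
    return 1 if dot(w, x) >= 0.0 else -1


# Hypothetical tokenized notices; +1 marks the invented "risk" class.
docs = [
    ["single", "bidder", "contract", "awarded"],
    ["repeated", "winner", "single", "bidder"],
    ["open", "tender", "many", "bidders"],
    ["open", "competition", "tender", "published"],
]
labels = [1, 1, -1, -1]
idf = fit_idf(docs)
w = train_passive_aggressive([tfidf(d, idf) for d in docs], labels)
```

    The model is "passive" when an example is already classified with margin at least 1 and "aggressive" otherwise, moving the weights just far enough to fix the margin, up to the cap C.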

    XIV Colóquio de Outono: Humanidades: novos paradigmas do conhecimento e da investigação

    The present volume offers a selection of the papers presented at the XIV Colóquio de Outono organized by the research unit Centro de Estudos Humanísticos (Universidade do Minho) in November 2012, under the global topic Humanities: New Paradigms of Knowledge and Research (Humanidades: Novos Paradigmas do Conhecimento e da Investigação). It has been the main objective of CEHUM, throughout the various Colóquios de Outono organized in just over a decade, to listen carefully to the “noise of the world” and attempt a global interpretation of the signs of the times issuing from the world around us, as vibrant echoes of many pressing social and cultural issues. This volume gathers the majority of the texts presented at the XIV Colóquio de Outono, which the authors generously offered us for publication, and which will certainly testify to the important debate around the wide topic proposed for this year's analysis and discussion. We hope that this new volume may give evidence of our concern, as a Research Centre within the Humanities which operates in a transdisciplinary structure, with the crucial role played by the Humanities in today’s world and the multidisciplinary dialogue that can be fostered by the different research groups that compose it. Throughout the three days of this XIV Colóquio de Outono we had the privilege to listen to and debate the propositions of a vast number of national and international specialists in the manifold fields of inquiry here represented, engaging keynote speakers, project advisors, members of research teams and external researchers attached to the various research projects currently running in CEHUM, in the fields of literature, linguistics, philosophy, ethics, visual arts, cultural studies, music and performance. Each specific field of study was, however, never seen in isolation, but always embodied in a geo-cultural context and within the scope of a wide variety of critical debates and current theories of knowledge, as a signal of our understanding of the Humanities as a rich and plural territory which engages us all: scholars, researchers, students. For these lively and thought-provoking three days of the conference we wish to thank each and every one of the colleagues present, our distinguished guests, as well as the research members of CEHUM, who so enthusiastically joined in the debate on the proposed topics of analysis. Special thanks to the Board of Directors and the research team leaders of CEHUM for the precious help provided towards the organization and the setting up of this international event. Last but not least, we wish to thank the Instituto de Letras e Ciências Humanas, as well as the research assistants and staff of CEHUM for all the precious logistic support. Finally, our gratitude to our main sponsor, Fundação para a Ciência e a Tecnologia (FCT), for encouraging and financially supporting this yearly event and the present publication. Fundação para a Ciência e a Tecnologia (FCT), UE, COMPETE, QRE

    As humanidades e as ciências: disjunções e confluências

    The present volume offers a selection of the essays presented at the XV Colóquio de Outono organized by the research unit Centro de Estudos Humanísticos (Universidade do Minho) in November 2013, under the global topic Humanities and Sciences: Disjunctions and Confluences. It has been the main objective of CEHUM, throughout the various Autumn Colloquia organized since 1998, to listen carefully to the “noise of the world” and attempt a global interpretation of the signs of the times issuing from the world around us, as vibrant echoes of many pressing social and cultural issues. This volume gathers the majority of the texts presented at the XV Colóquio de Outono, which the authors generously submitted for publication, and which will certainly testify to the important debate around the vast topic proposed for this year’s analysis and discussion. We hope that this new volume may give evidence of our concern, as a Research Centre within the Humanities which operates in a transdisciplinary structure, with the crucial role played by the Humanities in today’s world and the benefits of engaging in this challenging multidisciplinary dialogue. Throughout the three days of this XV Colóquio de Outono we had the privilege to listen to and debate the propositions of a vast number of national and international specialists in the manifold fields of inquiry here represented, engaging keynote speakers, project advisors, members of the different research teams and external researchers attached to the various research projects currently running in CEHUM in the fields of literature, linguistics, philosophy, ethics, visual arts, cultural studies, music and performance. 
Our objective in this Colloquium was that each specific field of study represented here should never stand in isolation, but rather take shape at a crossroads of disciplines, across borders, in dialogue with researchers operating in a wide variety of fields, from Computer Science to Mathematics, Medicine and Psychology, Bioarts, Ecology and Ethics. For we believe that the Humanities is a plural territory which only achieves its maximum potential when engaging in a solid and permanent dialogue with other fields of research. Hence, it was not the disjunction between Humanities and Sciences we aimed to highlight throughout this Colloquium, but rather the confluence of research methods, queries, critical reflection and problematization pertaining to both Humanities and Sciences, despite the necessary expertise that identifies each field and the specificity of its target objects of research. For these lively and thought-provoking three days of the conference we wish to thank each and every one of the colleagues present, our distinguished guests, as well as the research members of CEHUM, who so enthusiastically joined in the debate on the proposed topics of analysis. Special thanks to the Board of Directors and the research team leaders of CEHUM for the precious help provided towards the organization and the setting up of this international event. Last but not least, we wish to thank the Instituto de Letras e Ciências Humanas, as well as the research assistants and staff of CEHUM for all the precious logistic support. Finally, our gratitude to our main sponsor, Fundação para a Ciência e a Tecnologia (FCT), for encouraging and financially supporting this yearly event and the present publication. Fundação para a Ciência e a Tecnologia, UE, COMPETE, QRE

    1990-1995 Brock Campus News

    A compilation of the administration newspaper, Brock Campus News, for the years 1990 through 1995. It had previously been titled The Blue Badger.