6 research outputs found

    One for All: Neural Joint Modeling of Entities and Events

    Full text link
    The previous work for event extraction has mainly focused on the predictions for event triggers and argument roles, treating entity mentions as being provided by human annotators. This is unrealistic as entity mentions are usually predicted by some existing toolkits whose errors might be propagated to the event trigger and argument role recognition. Few of the recent work has addressed this problem by jointly predicting entity mentions, event triggers and arguments. However, such work is limited to using discrete engineering features to represent contextual information for the individual tasks and their interactions. In this work, we propose a novel model to jointly perform predictions for entity mentions, event triggers and arguments based on the shared hidden representations from deep learning. The experiments demonstrate the benefits of the proposed method, leading to the state-of-the-art performance for event extraction.Comment: Accepted at The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) (Honolulu, Hawaii, USA

    GIANT: Scalable Creation of a Web-scale Ontology

    Full text link
    Understanding what online users may pay attention to is key to content recommendation and search services. These services will benefit from a highly structured and web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of online population. Neither is a logically structured ontology maintained among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology, containing a large number of natural language phrases conforming to user attentions at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present our graph-neural-network-based techniques used in GIANT, and evaluate the proposed methods as compared to a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that Attention Ontology can significantly improve click-through rates in news recommendation.Comment: Accepted as full paper by SIGMOD 202

    Joint Extraction of Events and Entities within a Document Context

    No full text

    A graph-based framework for data retrieved from criminal-related documents

    Get PDF
    A digitalização das empresas e dos serviços tem potenciado o tratamento e análise de um crescente volume de dados provenientes de fontes heterogeneas, com desafios emergentes, nomeadamente ao nível da representação do conhecimento. Também os Órgãos de Polícia Criminal (OPC) enfrentam o mesmo desafio, tendo em conta o volume de dados não estruturados, provenientes de relatórios policiais, sendo analisados manualmente pelo investigadores criminais, consumindo tempo e recursos. Assim, a necessidade de extrair e representar os dados não estruturados existentes em documentos relacionados com o crime, de uma forma automática, permitindo a redução da análise manual efetuada pelos investigadores criminais. Apresenta-se como um desafio para a ciência dos computadores, dando a possibilidade de propor uma alternativa computacional que permita extrair e representar os dados, adaptando ou propondo métodos computacionais novos. Actualmente existem vários métodos computacionais aplicados ao domínio criminal, nomeadamente a identificação e classificação de entidades nomeadas, por exemplo narcóticos, ou a extracção de relações entre entidades relevantes para a investigação criminal. Estes métodos são maioritariamente aplicadas à lingua inglesa, e em Portugal não há muita atenção à investigação nesta área, inviabilizando a sua aplicação no contexto da investigação criminal. Esta tese propõe uma solução integrada para a representação dos dados não estruturados existentes em documentos, usando um conjunto de métodos computacionais: Preprocessamento de Documentos, que agrupa uma tarefa de Extracção, Transformação e Carregamento adaptado aos documentos relacionados com o crime, seguido por um pipeline de Processamento de Linguagem Natural aplicado à lingua portuguesa, para uma análise sintática e semântica dos dados textuais; Método de Extracção de Informação 5W1H que agrupa métodos de Reconhecimento de Entidades Nomeadas, a detecção da função semântica e a extracção de termos criminais; Preenchimento da Base de Dados de Grafos e Enriquecimento, permitindo a representação dos dados obtidos numa base de dados de grafos Neo4j. Globalmente a solução integrada apresenta resultados promissores, cujos resultados foram validados usando protótipos desemvolvidos para o efeito. Demonstrou-se ainda a viabilidade da extracção dos dados não estruturados, a sua interpretação sintática e semântica, bem como a representação na base de dados de grafos; Abstract: The digitalization of companies processes has enhanced the treatment and analysis of a growing volume of data from heterogeneous sources, with emerging challenges, namely those related to knowledge representation. The Criminal Police has similar challenges, considering the amount of unstructured data from police reports manually analyzed by criminal investigators, with the corresponding time and resources. There is a need to automatically extract and represent the unstructured data existing in criminal-related documents and reduce the manual analysis by criminal investigators. Computer science faces a challenge to apply emergent computational models that can be an alternative to extract and represent the data using new or existing methods. A broad set of computational methods have been applied to the criminal domain, such as the identification and classification named-entities (NEs) or extraction of relations between the entities that are relevant for the criminal investigation, like narcotics. However, these methods have mainly been used in the English language. In Portugal, the research on this domain, applying computational methods, lacks related works, making its application in criminal investigation unfeasible. This thesis proposes an integrated solution for the representation of unstructured data retrieved from documents, using a set of computational methods, such as Preprocessing Criminal-Related Documents module. This module is supported by Extraction, Transformation, and Loading tasks. Followed by a Natural Language Processing pipeline applied to the Portuguese language, for syntactic and semantic analysis of textual data. Next, the 5W1H Information Extraction Method combines the Named-Entity Recognition, Semantic Role Labelling, and Criminal Terms Extraction tasks. Finally, the Graph Database Population and Enrichment allows us the representation of data retrieved into a Neo4j graph database. Globally, the framework presents promising results that were validated using prototypes developed for this purpose. In addition, the feasibility of extracting unstructured data, its syntactic and semantic interpretation, and the graph database representation has also been demonstrated
    corecore