One for All: Neural Joint Modeling of Entities and Events
Previous work on event extraction has mainly focused on predicting
event triggers and argument roles, treating entity mentions as being
provided by human annotators. This is unrealistic, as entity mentions are
usually predicted by existing toolkits whose errors might propagate to
event trigger and argument role recognition. A few recent works have
addressed this problem by jointly predicting entity mentions, event triggers
and arguments. However, such work is limited to using discrete engineered
features to represent contextual information for the individual tasks and their
interactions. In this work, we propose a novel model that jointly
predicts entity mentions, event triggers and arguments based on
shared hidden representations from deep learning. The experiments demonstrate
the benefits of the proposed method, leading to state-of-the-art
performance for event extraction.
Comment: Accepted at The Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19) (Honolulu, Hawaii, USA)
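The idea of sharing hidden representations across the three tasks can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: a single feed-forward layer stands in for the deep encoder, the weights are random and untrained, and the class name `JointSketch`, the label counts, and the dimensions are all hypothetical.

```python
import math
import random


def linear(vec, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class JointSketch:
    """Three task heads reading from one shared per-token representation.

    All sizes below are assumptions chosen for the example.
    """

    def __init__(self, emb_dim=4, hidden=6, n_entity=3, n_trigger=2,
                 n_role=2, seed=0):
        rng = random.Random(seed)
        self.w_shared = rand_matrix(hidden, emb_dim, rng)    # shared encoder
        self.w_entity = rand_matrix(n_entity, hidden, rng)   # entity head
        self.w_trigger = rand_matrix(n_trigger, hidden, rng) # trigger head
        self.w_role = rand_matrix(n_role, 2 * hidden, rng)   # argument head

    def encode(self, embeddings):
        # One hidden vector per token, reused by every head.
        return [[math.tanh(v) for v in linear(e, self.w_shared)]
                for e in embeddings]

    def predict(self, embeddings):
        hidden = self.encode(embeddings)
        entity_scores = [linear(h, self.w_entity) for h in hidden]
        trigger_scores = [linear(h, self.w_trigger) for h in hidden]
        # Argument roles are scored on (trigger token, entity token) pairs
        # by concatenating the two shared hidden vectors.
        role_scores = {
            (i, j): linear(hidden[i] + hidden[j], self.w_role)
            for i in range(len(hidden))
            for j in range(len(hidden))
            if i != j
        }
        return entity_scores, trigger_scores, role_scores
```

Because all three heads read the same `hidden` vectors, a training signal for one task would also shape the representation used by the other two, which is the intuition behind joint modeling.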
GIANT: Scalable Creation of a Web-scale Ontology
Understanding what online users may pay attention to is key to content
recommendation and search services. These services will benefit from a highly
structured and web-scale ontology of entities, concepts, events, topics and
categories. While existing knowledge bases and taxonomies embody a large volume
of entities and categories, we argue that they fail to discover properly
grained concepts, events and topics in the language style of the online population.
Nor do they maintain a logically structured ontology among these notions. In
this paper, we present GIANT, a mechanism to construct a user-centered,
web-scale, structured ontology, containing a large number of natural language
phrases conforming to user attentions at various granularities, mined from a
vast volume of web documents and search click graphs. Various types of edges
are also constructed to maintain a hierarchy in the ontology. We present our
graph-neural-network-based techniques used in GIANT, and evaluate the proposed
methods as compared to a variety of baselines. GIANT has produced the Attention
Ontology, which has been deployed in various Tencent applications involving
over a billion users. Online A/B testing performed on Tencent QQ Browser shows
that Attention Ontology can significantly improve click-through rates in news
recommendation.
Comment: Accepted as full paper by SIGMOD 202
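An ontology whose nodes are typed phrases and whose typed edges maintain a hierarchy can be sketched with a small data structure. This is an illustrative sketch only, not GIANT's implementation: the class `Ontology`, the edge type name `is_a`, and the sample phrases are assumptions; only the five node types come from the abstract.

```python
from collections import defaultdict

# Node types named in the abstract.
NODE_TYPES = {"category", "topic", "concept", "event", "entity"}


class Ontology:
    """Typed phrases connected by typed, child-to-parent edges."""

    def __init__(self):
        self.node_type = {}               # phrase -> node type
        self.parents = defaultdict(set)   # child -> {(edge_type, parent)}

    def add_node(self, phrase, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.node_type[phrase] = node_type

    def add_edge(self, child, parent, edge_type="is_a"):
        # "is_a" is a hypothetical edge type used for the example.
        for phrase in (child, parent):
            if phrase not in self.node_type:
                raise KeyError(f"unknown node: {phrase}")
        self.parents[child].add((edge_type, parent))

    def ancestors(self, phrase):
        """All nodes reachable by following edges upward from `phrase`."""
        seen = set()
        stack = [phrase]
        while stack:
            current = stack.pop()
            for _, parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical sample data at three granularities.
onto = Ontology()
onto.add_node("cars", "category")
onto.add_node("fuel-efficient cars", "concept")
onto.add_node("Toyota Prius", "entity")
onto.add_edge("fuel-efficient cars", "cars")
onto.add_edge("Toyota Prius", "fuel-efficient cars")
```

Walking upward from a fine-grained phrase with `ancestors` yields the coarser phrases it belongs to, which is one way a recommender could generalize from a clicked item to a broader interest.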
A graph-based framework for data retrieved from criminal-related documents
The digitalization of companies and services has driven the processing and analysis of a growing volume
of data from heterogeneous sources, with emerging challenges, particularly in knowledge representation.
Criminal Police Bodies (Órgãos de Polícia Criminal, OPC) face the same challenge,
given the volume of unstructured data from police reports that criminal investigators analyze
manually, consuming time and resources.
Hence the need to extract and represent, automatically, the unstructured data in crime-related
documents, reducing the manual analysis carried out by criminal investigators.
This is a challenge for computer science, offering the opportunity
to propose a computational alternative that extracts and represents the data by adapting
existing computational methods or proposing new ones.
Several computational methods are currently applied to the criminal domain, namely the identification
and classification of named entities, for example narcotics, and the extraction of relations between
entities relevant to criminal investigation. These methods are mostly applied to the English
language, and in Portugal little attention has been paid to research in this area, which makes their application in
the context of criminal investigation unfeasible.
This thesis proposes an integrated solution for representing the unstructured data in
documents, using a set of computational methods: Document Preprocessing, which
groups an Extraction, Transformation and Loading task adapted to crime-related documents,
followed by a Natural Language Processing pipeline applied to the Portuguese language
for syntactic and semantic analysis of textual data; the 5W1H Information Extraction Method,
which groups Named-Entity Recognition, semantic role detection, and
criminal term extraction; and Graph Database Population and Enrichment, which
represents the resulting data in a Neo4j graph database.
Overall, the integrated solution shows promising results, which were validated using
prototypes developed for this purpose. The feasibility of extracting the unstructured
data, interpreting it syntactically and semantically, and representing it in the graph
database was also demonstrated.
Abstract:
The digitalization of companies' processes has enhanced the treatment and analysis of a growing volume
of data from heterogeneous sources, with emerging challenges, namely those related to knowledge representation.
The Criminal Police faces similar challenges, considering the amount of unstructured data from
police reports that criminal investigators analyze manually, at a corresponding cost in time and resources.
There is a need to automatically extract and represent the unstructured data existing in criminal-related
documents and so reduce the manual analysis by criminal investigators. The challenge for computer science
is to apply emergent computational models that offer an alternative way to extract and represent the data, using
new or existing methods.
A broad set of computational methods has been applied to the criminal domain, such as the identification
and classification of named entities (NEs), like narcotics, or the extraction of relations between entities that are relevant to
the criminal investigation. However, these methods have mainly been applied to the English
language. In Portugal, research applying computational methods to this domain is scarce,
making their application in criminal investigation unfeasible.
This thesis proposes an integrated solution for the representation of unstructured data retrieved from
documents, using a set of computational methods. First, the Preprocessing Criminal-Related Documents
module, supported by Extraction, Transformation, and Loading tasks, is followed by a
Natural Language Processing pipeline applied to the Portuguese language, for syntactic and semantic
analysis of textual data. Next, the 5W1H Information Extraction Method combines the Named-Entity
Recognition, Semantic Role Labelling, and Criminal Terms Extraction tasks. Finally, the Graph Database
Population and Enrichment module represents the retrieved data in a Neo4j graph database.
Overall, the framework shows promising results, which were validated using prototypes developed for this
purpose. In addition, the feasibility of extracting unstructured data, interpreting it syntactically and semantically,
and representing it in a graph database has also been demonstrated.
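The last step, moving 5W1H records into a graph database, can be sketched as follows. This is a hedged illustration and not the thesis's schema: the node labels, relationship types, the `FiveW1H` dataclass, the `to_cypher` helper, and the sample record are all hypothetical. The code only builds parameterized Cypher statements as strings; running them would require a Neo4j driver session, which is left out.

```python
from dataclasses import dataclass


@dataclass
class FiveW1H:
    """One extracted record; an empty string means the slot was not found."""
    who: str
    what: str
    when: str
    where: str
    why: str
    how: str


def to_cypher(record_id, rec):
    """Return (query, params) pairs that MERGE an event node and its neighbours.

    Labels and relationship types are assumptions made for this example.
    """
    statements = [(
        "MERGE (e:Event {id: $id}) SET e.what = $what",
        {"id": record_id, "what": rec.what},
    )]
    links = [
        ("Person", "INVOLVES", rec.who),
        ("Time", "OCCURRED_ON", rec.when),
        ("Location", "OCCURRED_AT", rec.where),
        ("Reason", "MOTIVATED_BY", rec.why),
        ("Method", "CARRIED_OUT_BY", rec.how),
    ]
    for label, rel, value in links:
        if not value:
            continue  # skip slots the extractor left empty
        query = (
            "MATCH (e:Event {id: $id}) "
            f"MERGE (n:{label} {{name: $name}}) "
            f"MERGE (e)-[:{rel}]->(n)"
        )
        statements.append((query, {"id": record_id, "name": value}))
    return statements


# Hypothetical record, for illustration only.
record = FiveW1H(who="suspect A", what="theft", when="2020-01-01",
                 where="Lisbon", why="", how="forced entry")
statements = to_cypher("r1", record)
```

Using `MERGE` rather than `CREATE` keeps repeated mentions of the same person or place as a single node, so records from different documents connect through shared neighbours. Values are passed as query parameters instead of being interpolated into the query text.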