Utilização de dados estruturados na resposta a perguntas relacionadas com saúde (Using structured data to answer health-related questions)
The current standard way of searching for information is through a
search engine. Although search engines have progressed, they are still
mainly based on retrieving a list of documents in which the searched
words appear. Since the user's goal is to find an answer to a question,
having to look through multiple documents in the hope that one of them
has the information sought is not very efficient.
The aim of this thesis is to improve that process of searching for
information, in this case medical knowledge, in two different ways. The
first is to replace the usual keywords used in search engines with
something more natural to humans: a question in its natural form. The
second is to use the additional information present in the question
format to give the user an answer to that question instead of a list of
documents where those keywords are present.
Since social media is where people replace search-engine queries with
questions that are usually answered by humans, it seems the natural
place to look for the questions that we aim to answer automatically. The
first step in answering those questions is to classify them in order to
determine what kind of information should be present in the answer. The
second step is to identify the keywords that would be used if the search
were carried out in the currently standard way.
Having identified the keywords and knowing what kind of information the
question aims to retrieve, it is then possible to map it into a query format
and retrieve the information needed to provide an answer.
Currently, the most common way of searching for information is to use a
search engine. Although there has been progress, its results are still
mostly based on returning a list of documents that contain the words
used in the search, and the user then has to go through a set of the
documents presented in the hope of obtaining the information sought.
Besides being a less natural way of searching for information, it is
also less efficient.
The goal of this thesis is to improve that process of searching for
information, with the focus in this case on the health domain. These
improvements would come in two different ways. The first is to replace
the query normally used in search engines with something more natural
to us: a question. The second is to take advantage of the additional
information that is only available in question form to provide the data
needed for the answer, instead of a list of documents in which a set of
keywords is present.
Social media is where the search for information happens through
questions, rather than what would be usual in a search engine, because
on these platforms answers are normally given by humans and not
machines. It therefore seems to be the natural place to collect
questions for which we aim to provide a tool that obtains an answer
automatically. The first step towards providing this answer is to
classify the questions into different types, making it possible to
identify what information is being sought. The second step is to
identify and categorize the words of biomedical context present in the
given text, which would be the ones used if the search were being done
with conventional tools. With the keywords identified and the type of
information that should be present in the answer known, it is then
possible to map this information into a format understood by computers
(a query) and thus obtain the desired information.
Mestrado em Engenharia Informátic
Semantic chunking
Long sentences pose a challenge for natural language processing (NLP) applications. They are associated with a complex information structure leading to increased requirements for processing resources. Although the issue is present in many areas of research, there is little uniformity in the solutions used by research communities dedicated to individual NLP applications. Different aspects of the problem are addressed by different tasks, such as sentence simplification or shallow chunking.
The main contribution of this thesis is the introduction of the task of semantic chunking as a general approach to reducing the cost of processing long sentences. The goal of semantic chunking is to find semantically contained fragments of a sentence representation that can be processed independently and recombined without loss of information. We anchor its principles in established concepts of semantic theory, in particular event and situation semantics. Most of the experiments in this thesis focus on semantic chunking defined on complex semantic representations in Dependency Minimal Recursion Semantics (DMRS), but we also demonstrate that the task can be performed on sentence strings. We present three chunking models: a) a rule-based proof-of-concept DMRS chunking system; b) a semi-supervised sequence labelling neural model for surface semantic chunking; c) a system capable of finding semantic chunk boundaries based on the inherent structure of DMRS graphs, generalisable in the form of descriptive templates. We show how semantic chunking can be applied within a divide-and-conquer processing paradigm, using as an example the task of realization from DMRS. The application of semantic chunking yields noticeable efficiency gains without decreasing the quality of results.
Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN
Different types of sentences express sentiment in very different ways. Traditional sentence-level sentiment classification research focuses on a one-technique-fits-all solution or centers on only one special type of sentence. In this paper, we propose a divide-and-conquer approach which first classifies sentences into different types, then performs sentiment analysis separately on sentences from each type. Specifically, we find that sentences tend to be more complex if they contain more sentiment targets. Thus, we propose to first apply a neural network based sequence model to classify opinionated sentences into three types according to the number of targets appearing in a sentence. Each group of sentences is then fed into a one-dimensional convolutional neural network separately for sentiment classification. Our approach has been evaluated on four sentiment classification datasets and compared with a wide range of baselines. Experimental results show that: (1) sentence type classification can improve the performance of sentence-level sentiment analysis; (2) the proposed approach achieves state-of-the-art results on several benchmarking datasets.
Proceedings
Proceedings of the Workshop on Annotation and
Exploitation of Parallel Corpora AEPC 2010.
Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk.
NEALT Proceedings Series, Vol. 10 (2010), 98 pages.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15893
Proceedings of QG2010: The Third Workshop on Question Generation
These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge".
QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010).