Semantic and pragmatic characterization of learning objects
Doctoral thesis. Informatics Engineering. Universidade do Porto. Faculdade de Engenharia. 201
Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has been responsible for a paradigm shift in natural language
processing (NLP), information retrieval (IR), and beyond. In this survey, we
provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
architectures and dense retrieval techniques that perform ranking directly.
There are two themes that pervade our survey: techniques for handling long
documents, beyond typical sentence-by-sentence processing in NLP, and
techniques for addressing the tradeoff between effectiveness (i.e., result
quality) and efficiency (e.g., query latency, model and index size). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.
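The multi-stage reranking design mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the survey: the function names (`first_stage`, `rerank_score`, `rank`) are hypothetical, and the second-stage scorer is a plain cosine similarity over term counts standing in for a pretrained transformer, which would be far more expensive per document.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems use better ones)."""
    return text.lower().split()


def first_stage(query, corpus, k):
    """Cheap candidate generation: keep the k documents sharing the most query terms."""
    q_terms = set(tokenize(query))
    scored = [(len(q_terms & set(tokenize(doc))), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for overlap, i in scored[:k] if overlap > 0]


def rerank_score(query, doc):
    """Stand-in for an expensive model: cosine similarity of term-count vectors."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def rank(query, corpus, k=3):
    """Two-stage ranking: retrieve k candidates cheaply, then reorder them."""
    candidates = first_stage(query, corpus, k)
    return sorted(candidates, key=lambda i: -rerank_score(query, corpus[i]))


if __name__ == "__main__":
    corpus = [
        "text ranking with transformers",
        "cooking pasta at home",
        "a long survey that mentions text ranking once among many other topics",
        "text ranking",
    ]
    print(rank("text ranking", corpus))
```

The expensive scorer only ever sees the k candidates, which is where the effectiveness and efficiency tradeoff described above comes from: a larger k gives the reranker more chances to surface a good document, at a higher query latency.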
Multimodal information retrieval in medical imaging repositories (Recuperação de informação multimodal em repositórios de imagem médica)
The proliferation of digital medical imaging modalities in hospitals and other
diagnostic facilities has created huge repositories of valuable data, often
not fully explored. Moreover, the past few years show a growing trend
of data production. As such, studying new ways to index, process and
retrieve medical images becomes an important subject to be addressed by
the wider community of radiologists, scientists and engineers. Content-based
image retrieval, which encompasses various methods, can exploit the visual
information of a medical imaging archive, and is known to be beneficial to
practitioners and researchers. However, the integration of the latest systems
for medical image retrieval into clinical workflows is still rare, and their
effectiveness still shows room for improvement.
This thesis proposes solutions and methods for multimodal information
retrieval, in the context of medical imaging repositories. The major
contributions are a search engine for medical imaging studies supporting
multimodal queries in an extensible archive; a framework for automated
labeling of medical images for content discovery; and an assessment and
proposal of feature learning techniques for concept detection from medical
images, exhibiting greater potential than feature extraction algorithms that
were previously used in similar tasks. These contributions, each in their
own dimension, seek to narrow the scientific and technical gap towards
the development and adoption of novel multimodal medical image retrieval
systems, to ultimately become part of the workflows of medical practitioners,
teachers, and researchers in healthcare.
Programa Doutoral em Informátic
Robust Algorithms for Clustering with Applications to Data Integration
A growing number of data-based applications are used for decision-making with far-reaching consequences and significant societal impact. Entity resolution, community detection and taxonomy construction are some of the building blocks of these applications, and clustering is the fundamental concept underlying all of these methods. Therefore, the importance of accurate, robust and scalable clustering methods cannot be overstated. We tackle the various facets of clustering with a multi-pronged approach described below.
1. While identifying clusters that refer to different entities is challenging for automated strategies, it is relatively easy for humans. We study the robustness of clustering methods that leverage supervision through an oracle, i.e., an abstraction of crowdsourcing. Additionally, we focus on scalability to handle web-scale datasets.
2. In community detection applications, a common setback in evaluating the quality of clustering techniques is the lack of ground truth data. We propose a generative model that considers dependent edge formation and devise techniques for efficient cluster recovery.
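To make the idea of oracle-supervised clustering in item 1 concrete, here is a minimal sketch. It is a generic illustration, not the algorithm from this work: the function `cluster_with_oracle`, the simulated oracle, and the sample records are all hypothetical, and the oracle is assumed to answer every question correctly, so the robustness to noisy answers that the abstract studies is not handled here.

```python
def cluster_with_oracle(items, same_entity):
    """Cluster items by asking the oracle about one representative per cluster.

    same_entity(a, b) is the oracle: it returns True when a and b refer to
    the same entity. Returns the clusters and the number of oracle queries.
    """
    clusters = []  # clusters[i][0] acts as the representative of cluster i
    queries = 0
    for item in items:
        for cluster in clusters:
            queries += 1
            if same_entity(item, cluster[0]):
                cluster.append(item)
                break
        else:
            # No existing cluster matched, so the item starts a new one.
            clusters.append([item])
    return clusters, queries


if __name__ == "__main__":
    # Simulated oracle backed by hypothetical ground-truth entity ids.
    truth = {
        "IBM": 0,
        "Apple Inc": 1,
        "I.B.M.": 0,
        "Apple": 1,
        "Intl Business Machines": 0,
    }
    clusters, queries = cluster_with_oracle(
        list(truth), lambda a, b: truth[a] == truth[b]
    )
    print(clusters, queries)
```

Comparing each record against a single representative per cluster keeps the number of oracle questions below the all-pairs count, which matters when every question stands for a paid crowdsourcing task.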
Geographic information extraction from texts
A large volume of unstructured texts containing valuable geographic information is available online. This information, provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity not only to discuss recent advances, new ideas, and concepts but also to identify research gaps in geographic information extraction.