7 research outputs found
On Term Selection Techniques for Patent Prior Art Search
A patent is a set of exclusive rights granted to an inventor to
protect his invention for
a limited period of time. Patent prior art search involves
finding previously granted
patents, scientific articles, product descriptions, or any other
published work that
may be relevant to a new patent application. Many well-known
information retrieval
(IR) techniques (e.g., typical query expansion methods), which
are proven effective
for ad hoc search, are unsuccessful for patent prior art search.
In this thesis, we
mainly investigate the reasons that generic IR techniques are not
effective for prior
art search on the CLEF-IP test collection. First, we analyse the
errors caused due to
data curation and experimental settings like applying
International Patent Classification
codes assigned to the patent topics to filter the search results.
Then, we investigate
the influence of term selection on retrieval performance on the
CLEF-IP prior art
test collection, starting with the description section of the
reference patent and using
language models (LM) and BM25 scoring functions. We find that an
oracular relevance
feedback system, which extracts terms from the judged relevant
documents
far outperforms the baseline (i.e., 0.11 vs. 0.48) and performs
twice as well on mean
average precision (MAP) as the best participant in CLEF-IP 2010
(i.e., 0.22 vs. 0.48).
We find a very clear term selection value threshold for use when
choosing terms. We
also notice that most of the useful feedback terms are actually
present in the original
query and hypothesise that the baseline system can be
substantially improved by removing
negative query terms. We try four simple automated approaches to
identify
negative terms for query reduction but we are unable to improve
on the baseline
performance with any of them. However, we show that a simple,
minimal feedback
interactive approach, where terms are selected from only the
first retrieved relevant
document outperforms the best result from CLEF-IP 2010,
suggesting the promise of
interactive methods for term selection in patent prior art
search
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.A.V. and M.K. acknowledge funding from the European
Community’s Horizon 2020 Program (project reference:
654021 - OpenMinted). M.K. additionally acknowledges the
Encomienda MINETAD-CNIO as part of the Plan for the
Advancement of Language Technology. O.R. and J.O. thank
the Foundation for Applied Medical Research (FIMA),
University of Navarra (Pamplona, Spain). This work was
partially funded by Consellería
de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic
funding of UID/BIO/04469/2013 unit and COMPETE 2020
(POCI-01-0145-FEDER-006684). We thank Iñigo Garciá -Yoldi
for useful feedback and discussions during the preparation of
the manuscript.info:eu-repo/semantics/publishedVersio
Tiedon eristäminen ja visualisointi lastensuojelun asiakaskertomuksista
Tässä tutkielmassa sovelletaan kieliteknologiaa ja tiedon eristämistä lastensuojelun asiakaskertomuksista koostuvaan aineistoon. Tarkoituksena on tuottaa asiakaskertomuksista automaattisesti erilaisia visualisointeja, kuten aikajanoja ja graafeja, jotta tapauksista saataisiin nopeasti yleiskuva. Visualisointien tuottamista varten aineistossa esiintyvien henkilöiden nimien esiintymät poimitaan kirjauksista ja näiden perusteella päätellään automaattisesti tapauksen henkilöt ja heidän kokonimensä. Kokonimien osittaiset esiintymät liitetään tapauksen henkilöihin ja henkilöille pyritään tunnistamaan myös heidän roolinsa tapauksessa. Asiakaskertomusten arkaluonteisuudesta johtuen työssä käytetty aineisto anonymisoidaan automaattisesti poistamalla sieltä henkilötiedot sekä korvaamalla henkilöiden nimet johdonmukaisesti uusilla.
Asiasanat:visualisointi, tiedon eristäminen, luonnollisen kielen käsittel