Query refinement for patent prior art search

Author: Crestani Fabio
Landoni Monica
Mahdabi Parvaz
Publication date: 18/09/2014

A patent is a contract between the inventor and the state, granting the inventor a limited period in which to exploit the invention. In exchange, the inventor must put a detailed description of the invention in the public domain. Patents can encourage innovation and economic growth, but in times of economic crisis they can hamper such growth. The long duration of the application process is a major obstacle that must be addressed to maximize the benefit of patents for innovation and the economy. This duration can be shortened significantly by changing the way we search the patent and non-patent literature. Despite recent advances in general information retrieval and the revolution in Web search engines, there is still a huge gap between the technologies emerging from research labs and adopted by major Internet search engines, and the systems in use by the patent search communities. In this thesis we investigate the problem of prior art search in patent retrieval, with the goal of finding documents that describe the idea of a query patent. A query patent is a full patent application composed of hundreds of terms, which does not represent a single focused information need. Other relevance evidence (e.g. classification tags and bibliographical data) provides additional details about the underlying information need of the query patent. The first goal of this thesis is to estimate a unigram query model from the textual fields of a query patent. We then improve the initial query representation using noun phrases extracted from the query patent. We show that expansion in a query-dependent manner is useful. The second contribution of this thesis is to address the term mismatch problem from a query formulation point of view by integrating multiple sources of relevance evidence associated with the query patent. To do this, we enhance the initial representation of the query with the term distribution of the community of inventors related to the topic of the query patent. We then build a lexicon using classification tags and show that query expansion using this lexicon and considering proximity information (between query and expansion terms) can improve retrieval performance. We perform an empirical evaluation of our proposed models on two patent datasets. The experimental results show that our proposed models achieve significantly better results than the baseline and other enhanced models.
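
The idea of estimating a unigram query model from a long query patent can be illustrated with a minimal sketch. Everything below is an assumption made for illustration (the crude tokenizer, the add-one smoothed collection model, the KL-style term weighting, and the function names); the abstract does not specify the thesis's actual estimator.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def unigram_query_model(query_text, collection_texts, k=5):
    """Return up to k weighted terms that distinguish the query from the collection.

    Assumption: each term is scored by p(t|Q) * log(p(t|Q) / p(t|C)), its
    contribution to the KL divergence between query and collection models.
    """
    q_counts = Counter(tokenize(query_text))
    c_counts = Counter(t for doc in collection_texts for t in tokenize(doc))
    q_total = sum(q_counts.values())
    c_total = sum(c_counts.values())
    vocab_size = len(set(q_counts) | set(c_counts))

    scores = {}
    for term, count in q_counts.items():
        p_q = count / q_total
        # Add-one smoothing keeps p(t|C) non-zero for terms unseen in the collection.
        p_c = (c_counts[term] + 1) / (c_total + vocab_size)
        scores[term] = p_q * math.log(p_q / p_c)

    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    positive = [(term, score) for term, score in top if score > 0]
    norm = sum(score for _, score in positive)
    if norm == 0:
        return {}
    # Normalize so the selected term weights form a probability distribution.
    return {term: score / norm for term, score in positive}


model = unigram_query_model(
    "rotor blade rotor blade turbine the the",
    ["the cat sat on the mat", "the dog and the cat"],
    k=3,
)
```

In this toy run the frequent but uninformative term "the" is pushed out of the top three, which is the behaviour a reduced query model is meant to have.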

Music Synchronization, Audio Matching, Pattern Detection, and User Interfaces for a Digital Music Library System

Author: Kriesel Verena
Publication venue: Universitäts- und Landesbibliothek Bonn

Over the last two decades, growing efforts to digitize our cultural heritage have been observed. Most of these digitization initiatives pursue one or both of the following goals: to conserve the documents, especially those threatened by decay, and to provide remote access on a grand scale. For music documents these trends are observable as well, and by now several digital music libraries exist. An important characteristic of these music libraries is an inherent multimodality resulting from the large variety of available digital music representations, such as scanned scores, symbolic scores, audio recordings, and videos. In addition, for each piece of music there exists not just one document of each type, but many. Considering and exploiting this multimodality and multiplicity, the DFG-funded digital library initiative PROBADO MUSIC aimed at developing a novel user-friendly interface for content-based retrieval, document access, navigation, and browsing in large music collections. The implementation of such a front end requires the multimodal linking and indexing of the music documents during preprocessing. As the considered music collections can be very large, automated or at least semi-automated calculation of these structures is advisable. The field of music information retrieval (MIR) is particularly concerned with the development of suitable procedures, and it was the goal of PROBADO MUSIC to include existing and newly developed MIR techniques to realize the envisioned digital music library system. In this context, the present thesis discusses the following three MIR tasks: music synchronization, audio matching, and pattern detection. We identify particular issues in these fields and provide algorithmic solutions as well as prototypical implementations. In music synchronization, for each position in one representation of a piece of music the corresponding position in another representation is calculated. This thesis focuses on the task of aligning scanned score pages of orchestral music with audio recordings. Here, a previously unconsidered piece of information is the textual specification of transposing instruments provided in the score. Our evaluations show that neglecting such information can result in a measurable loss of synchronization accuracy. Therefore, we propose an OCR-based approach for detecting and interpreting the transposition information in orchestral scores. For a given audio snippet, audio matching methods automatically calculate all musically similar excerpts within a collection of audio recordings. In this context, subsequence dynamic time warping (SSDTW) is a well-established approach, as it allows for local and global tempo variations between the query and the retrieved matches. Moving to real-life digital music libraries with larger audio collections, however, the quadratic runtime of SSDTW results in untenable response times. To improve the response time, this thesis introduces a novel index-based approach to SSDTW-based audio matching. We combine the idea of inverted file lists introduced by Kurth and Müller (Efficient index-based audio matching, 2008) with the shingling techniques often used in the audio identification scenario. In pattern detection, all repeating patterns within one piece of music are determined. Usually, pattern detection operates on symbolic score documents and is often used in the context of computer-aided motivic analysis. This thesis proposes a string-based approach to pattern detection and a novel interactive front end for result visualization and analysis, envisioned as a new feature of the PROBADO MUSIC system.
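
The quadratic cost of subsequence dynamic time warping is easiest to see in code. The following is a textbook-style sketch on plain number sequences, not the thesis's index-based method; the absolute-difference local cost, the step set, and the function name are assumptions chosen for illustration, and real audio matching would operate on feature vectors such as chroma.

```python
def subsequence_dtw(query, database):
    """Find the subsequence of `database` that best matches `query`.

    Returns (cost, start, end) so that database[start:end + 1] is the match.
    Filling the n x m table is what makes the runtime quadratic.
    """
    n, m = len(query), len(database)
    acc = [[float("inf")] * m for _ in range(n)]

    # Free start: the first query element may align with any database position.
    for j in range(m):
        acc[0][j] = abs(query[0] - database[j])

    for i in range(1, n):
        for j in range(m):
            best_prev = acc[i - 1][j]
            if j > 0:
                best_prev = min(best_prev, acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = abs(query[i] - database[j]) + best_prev

    # Free end: the match may stop at the cheapest cell of the last row.
    end = min(range(m), key=lambda col: acc[n - 1][col])

    # Backtrack to the first row to recover where the match starts.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        moves = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(moves, key=lambda move: move[0])
    return acc[n - 1][end], j, end


result = subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 2, 3, 0, 0])
```

Here `result` is `(0, 2, 5)`: the query matches the excerpt `1, 2, 2, 3` at zero cost even though the database plays the middle note for longer, which is the kind of tempo variation the warping is meant to absorb.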

Development of deep learning applications for the automated extraction of chemical information from scientific literature

Author: Brinkhaus Otto
Publication date: 01/01/2023

This dissertation focuses on developing deep learning applications for extracting chemical information from scientific literature, particularly targeting the automated recognition of molecular structures in images. DECIMER Segmentation, a novel application, employs a Mask Region-based Convolutional Neural Network (MRCNN) model to segment chemical structures in documents, aided by a mask expansion algorithm, marking a significant advancement in processing chemical literature. The Optical Chemical Structure Recognition (OCSR) tool DECIMER Image Transformer uses an encoder-decoder architecture to convert chemical structure depictions into the machine-readable SMILES format. The model has been trained on over 450 million pairs of images and SMILES representations. Its ability to interpret various depiction styles, including hand-drawn structures, sets a new standard in OCSR. RanDepict was developed to artificially generate large and diverse OCSR training datasets using multiple cheminformatics toolkits. The diversification of training data ensures robust model generalisation across different chemical structure depictions. A unique dataset of hand-drawn molecule images was created to evaluate the model's performance in interpreting these challenging depictions. This dataset further contributes to the understanding of automated structure recognition from diverse styles. The integration of these technologies led to the creation of DECIMER.ai, an open-source web application that combines segmentation and interpretation tools, allowing users to extract and process chemical information from literature efficiently. The work concludes with a discussion of the significance of open data in advancing molecular informatics, highlighting its potential for broader chemical research domains. By adhering to FAIR data standards and open-source principles, the tools developed for this dissertation are designed for adaptability and future development within the community.

Toward higher effectiveness for recall-oriented information retrieval: A patent retrieval case study

Author: Magdy Walid
Publication venue: Dublin City University. School of Computing
Publication date: 01/03/2012

Research in information retrieval (IR) has largely been directed towards tasks requiring high precision. Recently, other IR applications which can be described as recall-oriented IR tasks have received increased attention in the IR research domain. Prominent among these IR applications are patent search and legal search, where users are typically ready to check hundreds or possibly thousands of documents in order to find any possible relevant document. The main concerns in this kind of application are very different from those in standard precision-oriented IR tasks, where users tend to be focused on finding an answer to their information need that can typically be addressed by one or two relevant documents. For precision-oriented tasks, mean average precision continues to be used as the primary evaluation metric for almost all IR applications. For recall-oriented IR applications the nature of the search task, including objectives, users, queries, and document collections, is different from that of standard precision-oriented search tasks. In this research study, two dimensions in IR are explored for the recall-oriented patent search task. The study includes IR system evaluation and multilingual IR for patent search. In each of these dimensions, current IR techniques are studied and novel techniques developed especially for this kind of recall-oriented IR application are proposed and investigated experimentally in the context of patent retrieval. The techniques developed in this thesis provide a significant contribution toward evaluating the effectiveness of recall-oriented IR in general and particularly patent search, and improving the efficiency of multilingual search for this kind of task
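
The tension between precision-oriented and recall-oriented evaluation can be made concrete with a small sketch. It uses the standard definitions of average precision and recall at a cutoff; the document identifiers and the two ranked runs are made-up toy data, and the thesis's own proposed evaluation technique is not reproduced here because the abstract does not describe it.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def recall_at(ranked_ids, relevant_ids, cutoff):
    """Fraction of all relevant documents found in the top `cutoff` results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked_ids[:cutoff])) / len(relevant)


# Toy data (hypothetical): four relevant documents, two competing runs.
relevant = {"a", "b", "c", "d"}
run_early = ["a", "b", "x", "y", "z", "w", "v", "u"]     # finds two, but at the top
run_complete = ["x", "y", "z", "w", "a", "b", "c", "d"]  # finds all four, but late

ap_early = average_precision(run_early, relevant)
ap_complete = average_precision(run_complete, relevant)
recall_early = recall_at(run_early, relevant, 8)
recall_complete = recall_at(run_complete, relevant, 8)
```

Average precision prefers `run_early`, while a patent searcher willing to read all eight results is better served by `run_complete`, which misses nothing.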

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Functionality Analysis and Information Retrieval in Electronic Document Management Systems


A document management system (DMS) is nowadays one of the most impactful organisational tools that an enterprise may depend on. De Angeli Prodotti (DAP), a manufacturer of overhead conductors, wanted to implement an open-source DMS with functionalities that best fit their needs. We took this opportunity to also test and evaluate the state of the information retrieval capabilities of electronic DMSs.

Padua Thesis and Dissertation Archive

Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed

Author: Eisinger Daniel
Publication date: 07/10/2013

The patent domain is a very important source of scientific information that is currently not used to its full potential. Searching for relevant patents is a complex task because the number of existing patents is very high and grows quickly, patent text is extremely complicated, and standard vocabulary is not used consistently or does not even exist. As a consequence, pure keyword searches often fail to return satisfying results in the patent domain. Major companies employ patent professionals who are able to search patents effectively, but even they have to invest a lot of time and effort in their searches. Academic scientists, on the other hand, do not have access to such resources and therefore often do not search patents at all, but they risk missing up-to-date information that will not appear in scientific publications until much later, if it is published at all. Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Similarly, professional patent searches expand beyond keywords by including class codes from various patent classification systems. However, classification-based searches can only be performed effectively if the user has very detailed knowledge of the system, which is usually not the case for academic scientists. Consequently, we investigated methods to automatically identify relevant classes that can then be suggested to the user to expand their query. Since every patent is assigned at least one class code, it should be possible to use these assignments in a similar way to the MeSH annotations in PubMed. In order to develop a system for this task, it is necessary to have a good understanding of the properties of both classification systems. To gain such knowledge, we perform an in-depth comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms and classes respectively, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows that the hierarchies are structurally similar, but terms and annotations differ significantly. The most important differences concern the considerably higher complexity of the IPC class definitions compared to MeSH terms and the far lower number of class assignments to the average patent compared to the number of MeSH terms assigned to PubMed documents. These differences cause problems for both inexperienced patent searchers and professionals. On the one hand, the complex term system makes it very difficult for members of the former group to find any IPC classes that are relevant to their search task. On the other hand, the low number of IPC classes per patent points to incomplete class assignments by the patent office, which limits the recall of the classification-based searches that are frequently performed by the latter group. We approach these problems from two directions: first, by automatically assigning additional patent classes to make up for the missing assignments, and second, by automatically retrieving relevant keywords and classes that are proposed to the user so they can expand their initial search. For the automated assignment of additional patent classes, we adapt to the patent domain an approach that was successfully used for the assignment of MeSH terms to PubMed abstracts. Each document is assigned a set of IPC classes by a large set of binary Maximum-Entropy classifiers. Our evaluation shows good performance by individual classifiers (precision/recall between 0.84 and 0.90), making the retrieval of additional relevant documents for specific IPC classes feasible. The assignment of additional classes to specific documents is more problematic, since the precision of our classifiers is not high enough to avoid false positives. However, we propose filtering methods that can help solve this problem. For the guided patent search, we demonstrate various methods to expand a user’s initial query. Our methods use both keywords and class codes that the user enters to retrieve additional relevant keywords and classes that are then suggested to the user. These additional query components are extracted from different sources such as patent text, IPC definitions, external vocabularies and co-occurrence data. The suggested expansions can help inexperienced users refine their queries with relevant IPC classes, and professionals can compose their complete query faster and more easily. We also present GoPatents, a patent retrieval prototype that incorporates some of our proposals and makes faceted browsing of a patent corpus possible.
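
Of the expansion sources listed above, co-occurrence data is the simplest to illustrate. The sketch below suggests classes that frequently appear on the same patents as the classes a user has already entered. The function name, the ranking by raw co-occurrence count, and the tiny corpus are assumptions for illustration only, not the method implemented in GoPatents.

```python
from collections import Counter


def suggest_classes(patent_classes, entered, top_n=3):
    """Rank class codes that co-occur with the user's entered class codes.

    `patent_classes` is an iterable with one collection of class codes per patent.
    """
    entered = set(entered)
    counts = Counter()
    for classes in patent_classes:
        classes = set(classes)
        if classes & entered:
            # Count every other class assigned to a patent that matches the query.
            counts.update(classes - entered)
    return [code for code, _ in counts.most_common(top_n)]


# Toy corpus (hypothetical assignments, one set of IPC subclass codes per patent).
corpus = [
    {"A61K", "C07D"},
    {"A61K", "C07D", "A61P"},
    {"A61K", "A61P"},
    {"G06F"},
    {"A61K", "C07D"},
]
suggestions = suggest_classes(corpus, ["A61K"])
```

A production system would likely normalise these counts (for example against each class's overall frequency) so that very common classes do not dominate every suggestion list.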

Technische Universität Dresden: Qucosa

Who wrote this scientific text?

Author: Labbé Cyril
Labbé Dominique
Publication venue: HAL CCSD
Publication date: 02/06/2014

The IEEE bibliographic database contains a number of proven duplications with indication of the original paper(s) copied. This corpus is used to test a method for the detection of hidden intertextuality (commonly named "plagiarism"). The intertextual distance, combined with the sliding window and with various classification techniques, identifies these duplications with a very low risk of error. These experiments also show that several factors blur the identity of the scientific author, including variable group authorship and the high levels of intertextuality accepted, and sometimes desired, in scientific papers on the same topic
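
A rough sketch of how an intertextual distance and a sliding window might be combined follows. It assumes one common formulation of the distance (the longer text's word counts are scaled down to the shorter text's length and the absolute differences are summed); the window size, step, and function names are illustrative assumptions, and the abstract does not give the authors' exact variant or parameters.

```python
from collections import Counter


def intertextual_distance(tokens_a, tokens_b):
    """Distance in [0, 1]: 0 for identical word frequencies, 1 for no shared words."""
    if not tokens_a or not tokens_b:
        raise ValueError("both texts must be non-empty")
    # Make A the shorter text, then scale B's counts down to A's length.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    scale = len(tokens_a) / len(tokens_b)
    vocabulary = set(freq_a) | set(freq_b)
    diff = sum(abs(freq_a[w] - freq_b[w] * scale) for w in vocabulary)
    return diff / (2 * len(tokens_a))


def sliding_window_distances(tokens, reference, size, step):
    """Distance from each window of `tokens` to the whole `reference` text."""
    last_start = max(len(tokens) - size, 0)
    return [
        (start, intertextual_distance(tokens[start:start + size], reference))
        for start in range(0, last_start + 1, step)
    ]


suspect = "a b c d".split()
source = "c d".split()
profile = sliding_window_distances(suspect, source, size=2, step=2)
```

Windows whose distance to a candidate source drops sharply relative to the rest of the text are the ones a detection method of this kind would flag for closer inspection.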

Hal - Université Grenoble Alpes

The Main Belt Comets and ice in the Solar System

Author: A Bieler
A Colaprete
A Decock
A Guilbert-Lepoutre
A Morbidelli
A Waszczak
AE Saal
AF Cheng
AJ Brown
AJ McKay
AL Cochran
AL Lane
Alan Fitzsimmons
AM Gilbert
AR Vasavada
AS Rivkin
Aurelie Guilbert-Lepoutre
B Carry
B Novaković
B Yang
Bin Yang
BJR Davidsson
C de Bergh
C Dumas
C Opitom
C Sagan
C Snodgrass
C Tubiana
CA Trujillo
CJ Hansen
CM Dalle Ore
CM Pieters
Colin Snodgrass
Cyrielle Opitom
D Bockelée-Morvan
D Bodewits
D Brownlee
D Despois
D Jewitt
D Nesvorný
D Prialnik
D Takir
DA Paige
DE Trilling
DG Schleicher
DJ Lawrence
DJ Scheeres
DP Cruikshank
E Jehin
EL Schaller
EM MacLennan
Emmanuel Jehin
F Moreno
F Nimmo
F Robert
F Vilas
FJ Martín-Torres
FJ Pozuelos
G Cessateur
G Picardi
GA Neumann
GL Villanueva
H Balsiger
H Campins
H Genda
H Kawakita
H Kosai
H Rauer
H Sierks
H Spinrad
HA Weaver
HB Niemann
Henry H. Hsieh
HF Levison
HH Hsieh
HU Keller
I Bertini
I Ferrín
I Richter
I Toth
J Agarwal
J Cernicharo
J Crovisier
J Haruyama
J Horner
J Lasue
J Li
J Licandro
JA Fernández
JB McPhate
JC Armstrong
JC Castillo-Rogez
Jessica Agarwal
JI Moses
JK Davies
JK Harmon
JL Bertaux
JM Sunshine
JN Winn
JP Bibring
JS Lewis
JTT Mäkinen
JX Luu
K Altwegg
K Chiu
K de Kleer
KD Hargrove
KH Baines
KJ Meech
KJ Walsh
KK Knaell
L Colangeli
L Dones
L Haser
L Kresak
L O’Rourke
L Paganini
LJ Hallis
LM Feaga
M de Val-Borro
M Drahus
M Festou
M Florczak
M Ishiguro
M Küppers
M Manga
M Marsset
M Mayor
M Podolak
M Pätzold
M Rubin
M Wells
MA Barucci
MA Cordiner
MA Feierberg
MA Slade
Man-To Hui
Matthew M. Knight
MC Malin
ME Brown
MF A’Hearn
MH Carr
MH Wong
Michael Combi
Michael S. P. Kelley
Miguel de Val-Borro
MJ Drake
MJ Gaffey
MJ Mumma
MM Knight
MR Combi
MSP Kelley
MT Capria
MT Hui
N Biver
N Haghighipour
N Peixinho
N Schorghofer
N Thomas
NG Barlow
P Hartogh
P Rousselot
P Swings
PC Thomas
PD Feldman
PD Nicholson
PD Spudis
PM Doyle
PO Hayne
R Gomes
R Seu
R Stevenson
RE Lupu
RF Mueller
RG Burns
RM Killen
Roberto Orosei
RT Clancy
S Fornasier
S Gulkis
S Kendrew
S Protopapa
S Sonnett
SF Newman
SJ Bus
SJ Palmer
SK Atreya
T de Graauw
T Encrenaz
T Mukai
T Ootsubo
TH Prettyman
U Fink
V Debout
VF Petrenko
VR Eke
WC Fraser
WF Bottke
WM Grundy
Y Langevin
Y Shinnaka
YR Fernandez
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

We review the evidence for buried ice in the asteroid belt; specifically the questions around the so-called Main Belt Comets (MBCs). We summarise the evidence for water throughout the Solar System, and describe the various methods for detecting it, including remote sensing from ultraviolet to radio wavelengths. We review progress in the first decade of study of MBCs, including observations, modelling of ice survival, and discussion on their origins. We then look at which methods will likely be most effective for further progress, including the key challenge of direct detection of (escaping) water in these bodies

arXiv.org e-Print Archive

HAL-INSU

OA@INAF - Istituto Nazionale di Astrofisica

Gazo bunseki to kanren joho o riyoshita gazo imi rikai ni kansuru kenkyu (A study on image semantic understanding using image analysis and related information)

Author: Sarin Supheakmungkol
Publication date: 01/01/2012

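System: new; Report number: Kō No. 3514; Degree type: Doctor (Global Information and Telecommunication Studies); Date conferred: 2012/2/8; Waseda University diploma number: Shin 585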