26,040 research outputs found

Simple vs. sophisticated approaches for patent prior-art search

Authors: Jones, Gareth J.F.; Lopez, Patrice; Magdy, Walid
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/04/2011

Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state of the art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques that try to model the steps taken by a patent examiner during a patent search. Experiments show that the retrieval effectiveness of the two techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that time and effort can be saved by applying simple IR approaches when initial citations are provided.

DCU Online Research Access Service

Multiple Retrieval Models and Regression Models for Prior Art Search

Authors: Lopez, Patrice; Romary, Laurent
Publication date: 01/01/2009

This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), developed for the IP track of CLEF 2009. Our approach has three main characteristics: 1. The use of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German), producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models, using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only when creating the initial working sets and during the final post-ranking step, our architecture remains generic and easy to extend.
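The merging step in point 2 can be illustrated with a minimal sketch. It only shows how a linear model might be applied to combine per-run scores. The run names, scores, and weights below are hypothetical, and the abstract does not give the form of the regression models or how they were fitted on the validation set.

```python
def fuse_scores(runs, weights, bias=0.0):
    """Rank documents by a weighted linear combination of per-run scores.

    runs:    {run_name: {doc_id: score}}, one entry per retrieval run
    weights: {run_name: weight}, assumed to come from a fitted regression
    A document missing from a run contributes 0 for that run.
    """
    fused = {}
    for run_name, scores in runs.items():
        weight = weights[run_name]
        for doc_id, score in scores.items():
            # Start from the bias the first time a document is seen.
            fused[doc_id] = fused.get(doc_id, bias) + weight * score
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical runs and weights, for illustration only.
runs = {
    "kl_lemma": {"D1": 0.9, "D2": 0.4},
    "okapi_phrase": {"D2": 0.8, "D3": 0.5},
}
weights = {"kl_lemma": 0.6, "okapi_phrase": 0.4}
ranking = fuse_scores(runs, weights)
```

In practice the weights would be estimated on held-out data rather than set by hand, and scores from different retrieval models would likely need normalizing before being combined.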

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

PLuTO: MT for online patent translation

Authors: Sheridan, Páraic; Tinsley, John; Way, Andy
Publication venue: Association for Machine Translation in the Americas
Publication date: 01/01/2010

PLuTO (Patent Language Translation Online) is a partially EU-funded commercialization project which specializes in the automatic retrieval and translation of patent documents. At the core of the PLuTO framework is a machine translation (MT) engine through which web-based translation services are offered. The fully integrated PLuTO architecture includes a translation engine coupling MT with translation memories (TM), and a patent search and retrieval engine. In this paper, we first describe the motivating factors behind the provision of such a service. Following this, we give an overview of the PLuTO framework as a whole, with particular emphasis on the MT components, and provide a real-world use case scenario in which PLuTO MT services are exploited.

CiteSeerX

Irish Universities

DCU Online Research Access Service

United we fall, divided we stand: A study of query segmentation and PRF for patent prior art search

Authors: Ganguly, Debasis; Jones, Gareth J.F.; Leveling, Johannes
Publication date: 01/10/2011

Previous research in patent search has shown that reducing queries by extracting a few key terms is ineffective, primarily because of the vocabulary mismatch between patent applications used as queries and existing patent documents. This finding has led to the use of full patent applications as queries in patent prior art search. In addition, standard information retrieval (IR) techniques such as query expansion (QE) do not work effectively with patent queries, principally because of the presence of noise terms in the massive queries. In this study, we take a new approach to QE for patent search. Text segmentation is used to decompose a patent query into self-coherent sub-topic blocks. Each of these much shorter sub-topic blocks, which is representative of a specific aspect or facet of the invention, is then used as a query to retrieve documents. Documents retrieved using the different resulting sub-queries or query streams are interleaved to construct a final ranked list. This technique can exploit the potential benefit of QE since the segmented queries are generally more focused and less ambiguous than the full patent query. Experiments on the CLEF-2010 IP prior-art search task show that the proposed method outperforms the retrieval effectiveness achieved when using a single full patent application text as the query, and also demonstrate the potential benefits of QE to alleviate the vocabulary mismatch problem in patent search.
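The interleaving of per-segment result lists described above can be sketched as a simple round-robin merge. This is an assumed interleaving scheme for illustration: the abstract does not specify the exact strategy used, and the function name and document identifiers are hypothetical.

```python
from itertools import zip_longest


def interleave_round_robin(ranked_lists, limit=None):
    """Merge ranked lists by taking one document from each list in turn.

    Duplicates are skipped, so each document keeps the position of its
    first (best) appearance. Document ids are assumed never to be None.
    """
    merged = []
    seen = set()
    # zip_longest pads shorter lists with None so longer lists are not cut off.
    for tier in zip_longest(*ranked_lists):
        for doc_id in tier:
            if doc_id is None or doc_id in seen:
                continue
            seen.add(doc_id)
            merged.append(doc_id)
            if limit is not None and len(merged) >= limit:
                return merged
    return merged


# Hypothetical result lists for three sub-topic segments of one patent query.
segment_results = [
    ["EP-1", "EP-2", "EP-3"],
    ["EP-2", "EP-4"],
    ["EP-5", "EP-1", "EP-6"],
]
final_ranking = interleave_round_robin(segment_results)
```

Round-robin gives every segment equal influence on the top of the final list. A score-based merge would be an alternative when segment scores are comparable.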

CiteSeerX

Irish Universities

DCU Online Research Access Service

Information Access in a Multilingual World: Transitioning from Research to Real-World Applications

Authors: Gey, Fredric; Kando, Noriko; Karlgren, Jussi
Publication venue: Association for Computing Machinery
Publication date: 01/01/2009

Multilingual Information Access (MLIA) is at a turning point wherein substantial real-world applications are being introduced after fifteen years of research into cross-language information retrieval, question answering, statistical machine translation and named entity recognition. Previous workshops on this topic have focused on research and small-scale applications. The focus of this workshop was on technology transfer from research to applications and on the future research needed to facilitate MLIA in an increasingly connected multilingual world.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

GRISP: A Massive Multilingual Terminological Database for Scientific and Technical Domains

Authors: Lopez, Patrice; Romary, Laurent
Publication venue: HAL CCSD
Publication date: 19/05/2010

The development of a multilingual terminology is a very long and costly process. We present the creation of a multilingual terminological database called GRISP, covering multiple technical and scientific fields and built from various open resources. A crucial aspect is the merging of the different resources, which in our proposal is based on the definition of a sound conceptual model, different domain mappings, and the use of structural constraints and machine learning techniques for controlling the fusion process. The result is a massive terminological database of several million terms, concepts, semantic relations and definitions. This resource has allowed us to significantly improve the mean average precision of an information retrieval system applied to a large collection of multilingual and multidomain patent documents.

INRIA a CCSD electronic archive server