26,040 research outputs found

    Simple vs. sophisticated approaches for patent prior-art search

    Get PDF
    Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state-of-the-art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness using both techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that less time and effort can be exerted by applying simple IR approaches when initial citations are provided

    Multiple Retrieval Models and Regression Models for Prior Art Search

    Get PDF
    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend

    PLuTO: MT for online patent translation

    Get PDF
    PLuTO ā€“ Patent Language Translation Online ā€“ is a partially EU-funded commercialization project which specializes in the automatic retrieval and translation of patent documents. At the core of the PLuTO framework is a machine translation (MT) engine through which web-based translation services are offered. The fully integrated PLuTO architecture includes a translation engine coupling MT with translation memories (TM), and a patent search and retrieval engine. In this paper, we first describe the motivating factors behind the provision of such a service. Following this, we give an overview of the PLuTO framework as a whole, with particular emphasis on the MT components, and provide a real world use case scenario in which PLuTO MT services are exploited

    United we fall, divided we stand: A study of query segmentation and PRF for patent prior art search

    Get PDF
    Previous research in patent search has shown that reducing queries by extracting a few key terms is ineffective primarily because of the vocabulary mismatch between patent applications used as queries and existing patent documents. This ļ¬nding has led to the use of full patent applications as queries in patent prior art search. In addition, standard information retrieval (IR) techniques such as query expansion (QE) do not work effectively with patent queries, principally because of the presence of noise terms in the massive queries. In this study, we take a new approach to QE for patent search. Text segmentation is used to decompose a patent query into selfcoherent sub-topic blocks. Each of these much shorted sub-topic blocks which is representative of a speciļ¬c aspect or facet of the invention, is then used as a query to retrieve documents. Documents retrieved using the different resulting sub-queries or query streams are interleaved to construct a ļ¬nal ranked list. This technique can exploit the potential beneļ¬t of QE since the segmented queries are generally more focused and less ambiguous than the full patent query. Experiments on the CLEF-2010 IP prior-art search task show that the proposed method outperforms the retrieval effectiveness achieved when using a single full patent application text as the query, and also demonstrates the potential beneļ¬ts of QE to alleviate the vocabulary mismatch problem in patent search

    Information Access in a Multilingual World: Transitioning from Research to Real-World Applications

    Get PDF
    Multilingual Information Access (MLIA) is at a turning point wherein substantial real-world applications are being introduced after fifteen years of research into cross-language information retrieval, question answering, statistical machine translation and named entity recognition. Previous workshops on this topic have focused on research and small- scale applications. The focus of this workshop was on technology transfer from research to applications and on what future research needs to be done which facilitates MLIA in an increasingly connected multilingual world

    GRISP: A Massive Multilingual Terminological Database for Scientiļ¬c and Technical Domains

    Get PDF
    International audienceThe development of a multilingual terminology is a very long and costly process. We present the creation of a multilingual terminological database called GRISP covering multiple technical and scientiļ¬c ļ¬elds from various open resources. A crucial aspect is the merging of the different resources which is based in our proposal on the deļ¬nition of a sound conceptual model, different domain mapping and the use of structural constraints and machine learning techniques for controlling the fusion process. The result is a massive terminological database of several millions terms, concepts, semantic relations and deļ¬nitions. This resource has allowed us to improve signiļ¬cantly the mean average precision of an information retrieval system applied to a large collection of multilingual and multidomain patent documents
    • ā€¦
    corecore