5 research outputs found

    A standard TMF modeling for Arabic patents

    Get PDF
    International audiencePatent applications are similarly structured worldwide. They consist of a cover page, a speci cation, claims, drawings (if necessary) and an abstract. In addition to their content (text, numbers and citations), all patent publications contain a relatively rich set of well-de ned metadata. In the Arabic world, there is no North African or Arabian Intellectual Property O ce and therefore no uniform collections of Arabic patents. In Tunisia, for example, there is no digital collection of patent documents and therefore no XML collections. In this context, we aim to create a TMF standardized model for scienti c patents and develop a generator of XML patent collections having a uniform and easy to use structure. To test our approach, we will use a collection of XML scienti c patent documents in three languages (Arabic, French, and English)

    Prior art retrieval using the claims section as a bag of words

    Get PDF
    Contains fulltext : 78907.pdf (author's version ) (Open Access)CLEF 2009 evaluation campaign, CLEF-IP workshop, 30 september 20093 p

    Automating the search for a patent's prior art with a full text similarity search

    Full text link
    More than ever, technical inventions are the symbol of our society's advance. Patents guarantee their creators protection against infringement. For an invention being patentable, its novelty and inventiveness have to be assessed. Therefore, a search for published work that describes similar inventions to a given patent application needs to be performed. Currently, this so-called search for prior art is executed with semi-automatically composed keyword queries, which is not only time consuming, but also prone to errors. In particular, errors may systematically arise by the fact that different keywords for the same technical concepts may exist across disciplines. In this paper, a novel approach is proposed, where the full text of a given patent application is compared to existing patents using machine learning and natural language processing techniques to automatically detect inventions that are similar to the one described in the submitted document. Various state-of-the-art approaches for feature extraction and document comparison are evaluated. In addition to that, the quality of the current search process is assessed based on ratings of a domain expert. The evaluation results show that our automated approach, besides accelerating the search process, also improves the search results for prior art with respect to their quality

    Retrieval for Extremely Long Queries and Documents with RPRS: a Highly Efficient and Effective Transformer-based Re-Ranker

    Full text link
    Retrieval with extremely long queries and documents is a well-known and challenging task in information retrieval and is commonly known as Query-by-Document (QBD) retrieval. Specifically designed Transformer models that can handle long input sequences have not shown high effectiveness in QBD tasks in previous work. We propose a Re-Ranker based on the novel Proportional Relevance Score (RPRS) to compute the relevance score between a query and the top-k candidate documents. Our extensive evaluation shows RPRS obtains significantly better results than the state-of-the-art models on five different datasets. Furthermore, RPRS is highly efficient since all documents can be pre-processed, embedded, and indexed before query time which gives our re-ranker the advantage of having a complexity of O(N) where N is the total number of sentences in the query and candidate documents. Furthermore, our method solves the problem of the low-resource training in QBD retrieval tasks as it does not need large amounts of training data, and has only three parameters with a limited range that can be optimized with a grid search even if a small amount of labeled data is available. Our detailed analysis shows that RPRS benefits from covering the full length of candidate documents and queries.Comment: Accepted at ACM Transactions on Information Systems (ACM TOIS journal

    Opportunity Identification for New Product Planning: Ontological Semantic Patent Classification

    Get PDF
    Intelligence tools have been developed and applied widely in many different areas in engineering, business and management. Many commercialized tools for business intelligence are available in the market. However, no practically useful tools for technology intelligence are available at this time, and very little academic research in technology intelligence methods has been conducted to date. Patent databases are the most important data source for technology intelligence tools, but patents inherently contain unstructured data. Consequently, extracting text data from patent databases, converting that data to meaningful information and generating useful knowledge from this information become complex tasks. These tasks are currently being performed very ineffectively, inefficiently and unreliably by human experts. This deficiency is particularly vexing in product planning, where awareness of market needs and technological capabilities is critical for identifying opportunities for new products and services. Total nescience of the text of patents, as well as inadequate, unreliable and untimely knowledge derived from these patents, may consequently result in missed opportunities that could lead to severe competitive disadvantage and potentially catastrophic loss of revenue. The research performed in this dissertation tries to correct the abovementioned deficiency with an approach called patent mining. The research is conducted at Finex, an iron casting company that produces traditional kitchen skillets. To \u27mine\u27 pertinent patents, experts in new product development at Finex modeled one ontology for the required product features and another for the attributes of requisite metallurgical enabling technologies from which new product opportunities for skillets are identified by applying natural language processing, information retrieval, and machine learning (classification) to the text of patents in the USPTO database. Three main scenarios are examined in my research. Regular classification (RC) relies on keywords that are extracted directly from a group of USPTO patents. Ontological classification (OC) relies on keywords that result from an ontology developed by Finex experts, which is evaluated and improved by a panel of external experts. Ontological semantic classification (OSC) uses these ontological keywords and their synonyms, which are extracted from the WordNet database. For each scenario, I evaluate the performance of three classifiers: k-Nearest Neighbor (k-NN), random forest, and Support Vector Machine (SVM). My research shows that OSC is the best scenario and SVM is the best classifier for identifying product planning opportunities, because this combination yields the highest score in metrics that are generally used to measure classification performance in machine learning (e.g., ROC-AUC and F-score). My method also significantly outperforms current practice, because I demonstrate in an experiment that neither the experts at Finex nor the panel of external experts are able to search for and judge relevant patents with any degree of effectiveness, efficiency or reliability. This dissertation provides the rudiments of a theoretical foundation for patent mining, which has yielded a machine learning method that is deployed successfully in a new product planning setting (Finex). Further development of this method could make a significant contribution to management practice by identifying opportunities for new product development that have been missed by the approaches that have been deployed to date
    corecore