5 research outputs found
A standard TMF modeling for Arabic patents
International audience. Patent applications are similarly structured worldwide: they consist of a cover page, a specification, claims, drawings (if necessary), and an abstract. In addition to their content (text, numbers, and citations), all patent publications contain a relatively rich set of well-defined metadata. In the Arab world, there is no North African or Arabian intellectual property office and therefore no uniform collection of Arabic patents. In Tunisia, for example, there is no digital collection of patent documents and therefore no XML collections. In this context, we aim to create a TMF-standardized model for scientific patents and to develop a generator of XML patent collections with a uniform, easy-to-use structure. To test our approach, we use a collection of XML scientific patent documents in three languages (Arabic, French, and English).
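A generator of uniformly structured XML patent records, as described above, might emit documents along these lines. This is a minimal sketch: the element names (`publication-number`, `claims`, etc.) are hypothetical stand-ins, not the paper's actual TMF-based schema.

```python
import xml.etree.ElementTree as ET

def build_patent_record(pub_number, title, abstract, claims, lang):
    """Build one uniformly structured XML patent record.
    Element names here are illustrative; the TMF-based schema may differ."""
    patent = ET.Element("patent", attrib={"lang": lang})
    ET.SubElement(patent, "publication-number").text = pub_number
    ET.SubElement(patent, "title").text = title
    ET.SubElement(patent, "abstract").text = abstract
    claims_el = ET.SubElement(patent, "claims")
    for i, claim in enumerate(claims, start=1):
        ET.SubElement(claims_el, "claim", attrib={"num": str(i)}).text = claim
    return patent

# Invented example record (not a real Tunisian patent)
record = build_patent_record(
    "TN2024/0001", "Solar water heater", "A heater using solar panels.",
    ["A water heater comprising a solar collector."], "en")
xml_str = ET.tostring(record, encoding="unicode")
print(xml_str)
```

The same builder could be called once per source document to produce a collection with an identical structure across all three languages.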
Prior art retrieval using the claims section as a bag of words
Contains fulltext: 78907.pdf (author's version) (Open Access). CLEF 2009 evaluation campaign, CLEF-IP workshop, 30 September 2009, 3 p.
Automating the search for a patent's prior art with a full text similarity search
More than ever, technical inventions are the symbol of our society's advance.
Patents guarantee their creators protection against infringement. For an
invention to be patentable, its novelty and inventiveness have to be assessed.
Therefore, a search for published work that describes similar inventions to a
given patent application needs to be performed. Currently, this so-called
search for prior art is executed with semi-automatically composed keyword
queries, which is not only time consuming, but also prone to errors. In
particular, errors may systematically arise by the fact that different keywords
for the same technical concepts may exist across disciplines. In this paper, a
novel approach is proposed, where the full text of a given patent application
is compared to existing patents using machine learning and natural language
processing techniques to automatically detect inventions that are similar to
the one described in the submitted document. Various state-of-the-art
approaches for feature extraction and document comparison are evaluated. In
addition, the quality of the current search process is assessed based on the
ratings of a domain expert. The evaluation results show that our automated
approach, besides accelerating the search process, also improves the quality
of the search results for prior art.
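As a rough illustration of full-text similarity search, the sketch below ranks candidate patents against a new application by TF-IDF cosine similarity over a toy corpus. The paper evaluates several state-of-the-art feature-extraction and comparison methods; this pure-Python version is only one simple baseline, and all the document texts are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weight vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] / len(doc) * idf[t] for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy candidate patents plus a new application (all invented text)
corpus = [
    "a cast iron skillet with enamel coating".split(),
    "a method for wireless data transmission".split(),
    "an enameled cast iron cooking pan".split(),
]
application = "cast iron pan with an enamel layer".split()

vecs = tfidf_vectors(corpus + [application])
query_vec = vecs[-1]
scores = sorted(((cosine(query_vec, v), i) for i, v in enumerate(vecs[:-1])),
                reverse=True)
print(scores)  # cookware patents rank above the unrelated wireless patent
```

In practice, the feature extraction would run over the entire specification and claims of real patents, which is where the differences between the evaluated methods become visible.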
Retrieval for Extremely Long Queries and Documents with RPRS: a Highly Efficient and Effective Transformer-based Re-Ranker
Retrieval with extremely long queries and documents is a well-known and
challenging task in information retrieval and is commonly known as
Query-by-Document (QBD) retrieval. Specifically designed Transformer models
that can handle long input sequences have not shown high effectiveness in QBD
tasks in previous work. We propose a Re-Ranker based on the novel Proportional
Relevance Score (RPRS) to compute the relevance score between a query and the
top-k candidate documents. Our extensive evaluation shows that RPRS obtains
significantly better results than the state-of-the-art models on five different
datasets. Furthermore, RPRS is highly efficient since all documents can be
pre-processed, embedded, and indexed before query time which gives our
re-ranker the advantage of having a complexity of O(N) where N is the total
number of sentences in the query and candidate documents. Moreover, our
method solves the problem of low-resource training in QBD retrieval tasks,
as it does not need large amounts of training data and has only three
parameters with a limited range that can be optimized with a grid search even
when only a small amount of labeled data is available. Our detailed analysis
shows that RPRS benefits from covering the full length of candidate documents
and queries. Comment: Accepted at ACM Transactions on Information Systems
(ACM TOIS journal).
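The sentence-level, proportional flavor of such a re-ranker can be sketched as follows. This is a hypothetical simplification, not the paper's RPRS formula: it uses token overlap in place of learned sentence embeddings, and it scores a candidate by the fraction of query sentences that find a sufficiently similar sentence in it. All sentences and the threshold are invented for the example.

```python
def jaccard(a, b):
    """Token-overlap similarity; a cheap stand-in for embedding cosine."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def proportional_score(query_sents, doc_sents, threshold=0.25):
    """Fraction of query sentences with at least one sufficiently similar
    sentence in the candidate document (simplified proportional scoring)."""
    if not query_sents:
        return 0.0
    hits = sum(
        1 for q in query_sents
        if any(jaccard(q, d) >= threshold for d in doc_sents)
    )
    return hits / len(query_sents)

query = ["the device transmits data wirelessly",
         "a battery powers the transmitter"]
candidates = {
    "patent_a": ["the transmitter transmits data wirelessly to a receiver",
                 "power is supplied by a battery"],
    "patent_b": ["a cast iron skillet", "an enamel coating"],
}
ranked = sorted(candidates,
                key=lambda k: proportional_score(query, candidates[k]),
                reverse=True)
print(ranked)
```

Because each candidate's sentences can be embedded and indexed offline, only the per-sentence comparisons remain at query time, which is the source of the linear complexity claimed in the abstract.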
Opportunity Identification for New Product Planning: Ontological Semantic Patent Classification
Intelligence tools have been developed and applied widely in many different areas in engineering, business and management. Many commercialized tools for business intelligence are available in the market. However, no practically useful tools for technology intelligence are available at this time, and very little academic research in technology intelligence methods has been conducted to date.
Patent databases are the most important data source for technology intelligence tools, but patents inherently contain unstructured data. Consequently, extracting text data from patent databases, converting that data to meaningful information and generating useful knowledge from this information become complex tasks. These tasks are currently being performed very ineffectively, inefficiently and unreliably by human experts. This deficiency is particularly vexing in product planning, where awareness of market needs and technological capabilities is critical for identifying opportunities for new products and services. Complete ignorance of the text of patents, as well as inadequate, unreliable and untimely knowledge derived from these patents, may consequently result in missed opportunities that could lead to severe competitive disadvantage and potentially catastrophic loss of revenue.
The research performed in this dissertation tries to correct the abovementioned deficiency with an approach called patent mining. The research is conducted at Finex, an iron casting company that produces traditional kitchen skillets. To 'mine' pertinent patents, experts in new product development at Finex modeled one ontology for the required product features and another for the attributes of the requisite metallurgical enabling technologies. New product opportunities for skillets are then identified by applying natural language processing, information retrieval, and machine learning (classification) to the text of patents in the USPTO database.
Three main scenarios are examined in my research. Regular classification (RC) relies on keywords that are extracted directly from a group of USPTO patents. Ontological classification (OC) relies on keywords that result from an ontology developed by Finex experts, which is evaluated and improved by a panel of external experts. Ontological semantic classification (OSC) uses these ontological keywords and their synonyms, which are extracted from the WordNet database. For each scenario, I evaluate the performance of three classifiers: k-Nearest Neighbor (k-NN), random forest, and Support Vector Machine (SVM).
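The difference between the OC and OSC scenarios can be illustrated with a small sketch, using a hand-written synonym table in place of WordNet lookups; all keywords, synonyms, and the sample patent text are invented for the example.

```python
# Hypothetical mini synonym table standing in for WordNet
SYNONYMS = {
    "skillet": {"frypan", "frying pan"},
    "coating": {"layer", "finish"},
}

def expand_keywords(ontology_keywords):
    """OSC-style expansion: each ontology keyword plus its synonyms."""
    expanded = set()
    for kw in ontology_keywords:
        expanded.add(kw)
        expanded |= SYNONYMS.get(kw, set())
    return expanded

def matches(patent_text, keywords):
    """Count how many (expanded) keywords occur in a patent's text."""
    text = patent_text.lower()
    return sum(1 for kw in keywords if kw in text)

ontology = {"skillet", "coating"}
oc_keywords = ontology                      # OC: ontology keywords only
osc_keywords = expand_keywords(ontology)    # OSC: plus synonyms

patent = "A frypan with a ceramic layer for even heating."
print(matches(patent, oc_keywords), matches(patent, osc_keywords))
```

The sketch shows why synonym expansion matters: a patent that never uses the ontology's exact vocabulary is invisible to OC but is still picked up by OSC, and those keyword features are what the k-NN, random forest, and SVM classifiers are trained on.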
My research shows that OSC is the best scenario and SVM is the best classifier for identifying product planning opportunities, because this combination yields the highest score in metrics that are generally used to measure classification performance in machine learning (e.g., ROC-AUC and F-score). My method also significantly outperforms current practice, because I demonstrate in an experiment that neither the experts at Finex nor the panel of external experts are able to search for and judge relevant patents with any degree of effectiveness, efficiency or reliability.
This dissertation provides the rudiments of a theoretical foundation for patent mining, which has yielded a machine learning method that is deployed successfully in a new product planning setting (Finex). Further development of this method could make a significant contribution to management practice by identifying opportunities for new product development that have been missed by the approaches that have been deployed to date.