12 research outputs found

    PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT

    Full text link
    This study provides an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity, and presents a hybrid framework for leveraging the resulting p2p similarity for applications such as semantic search and automated patent classification. We create embeddings using Sentence-BERT (SBERT) based on patent claims. We leverage SBERTs efficiency in creating embedding distance measures to map p2p similarity in large sets of patent data. We deploy our framework for classification with a simple Nearest Neighbors (KNN) model that predicts Cooperative Patent Classification (CPC) of a patent based on the class assignment of the K patents with the highest p2p similarity. We thereby validate that the p2p similarity captures their technological features in terms of CPC overlap, and at the same demonstrate the usefulness of this approach for automatic patent classification based on text data. Furthermore, the presented classification framework is simple and the results easy to interpret and evaluate by end-users. In the out-of-sample model validation, we are able to perform a multi-label prediction of all assigned CPC classes on the subclass (663) level on 1,492,294 patents with an accuracy of 54% and F1 score > 66%, which suggests that our model outperforms the current state-of-the-art in text-based multi-label and multi-class patent classification. We furthermore discuss the applicability of the presented framework for semantic IP search, patent landscaping, and technology intelligence. We finally point towards a future research agenda for leveraging multi-source patent embeddings, their appropriateness across applications, as well as to improve and validate patent embeddings by creating domain-expert curated Semantic Textual Similarity (STS) benchmark datasets.Comment: 18 pages, 7 figures and 4 Table

    Patent Classification Using Ontology-Based Patent Network Analysis

    Get PDF
    Patent management is increasingly important for organizations to sustain their competitive advantage. The classification of patents is essential for patent management and industrial analysis. In this study, we propose a novel patent network-based classification method to analyze query patents and predict their classes. The proposed patent network, which contains various types of nodes that represent different features extracted from patent documents, is constructed based on the relationship metrics derived from patent metadata. The novel approach analyzes reachable nodes in the patent ontology network to calculate their relevance to query patents, after which it uses the modified k-nearest neighbor classifier to classify query patents. We evaluate the performance of the proposed approach on a test dataset of patent documents obtained from the United States Patent and Trademark Office (USPTO), and compare it with the performance of the three conventional methods. The results demonstrate that the proposed patent network-based approach outperforms the conventional approaches

    Industrial challenges in patent management and crowdsourcing patent landscapes for engineering design innovation

    Get PDF
    Innovation is critical to sustain in prevailing competitive business environments. Industries need effective innovation strategies in-practice to develop and deliver novel products and services swiftly. In order to implement innovation strategies effectively, industries need innovation capacity in engineering design supported with intellectual assets. However, there are many issues that prevent streamlining these processes. The objectives of this research are to explicit the issues related to industrial patents (one of the important resources in intellectual assets) generation and management processes, and propose cost-effective crowdsourcing approach as a tool for patent landscaping activities. Interviews with patent attorneys and intellectual audit specialists reveal that most industries have ineffective intellectual property strategy; engineers do little patent searching, face challenges to identify novel product features, and often find difficulties to interpret patent information. The initial experiments of using the crowdsourcing approach for patent clustering activity reveal that general crowd workers (not knowing much about patents) were able to identify one third of expert clustered schema for much lesser cost. Further research work to strengthen the usefulness of the crowdsourcing approach for patent landscaping related activities is discussed

    Feature Selection for Text and Image Data Using Differential Evolution with SVM and Naïve Bayes Classifiers

    Get PDF
    Classification problems are increasing in various important applications such as text categorization, images, medical imaging diagnosis and bimolecular analysis etc. due to large amount of attribute set. Feature extraction methods in case of large dataset play an important role to reduce the irrelevant feature and thereby increases the performance of classifier algorithm. There exist various methods based on machine learning for text and image classification. These approaches are utilized for dimensionality reduction which aims to filter less informative and outlier data. Therefore, these approaches provide compact representation and computationally better tractable accuracy. At the same time, these methods can be challenging if the search space is doubled multiple time. To optimize such challenges, a hybrid approach is suggested in this paper. The proposed approach uses differential evolution (DE) for feature selection with naïve bayes (NB) and support vector machine (SVM) classifiers to enhance the performance of selected classifier. The results are verified using text and image data which reflects improved accuracy compared with other conventional techniques. A 25 benchmark datasets (UCI) from different domains are considered to test the proposed algorithms.  A comparative study between proposed hybrid classification algorithms are presented in this work. Finally, the experimental result shows that the differential evolution with NB classifier outperforms and produces better estimation of probability terms. The proposed technique in terms of computational time is also feasible

    Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches

    Get PDF
    Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced AnalyticsPatent classification is one of the areas in Intellectual Property Analytics (IPA), and a growing use case since the number of patent applications has been increasing through the years worldwide. Patents are more than ever being used as financial protection for companies that also use patent databases to raise researches and leverage product innovations. Instituto Nacional de Propriedade Industrial, INPI, is the government agency responsible for protecting Industrial Property rights in Portugal. INPI has promoted a competition to explore technologies to solve some challenges related to Industrial Properties, including the classification of patents, one of the critical phases of the grant patent process. In this work project, we used the dataset put available by INPI to explore traditional machine learning algorithms to classify Portuguese patents and evaluate the performance of transfer learning methodologies to solve this task. BERTTimbau, a BERT architecture model pre-trained on a large Portuguese corpus, presented the best results to the task, even though with a performance only 4% superior to a LinearSVC model using TF-IDF feature engineering. In general, the model presents a good performance, despite the low score when classes had few training samples. However, the analysis of misclassified samples showed that the specificity of the context has more influence on the learning than the number of samples itself. Patent classification is a challenging task not just because of 1) the hierarchical structure of the classification but also because of 2) the way a patent is described, 3) the overlap of the contexts, and 4) the underrepresentation of the classes. Nevertheless, it is an area of growing interest, and that can be leveraged by the new researches that are revolutionizing machine learning applications, especially text mining

    Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed

    Full text link

    A deep learning framework for contingent liabilities risk management : predicting Brazilian labor court decisions

    Get PDF
    Estimar o resultado de um processo em litígio é crucial para muitas organizações. Uma aplicação específica são os "Passivos Contingenciais", que se referem a passivos que podem ou não ocorrer dependendo do resultado de um processo judicial em litígio. A metodologia tradicional para estimar essa probabilidade baseia-se na opinião de um advogado quem determina a possibilidade de um processo judicial ser perdido a partir de uma avaliação quantitativa. Esta tese apresenta a um modelo matemático baseado numa arquitetura de Deep Learning cujo objetivo é estimar a probabilidade de ganho ou perda de um processo de litígio, principalmente para ser utilizada na estimação de Passivos Contingenciais. A arquitetura, diferentemente do método tradicional, oferece um maior grau de confiança ao prever o resultado de um processo legal em termos de probabilidade e com um tempo de processamento de segundos. Além do resultado primário, a arquitetura estima uma amostra dos casos mais semelhantes ao processo estimado, que servem de apoio para a realização de estratégias de litígio. Nossa arquitetura foi testada em duas bases de dados de processos legais: (1) o Tribunal Europeu de Direitos Humanos (ECHR) e (2) o 4º Tribunal Regional do Trabalho brasileiro (4TRT). Ela estimou de acordo com nosso conhecimento, o melhor desempenho já publicado (precisão = 0,906) na base de dados da ECHR, uma coleção amplamente utilizada de processos legais, e é o primeiro trabalho a aplicar essa metodologia em um tribunal de trabalho brasileiro. Os resultados mostram que a arquitetura é uma alternativa adequada a ser utilizada contra o método tradicional de estimação do desfecho de um processo em litígio realizado por advogados. Finalmente, validamos nossos resultados com especialistas que confirmaram as possibilidades promissoras da arquitetura. Assim, nos incentivamos os académicos a continuar desenvolvendo pesquisas sobre modelagem matemática na área jurídica, pois é um tema emergente com um futuro promissor e aos usuários a utilizar ferramentas baseadas como a desenvolvida em nosso trabalho, pois fornecem vantagens substanciais em termos de precisão e velocidade sobre os métodos convencionais.Estimating the likely outcome of a litigation process is crucial for many organizations. A specific application is the “Contingents Liabilities,” which refers to liabilities that may or may not occur depending on the result of a pending litigation process (lawsuit). The traditional methodology for estimating this likelihood is based on the opinion from the lawyer’s experience which is based on a qualitative appreciation. This dissertation presents a mathematical modeling framework based on a Deep Learning architecture that estimates the probability outcome of a litigation process (accepted & not accepted) with a particular use on Contingent Liabilities. The framework offers a degree of confidence by describing how likely an event will occur in terms of probability and provides results in seconds. Besides the primary outcome, it offers a sample of the most similar cases to the estimated lawsuit that serve as support to perform litigation strategies. We tested our framework in two litigation process databases from: (1) the European Court of Human Rights (ECHR) and (2) the Brazilian 4th regional labor court. Our framework achieved to our knowledge the best-published performance (precision = 0.906) on the ECHR database, a widely used collection of litigation processes, and it is the first to be applied in a Brazilian labor court. Results show that the framework is a suitable alternative to be used against the traditional method of estimating the verdict outcome from a pending litigation performed by lawyers. Finally, we validated our results with experts who confirmed the promising possibilities of the framework. We encourage academics to continue developing research on mathematical modeling in the legal area as it is an emerging topic with a promising future and practitioners to use tools based as the proposed, as they provides substantial advantages in terms of accuracy and speed over conventional methods
    corecore