
    Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches

    Project work presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Patent classification is one of the areas in Intellectual Property Analytics (IPA), and a growing use case, since the number of patent applications has been increasing worldwide over the years. More than ever, patents are being used as financial protection by companies, which also use patent databases to drive research and leverage product innovation. Instituto Nacional de Propriedade Industrial (INPI) is the government agency responsible for protecting industrial property rights in Portugal. INPI promoted a competition to explore technologies for solving challenges related to industrial property, including the classification of patents, one of the critical phases of the patent grant process. In this work project, we used the dataset made available by INPI to explore traditional machine learning algorithms for classifying Portuguese patents and to evaluate the performance of transfer learning methodologies on this task. BERTTimbau, a BERT architecture model pre-trained on a large Portuguese corpus, presented the best results on the task, although its performance was only 4% superior to that of a LinearSVC model using TF-IDF feature engineering. In general, the model performs well, despite low scores for classes with few training samples. However, the analysis of misclassified samples showed that the specificity of the context has more influence on learning than the number of samples itself. Patent classification is a challenging task not just because of 1) the hierarchical structure of the classification but also because of 2) the way a patent is described, 3) the overlap of the contexts, and 4) the underrepresentation of the classes. Nevertheless, it is an area of growing interest, and one that can be leveraged by the new research that is revolutionizing machine learning applications, especially text mining.
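
The TF-IDF feature engineering mentioned in this abstract can be illustrated with a minimal sketch. It covers only the term-weighting step, not the LinearSVC classifier or BERTTimbau, and it assumes whitespace tokenization and one common smoothed IDF variant; the abstract does not state which variant the authors used. The function name `tfidf` and the toy documents are invented for illustration and do not come from the INPI dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term frequency times a smoothed inverse
    document frequency, idf = ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


# Toy, invented patent snippets (not taken from the INPI dataset).
docs = [
    "dispositivo para medir temperatura",
    "dispositivo para cortar metal",
    "composto para tratar metal",
]
weights = tfidf(docs)
```

Under this weighting, a word that occurs in every snippet ("para") receives the lowest weight, while a word unique to one snippet ("temperatura") receives the highest, which is why such features help separate classes by their characteristic vocabulary.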

    Natural Language Processing in-and-for Design Research

    We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals and within the period 1991-present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text sources: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. Upon summarizing and identifying the gaps in these contributions, we utilise an existing design innovation framework to identify the applications that are currently being supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research

    ARIZ85 and patent-driven knowledge support

    The growing complexity of technical solutions, which encompass knowledge from different scientific fields, makes the consultation of information sources necessary, even for multi-disciplinary working teams. Indeed, tacit knowledge is essential, but often not sufficient to achieve a proficient problem solving process. Besides, the most comprehensive tool of the TRIZ body of knowledge, i.e. ARIZ, requires, more or less explicitly, the retrieval of new knowledge in order to entirely exploit its potential to drive towards valuable solutions. A multitude of contributions from the literature support various common tasks encountered when using TRIZ and requiring additional information; most of them hold the objective of speeding up the generation of inventive solutions thanks to the capabilities of text mining techniques. Nevertheless, no global study has been conducted to fully disclose the effective knowledge requirements of ARIZ. With respect to this deficiency, the present paper illustrates an analysis of the algorithm with the specific objective of identifying the different types of information needs that can be satisfied by patents. The results of the investigation lay bare the most significant gaps of the research in the field. Further on, an initial proposal is advanced to structure the retrieval of relevant information from patent sources currently not supported by existing methodologies and software applications, so as to exploit the vast amount of technical knowledge contained therein. An illustrative experiment sheds light on the relevance of control parameters as input terms for the definition of search queries aimed at retrieving patents sharing the same physical contradiction of the problem to be treated.

    ARIZ85 and Patent-driven Knowledge Support


    Scientometric and Patentometric Analyses to Determine the Knowledge Landscape in Innovative Technologies: the Case of 3D Bioprinting

    This research proposes an innovative data model to determine the landscape of emerging technologies. It is based on a competitive technology intelligence methodology that incorporates the assessment of scientific publications and patent analysis production, and is further supported by experts' feedback. It enables the definition of the growth rate of scientific and technological output in terms of the top countries, institutions and journals producing knowledge within the field, as well as the identification of main areas of research and development by analyzing the International Patent Classification codes, including keyword clusterization and co-occurrence of patent assignees and patent codes. This model was applied to the evolving domain of 3D bioprinting. Scientific documents from the Scopus and Web of Science databases, along with patents from 27 authorities and 140 countries, were retrieved. In total, 4782 scientific publications and 706 patents were identified from 2000 to mid-2016. The number of scientific documents published and patents in the last five years showed an annual average growth of 20% and 40%, respectively. Results indicate that the most prolific nations and institutions publishing on 3D bioprinting are the USA and China, including the Massachusetts Institute of Technology (USA), Nanyang Technological University (Singapore) and Tsinghua University (China), respectively. Biomaterials and Biofabrication are the predominant journals. The most prolific patenting countries are China and the USA, while Organovo Holdings Inc. (USA) and Tsinghua University (China) are the leading institutions. International Patent Classification codes reveal that most 3D bioprinting inventions intended for medical purposes apply porous or cellular materials or biologically active materials. Knowledge clusters and expert drivers indicate that there is a research focus on tissue engineering, including the fabrication of organs, bioinks and new 3D bioprinting systems. Our model offers a guide to researchers to understand the knowledge production of pioneering technologies, in this case 3D bioprinting. This work was funded by Tecnologico de Monterrey through the Escuela de Ingenieria y Ciencias and also supported by a grant from the National Council for Science and Technology (CONACYT), Mexico (Grant number: 261683). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
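
The co-occurrence of patent codes described in this abstract can be sketched as a simple pair count. This is an illustration under stated assumptions, not the authors' data model: the function name `code_cooccurrence` is hypothetical, and the records and short classification codes below are invented placeholders rather than data from the study.

```python
from collections import Counter
from itertools import combinations


def code_cooccurrence(patents):
    """Count how often each unordered pair of codes shares a patent."""
    pairs = Counter()
    for codes in patents:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(codes)), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical records: each patent is listed with its classification codes.
patents = [
    ["A61L", "B33Y", "C12N"],
    ["A61L", "C12N"],
    ["B33Y", "B29C"],
]
pairs = code_cooccurrence(patents)
```

Pairs with high counts, such as `("A61L", "C12N")` in this toy input, are the candidate links one would draw in a co-occurrence network of classification codes.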

    Text mining-based patent analysis of BIM application in construction

    As a data tool applicable to the full life-cycle of construction engineering and management, Building Information Modeling (BIM) has great potential for significantly increasing project productivity and performance. Awareness of BIM application hotspots and forecasting of its trends can drive innovation in the construction field. Using patents as data resources, this study develops an effective framework integrating citation network analysis and topic clustering technology to identify BIM application information and forecast its trends. This framework comprises a three-step analysis: (1) quantitative characteristic analysis of patent outputs; (2) Social Network Analysis (SNA)-based co-occurrence network analysis; and (3) identification of BIM topics using Latent Dirichlet Allocation (LDA). Finally, the case demonstrates the effectiveness of this framework in promoting the technological development and innovation of BIM. The contributions of this study are threefold: (1) an innovative text mining-based framework for BIM patent analysis in construction is developed; (2) patents that have focused on identifying the application hotspots and development trends of BIM in accordance with our developed framework are reviewed; and (3) a signpost for technological development and innovation of BIM is provided.

    The evolution of interindustry technology linkage topics and its analysis framework in 3D printing technology

    This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record. The mutual influence and complementarity of technologies between different industries are becoming increasingly prominent. Revealing the topic evolution of technology linkages between industries is the foundation for understanding the technological development trend of the industry. Although numerous works have focused on technology topic mining and its evolution characteristics, these works have not accurately represented interindustry technology linkages or analyzed the related topics, and have even ignored the technological development characteristics hidden in the topic evolution pathway. Since the Lingo algorithm fully considers the time-series characteristics of the topics, and the knowledge evolution theory can reveal three inherent characteristics in the evolution of knowledge topics, namely, “stability, heredity, and variability,” this article aims to combine the Lingo algorithm and the knowledge evolution theory to analyze the topic evolution of interindustry technology linkages. Additionally, because three-dimensional (3-D) printing technology has significant interdisciplinary and cross-industry characteristics, a wide range of application fields, and various interindustry technology linkages, 3-D printing technology is used for empirical analysis. The empirical results show that the key topics of interindustry technology linkages in 3-D printing include model design, manufacturing methods, manufacturing equipment, manufacturing materials, and application. In addition, all these topics have the development feature of heredity. However, the topic of manufacturing materials presents significant variability, the topic of manufacturing methods has the strongest stability, and multiple subtopics of the five topics show variability and genetic intersection.

    Opportunity Identification for New Product Planning: Ontological Semantic Patent Classification

    Intelligence tools have been developed and applied widely in many different areas in engineering, business and management. Many commercialized tools for business intelligence are available in the market. However, no practically useful tools for technology intelligence are available at this time, and very little academic research in technology intelligence methods has been conducted to date. Patent databases are the most important data source for technology intelligence tools, but patents inherently contain unstructured data. Consequently, extracting text data from patent databases, converting that data to meaningful information and generating useful knowledge from this information become complex tasks. These tasks are currently being performed very ineffectively, inefficiently and unreliably by human experts. This deficiency is particularly vexing in product planning, where awareness of market needs and technological capabilities is critical for identifying opportunities for new products and services. Total nescience of the text of patents, as well as inadequate, unreliable and untimely knowledge derived from these patents, may consequently result in missed opportunities that could lead to severe competitive disadvantage and potentially catastrophic loss of revenue. The research performed in this dissertation tries to correct the abovementioned deficiency with an approach called patent mining. The research is conducted at Finex, an iron casting company that produces traditional kitchen skillets. To 'mine' pertinent patents, experts in new product development at Finex modeled one ontology for the required product features and another for the attributes of requisite metallurgical enabling technologies from which new product opportunities for skillets are identified by applying natural language processing, information retrieval, and machine learning (classification) to the text of patents in the USPTO database. Three main scenarios are examined in my research. Regular classification (RC) relies on keywords that are extracted directly from a group of USPTO patents. Ontological classification (OC) relies on keywords that result from an ontology developed by Finex experts, which is evaluated and improved by a panel of external experts. Ontological semantic classification (OSC) uses these ontological keywords and their synonyms, which are extracted from the WordNet database. For each scenario, I evaluate the performance of three classifiers: k-Nearest Neighbor (k-NN), random forest, and Support Vector Machine (SVM). My research shows that OSC is the best scenario and SVM is the best classifier for identifying product planning opportunities, because this combination yields the highest score in metrics that are generally used to measure classification performance in machine learning (e.g., ROC-AUC and F-score). My method also significantly outperforms current practice, because I demonstrate in an experiment that neither the experts at Finex nor the panel of external experts are able to search for and judge relevant patents with any degree of effectiveness, efficiency or reliability. This dissertation provides the rudiments of a theoretical foundation for patent mining, which has yielded a machine learning method that is deployed successfully in a new product planning setting (Finex). Further development of this method could make a significant contribution to management practice by identifying opportunities for new product development that have been missed by the approaches that have been deployed to date.
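
The difference between the OC and OSC scenarios in this abstract, ontology keywords alone versus ontology keywords plus their synonyms, can be sketched as follows. This is only an illustration of why synonym expansion changes which patents a keyword set can reach; it is not the dissertation's pipeline, which feeds keyword features to k-NN, random forest and SVM classifiers. The helper names, the keywords, and the hand-made synonym table (a stand-in for a WordNet lookup) are all hypothetical.

```python
def expand_keywords(keywords, synonyms):
    """Return the keywords plus any synonyms listed for them."""
    expanded = set(keywords)
    for word in keywords:
        expanded.update(synonyms.get(word, ()))
    return expanded


def matches(text, keywords):
    """True if any keyword occurs as a whole token in the text."""
    tokens = set(text.lower().split())
    return bool(tokens & keywords)


# Hypothetical ontology keywords and a hand-made synonym table that
# stands in for a WordNet lookup.
ontology_keywords = {"skillet", "coating"}
synonyms = {"skillet": {"frypan", "pan"}, "coating": {"finish"}}

# Invented patent text, used only to exercise the two keyword sets.
patent_text = "cast iron frypan with a durable finish"
oc_hit = matches(patent_text, ontology_keywords)
osc_hit = matches(patent_text, expand_keywords(ontology_keywords, synonyms))
```

In this toy case the ontology keywords alone miss the text, while the synonym-expanded set matches it through "frypan" and "finish".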