4,146 research outputs found

    Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches

    Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Patent classification is one of the areas in Intellectual Property Analytics (IPA), and a growing use case, since the number of patent applications has been increasing worldwide over the years. More than ever, patents are being used as financial protection by companies, which also use patent databases to guide research and drive product innovation. Instituto Nacional de Propriedade Industrial (INPI) is the government agency responsible for protecting industrial property rights in Portugal. INPI promoted a competition to explore technologies for solving challenges related to industrial property, including the classification of patents, one of the critical phases of the patent grant process. In this work project, we used the dataset made available by INPI to explore traditional machine learning algorithms for classifying Portuguese patents and to evaluate the performance of transfer learning methodologies on this task. BERTTimbau, a BERT architecture model pre-trained on a large Portuguese corpus, presented the best results on the task, although its performance was only 4% superior to that of a LinearSVC model using TF-IDF feature engineering. In general, the model performs well, despite low scores on classes with few training samples. However, the analysis of misclassified samples showed that the specificity of the context has more influence on the learning than the number of samples itself. Patent classification is a challenging task not just because of 1) the hierarchical structure of the classification but also because of 2) the way a patent is described, 3) the overlap of the contexts, and 4) the underrepresentation of the classes. Nevertheless, it is an area of growing interest, one that can benefit from the new research that is revolutionizing machine learning applications, especially text mining.

    Patent Data for Comparative Study: Case study of Top Aspirants in Bioinformatics Industry

    Innovation and technology are considered drivers of success and achievement for a firm. A comparative study is an essential procedure for identifying the innovation and technological capabilities of the major players involved in bioinformatics-related inventions. The aim of this research is to map out the top firms and identify their strategically important technologies. To this end, a comparative analysis of major firms in the bioinformatics industry is carried out using patent information. The top three assignees are considered, and further analysis is performed on that basis. The trend among the top companies suggests that Thermo Fisher Scientific Inc. is the major player in bioinformatics research. We therefore develop an overview of its patenting trend and its main areas of research. Our results also indicate that computational tools are applied in most of the research areas, such as the study of genomics and proteomics, sequence categorization, and structural prediction.

    Patient dossier: healthcare queries over distributed resources

    As with many other aspects of the modern world, in healthcare, the explosion of data and resources opens new opportunities for the development of added-value services. Still, a number of specific conditions in this domain greatly hinder these developments, including ethical and legal issues, fragmentation of the relevant data across different locations, and a level of (meta)data complexity that requires great expertise across technical, clinical, and biological domains. We propose the Patient Dossier paradigm as a way to organize innovative healthcare services that overcome the current limitations. The Patient Dossier conceptual framework identifies the different issues and suggests how they can be tackled in a safe, efficient, and responsible way while opening options for independent development by different players in the healthcare sector. An initial implementation of the Patient Dossier concepts in the Rbbt framework is available as open source at https://github.com/mikisvaz and https://github.com/Rbbt-Workflows. This work has received funding from the Elixir-Excelerate project, from the European Union's Horizon 2020 Research and Innovation Programme, under grant agreement N. 676559, and from Plataforma de Recursos Biomoleculares y Bioinformáticos PT13/0001/0030. Additional support came from the Lenovo - BSC Master Collaboration Agreement (2015) and from the IBM-BSC Deep Learning Centre (2016). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Peer Reviewed. Postprint (published version).

    Will Building ‘Good Fences’ Really Make ‘Good Neighbors’ in Science?

    Problematic issues are raised by the expressed intention of the European Commission to promote greater awareness on the part of scientists in the “European Research Area” about intellectual property rights and their uses in the context of “Internet intensive research collaborations.” Promoting greater awareness and encouraging more systematic usage of IPR protections are logically distinct, but as policies for implementation – especially within the EC’s Fifth Framework Programme – the former can too readily shade into the latter. Building “good fences” does not make for “good (more productive) neighbors” in science. Balance needs to be maintained between the “open science” mode of research and private proprietary R&D, because at the macro-system level the functions that each is well-suited to serve are complementary. Recent policy initiatives, particularly by the EC in relation to the legal protection of property rights in databases, pose a serious threat to the utility of collaboratively constructed digital information infrastructures that provide “information spaces” for voyages of scientific discovery. The case for alternative policy approaches is argued in this paper, and several specific proposals are set out for further discussion.

    Opportunities in biotechnology
