
    Crowd-Sourcing Fuzzy and Faceted Classification for Concept Search

    Searching for concepts in science and technology is often a difficult task. To facilitate concept search, different types of human-generated metadata have been created to define the content of scientific and technical disclosures. Classification schemes such as the International Patent Classification (IPC) and MEDLINE's MeSH are structured and controlled, but require trained experts and central management to restrict ambiguity (Mork, 2013). While the unstructured tags of folksonomies can be processed to produce a degree of structure (Kalendar, 2010; Karampinas, 2012; Sarasua, 2012; Bragg, 2013), the freedom enjoyed by the crowd typically results in less precision (Stock, 2007). Existing classification schemes suffer from inflexibility and ambiguity. Since humans understand language, inference, implication, abstraction, and hence concepts better than computers, we propose to harness the collective wisdom of the crowd. To do so, we propose a novel classification scheme that is intuitive enough for the crowd to use, yet powerful enough to facilitate search by analogy and flexible enough to deal with ambiguity. The system will enhance existing classification information. Linked with the semantic web and computer intelligence, a Citizen Science effort (Good, 2013) would support innovation by improving the quality of granted patents, reducing duplicative research, and stimulating problem-oriented solution design. A prototype of our design is in preparation. A crowd-sourced fuzzy and faceted classification scheme will allow for better concept search and improved access to prior art in science and technology.

    Consistency and trends of technological innovations: a network approach to the international patent classification data

    Classifying patents by the technology areas to which they pertain is important for enabling information search and facilitating policy analysis and socio-economic studies. Based on the OECD Triadic Patent Family database, this study constructs a cohort network based on the grouping of IPC subclasses in the same patent families, and a citation network based on citations between subclasses of patent families citing each other. This paper presents a systematic analysis approach that obtains naturally formed network clusters identified using a Lumped Markov Chain method, extracts community keys traceable over time, and investigates two important community characteristics: consistency and changing trends. The results are verified against several other methods, including a recent study measuring patent text similarity. The proposed method contributes a network-based approach to studying the endogenous community properties of an exogenously devised classification system. Applying this method may improve the accuracy and efficiency of the IPC search platform and help detect the emergence of new technologies.
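The cohort and citation networks described above can be sketched as weighted edge counts between IPC subclasses. This is a minimal illustration under stated assumptions: each patent family is given as a hypothetical mapping from a family ID to its IPC subclass codes, citations are (citing family, cited family) pairs, the function names are ours, and the Lumped Markov Chain clustering step is not shown.

```python
from collections import Counter
from itertools import combinations


def cohort_network(families):
    """Undirected co-occurrence edges between IPC subclasses.

    families: dict mapping a (hypothetical) family ID -> list of IPC subclass codes.
    Returns a Counter keyed by sorted (subclass_a, subclass_b) pairs; each value is
    the number of families in which both subclasses appear together.
    """
    edges = Counter()
    for subclasses in families.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(subclasses)), 2):
            edges[(a, b)] += 1
    return edges


def citation_network(families, citations):
    """Directed edges from subclasses of a citing family to subclasses of the cited one.

    citations: list of (citing_family_id, cited_family_id) pairs.
    Self-loops (same subclass on both sides) are kept as ordinary edges.
    """
    edges = Counter()
    for citing, cited in citations:
        for a in set(families.get(citing, ())):
            for b in set(families.get(cited, ())):
                edges[(a, b)] += 1
    return edges
```

Edge weights here are raw counts; the study's own weighting and normalization may differ.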

    Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed

    The patent domain is a very important source of scientific information that is currently not used to its full potential. Searching for relevant patents is a complex task because the number of existing patents is very high and grows quickly, patent text is extremely complicated, and standard vocabulary is not used consistently or does not even exist. As a consequence, pure keyword searches often fail to return satisfying results in the patent domain. Major companies employ patent professionals who are able to search patents effectively, but even they have to invest a lot of time and effort into their searches. Academic scientists, on the other hand, do not have access to such resources and therefore often do not search patents at all, so they risk missing up-to-date information that will not appear in scientific publications until much later, if at all. Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) to improve recall through query expansion. Similarly, professional patent searches expand beyond keywords by including class codes from various patent classification systems. However, classification-based searches can only be performed effectively if the user has very detailed knowledge of the system, which is usually not the case for academic scientists. Consequently, we investigated methods to automatically identify relevant classes that can then be suggested to the user to expand their query. Since every patent is assigned at least one class code, it should be possible to use these assignments in a similar way to the MeSH annotations in PubMed. To develop a system for this task, a good understanding of the properties of both classification systems is necessary.
To gain such knowledge, we perform an in-depth comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms and classes, respectively, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows that the hierarchies are structurally similar, but the terms and annotations differ significantly. The most important differences are the considerably higher complexity of the IPC class definitions compared with MeSH terms, and the far lower number of classes assigned to the average patent compared with the number of MeSH terms assigned to PubMed documents. These differences cause problems for both inexperienced patent searchers and professionals. On the one hand, the complex term system makes it very difficult for members of the former group to find any IPC classes that are relevant to their search task. On the other hand, the low number of IPC classes per patent points to incomplete class assignments by the patent office, which limits the recall of the classification-based searches frequently performed by the latter group. We approach these problems from two directions: first, by automatically assigning additional patent classes to make up for the missing assignments, and second, by automatically retrieving relevant keywords and classes that are proposed to the user so they can expand their initial search. For the automated assignment of additional patent classes, we adapt to the patent domain an approach that was successfully used to assign MeSH terms to PubMed abstracts. Each document is assigned a set of IPC classes by a large set of binary Maximum-Entropy classifiers.
Our evaluation shows good performance by individual classifiers (precision/recall between 0.84 and 0.90), making the retrieval of additional relevant documents for specific IPC classes feasible. The assignment of additional classes to specific documents is more problematic, since the precision of our classifiers is not high enough to avoid false positives. However, we propose filtering methods that can help solve this problem. For the guided patent search, we demonstrate various methods to expand a user’s initial query. Our methods use both the keywords and the class codes that the user enters to retrieve additional relevant keywords and classes, which are then suggested to the user. These additional query components are extracted from different sources such as patent text, IPC definitions, external vocabularies, and co-occurrence data. The suggested expansions can help inexperienced users refine their queries with relevant IPC classes, and professionals can compose their complete queries faster and more easily. We also present GoPatents, a patent retrieval prototype that incorporates some of our proposals and makes faceted browsing of a patent corpus possible.
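One of the expansion sources named above, co-occurrence data, can be illustrated with a small sketch that ranks IPC classes by how many documents pair them with any of the user's query terms. The corpus format (a list of term-set and class-set pairs) and the function `suggest_classes` are hypothetical; this is not the GoPatents implementation, whose details the abstract does not give.

```python
from collections import Counter


def suggest_classes(query_terms, corpus, top_k=3):
    """Suggest IPC classes that co-occur with the query terms.

    corpus: list of (set_of_terms, set_of_ipc_classes) pairs, one per document
        (a hypothetical, pre-tokenized representation).
    Returns up to top_k (ipc_class, document_count) pairs, most frequent first.
    """
    query = {t.lower() for t in query_terms}
    counts = Counter()
    for terms, classes in corpus:
        # A document "votes" for its classes if it shares any term with the query.
        if query & {t.lower() for t in terms}:
            counts.update(classes)
    return counts.most_common(top_k)
```

Raw counts favor very common classes; a fuller sketch might normalize by each class's overall frequency before suggesting it.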

    Distributed incremental fingerprint identification with reduced database penetration rate using a hierarchical classification based on feature fusion and selection

    Fingerprint recognition has been a hot research topic over the last few decades, with many applications and ever-growing populations to identify. The need for flexible, fast identification systems is therefore evident. In this context, fingerprint classification is commonly used to speed up identification. This paper proposes a complete identification system with a hierarchical classification framework that fuses the information of multiple feature extractors. Feature selection is applied to improve the classification accuracy. Finally, the distributed identification is carried out with an incremental search, exploring the classes in the probability order given by the classifier. A single parameter tunes the trade-off between identification time and accuracy. The proposal is evaluated on two NIST databases and a large synthetic database, yielding penetration rates close to the optimal values that can be reached with classification, which leads to low identification times with little or no accuracy loss.
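The incremental search described above can be sketched as visiting classes in decreasing classifier probability. Here we assume, for illustration only, that the single trade-off parameter is a cumulative-probability cutoff; the paper's actual stopping rule may differ, and a real system would also stop as soon as a match is found. The penetration rate is the fraction of the database that has to be searched.

```python
def incremental_search(class_probs, class_sizes, threshold):
    """Visit fingerprint classes in decreasing classifier probability.

    class_probs: dict mapping class name -> posterior probability for the query print.
    class_sizes: dict mapping class name -> number of enrolled prints in that class.
    threshold: assumed single trade-off parameter in (0, 1]; the search stops once
        the visited classes cover at least this much probability mass.
    Returns (visited classes in search order, penetration rate).
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    order = sorted(class_probs, key=class_probs.get, reverse=True)
    visited, cumulative = [], 0.0
    for cls in order:
        visited.append(cls)
        cumulative += class_probs[cls]
        if cumulative >= threshold:
            break
    total = sum(class_sizes.values())
    searched = sum(class_sizes[c] for c in visited)
    return visited, searched / total
```

Raising the threshold visits more classes, trading longer identification time (higher penetration) for a lower chance of missing the true class.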

    On Term Selection Techniques for Patent Prior Art Search

    A patent is a set of exclusive rights granted to an inventor to protect an invention for a limited period of time. Patent prior art search involves finding previously granted patents, scientific articles, product descriptions, or any other published work that may be relevant to a new patent application. Many well-known information retrieval (IR) techniques (e.g., typical query expansion methods), which have proven effective for ad hoc search, are unsuccessful for patent prior art search. In this thesis, we mainly investigate why generic IR techniques are not effective for prior art search on the CLEF-IP test collection. First, we analyse the errors caused by data curation and by experimental settings, such as using the International Patent Classification codes assigned to the patent topics to filter the search results. Then, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, starting with the description section of the reference patent and using language models (LM) and BM25 scoring functions. We find that an oracular relevance feedback system, which extracts terms from the judged relevant documents, far outperforms the baseline (0.11 vs. 0.48) and performs twice as well on mean average precision (MAP) as the best participant in CLEF-IP 2010 (0.22 vs. 0.48). We find a very clear term selection value threshold for use when choosing terms. We also notice that most of the useful feedback terms are actually present in the original query, and hypothesise that the baseline system can be substantially improved by removing negative query terms. We try four simple automated approaches to identify negative terms for query reduction, but we are unable to improve on the baseline performance with any of them.
However, we show that a simple, minimal-feedback interactive approach, in which terms are selected from only the first retrieved relevant document, outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.
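The thesis combines BM25 scoring with a term selection value threshold for choosing feedback terms. The sketch below shows the standard Okapi BM25 formula (with the common non-negative IDF variant) and, as an assumed stand-in for the thesis's term selection value, Robertson's offer weight `r * w`, where `w` is the Robertson/Sparck Jones relevance weight. The function names and the threshold value are illustrative; the thesis's exact scoring may differ.

```python
import math


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a bag of query terms.

    doc_freq maps term -> number of documents containing it; the IDF used is the
    non-negative variant log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    dl = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        tf = doc_terms.count(term)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        n = doc_freq[term]
        idf = math.log(1 + (num_docs - n + 0.5) / (n + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_len))
    return score


def select_feedback_terms(relevant_docs, doc_freq, num_docs, threshold):
    """Score terms from judged-relevant documents and keep those above a threshold.

    Score = r * w, where r is the number of relevant documents containing the term
    and w is the Robertson/Sparck Jones weight with 0.5 smoothing.
    Returns {term: score}, highest score first.
    """
    R = len(relevant_docs)
    r_counts = {}
    for doc in relevant_docs:
        for term in set(doc):
            r_counts[term] = r_counts.get(term, 0) + 1
    selected = {}
    for term, r in r_counts.items():
        n = doc_freq[term]
        w = math.log(((r + 0.5) * (num_docs - n - R + r + 0.5))
                     / ((n - r + 0.5) * (R - r + 0.5)))
        if r * w >= threshold:
            selected[term] = r * w
    return dict(sorted(selected.items(), key=lambda kv: kv[1], reverse=True))
```

Terms that are frequent in the collection but rare in the relevant set get negative weights and fall below any positive threshold, which is the behavior a term selection cutoff relies on.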

    Patent Landscape of Influenza A Virus Prophylactic Vaccines and Related Technologies

    Executive Summary: This report focuses on patent landscape analysis of technologies related to prophylactic vaccines targeting pandemic strains of influenza. These technologies include methods of formulating vaccines, methods of producing viruses or viral subunits, the composition of complete vaccines, and other technologies that have the potential to aid in a global response to this pathogen. The purpose of this patent landscape study was to search, identify, and categorize patent documents that are relevant to the development of vaccines that can efficiently promote the development of protective immunity against pandemic influenza virus strains. The search strategy used keywords that the team felt would be general enough to capture (or “recall”) the majority of patent documents directed toward vaccines against influenza A virus. After extensive searching of patent literature databases, approximately 33,500 publications were identified and collapsed to about 3,800 INPADOC families. Relevant documents, almost half of the total, were then identified and sorted into the major categories of vaccine compositions (about 570 families), technologies that support the development of vaccines (about 750 families), and general platform technologies that could be useful but are not specific to the problems presented by pandemic influenza strains (about 560 families). The first two categories, vaccines and supporting technologies, were further divided into subcategories to allow an interested reader to rapidly select documents relevant to the particular technology on which he or she is focused. This sorting process increased the precision of the result set. The two major categories (vaccines and supporting technologies) were subjected to a range of analytics in order to extract as much information as possible from the dataset.
First, patent landscape maps were generated to assess the accuracy of the sorting procedure and to reveal the relationships between the various technologies involved in creating an effective vaccine. Then, filing trends were analyzed for the datasets. The country of origin for the technologies was determined, and the range of distribution to other jurisdictions was assessed. Filings were also analyzed by year, by assignee, and by inventor. Finally, the various patent classification systems were mapped to find which particular classes tend to hold influenza vaccine-related technologies. Besides the keywords developed during the searches and the landscape map generation, the classifications represent an alternative way for future researchers to identify emerging influenza technologies. The analysis included creation of a map of keywords, as shown above, describing the relationship of the various technologies involved in the development of prophylactic influenza A vaccines. The map has regions corresponding to live attenuated virus vaccines, subunit vaccines composed of split viruses or isolated viral polypeptides, and plasmids used in DNA vaccines. Important technologies listed on the map include the use of reverse genetics to create reassortant viruses, the growth of viruses in modified cell lines as opposed to the traditional methods using eggs, the production of recombinant viral antigens in various host cells, and the use of genetically modified plants to produce virus-like particles. Another major finding was that the number of patent documents related to influenza being published has been steadily increasing in the last decade, as shown in the figure below. Until the mid-1990s, only a few influenza patent documents were published each year. The number of publications increased noticeably when TRIPS took effect, resulting in publication of patent applications. Since 2006, however, the number of vaccine publications has exploded.
In both 2011 and 2012, about 100 references disclosing influenza vaccine technologies were published. Thus, interest in developing new and more efficacious influenza vaccines has been growing in recent years. This interest is probably being driven by recent influenza outbreaks, such as the H5N1 (bird flu) epidemic that began in the late 1990s and the 2009 H1N1 (swine flu) pandemic. The origins of the vaccine-related inventions were also analyzed. The team determined the country in which the priority application was filed, which was taken as an indication of the country where the invention was made or where the inventors intended to practice the invention. By far, most of the relevant families originated with patent applications filed in the United States. Other prominent priority countries were China and the United Kingdom, followed by Japan, Russia, and South Korea. France was a significant priority country only for supporting technologies, not for vaccines. Top assignees for these families were mostly large pharmaceutical companies, with the largest number of patent families coming from Novartis, followed by GlaxoSmithKline, Pfizer, U.S. Merck (Merck Sharp & Dohme), Sanofi, and AstraZeneca. Governmental and nonprofit institutes in China, Japan, Russia, South Korea, and the United States are also contributing heavily to influenza vaccine research. Lastly, the jurisdictions where inventors have sought protection for their vaccine technologies were determined, and the number of patent families filed in each country is plotted on the world map shown on page seven. The United States, Canada, Australia, Japan, South Korea, and China have the highest level of filings, followed by Germany, Brazil, India, Mexico, and New Zealand. However, although there is a significant number of filings in Brazil, the rest of Central and South America has only sparse filings.
Of concern, with the exception of South Africa, few African nations have a significant number of filings. In summary, the goal of this report is to provide a knowledge resource for making informed policy decisions and for creating strategic plans concerning the assembly of efficacious vaccines against a rapidly spreading, highly virulent influenza strain. The team has defined the current state of the art of technologies involved in the manufacture of influenza vaccines, and the important assignees, inventors, and countries have been identified. This document should reveal both the strengths and weaknesses of the current level of preparedness for responding to an emerging pandemic influenza strain. The effects of the H5N1 and H1N1 epidemics have been felt across the globe in the last decade, and future epidemics are highly probable, so preparations are necessary to meet this global health threat.

    Multiple Retrieval Models and Regression Models for Prior Art Search

    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), developed for the IP track of CLEF 2009. Our approach has three main characteristics:
    1. The use of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the track (English, French, German), producing ten different sets of ranked results.
    2. The merging of the different results based on multiple regression models, using an additional validation set created from the patent collection.
    3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model.
    Because we exploit specific metadata of the patent documents and the citation relations only when creating the initial working sets and during the final post-ranking step, our architecture remains generic and easy to extend.
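The regression-based merging of ranked results can be sketched as fitting one linear least-squares model on a validation set, mapping each document's per-run scores to a 0/1 relevance label, then ranking all retrieved documents by the fitted score. This is a simplified illustration under our own assumptions (min-max normalized scores, a single linear model, a tiny ridge term); the helper names are hypothetical, and the abstract does not specify PATATRAS's actual regression models or features.

```python
def min_max(run):
    """Rescale one run's scores (doc_id -> score) to [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {doc: (s - lo) / span for doc, s in run.items()}


def features(doc, runs):
    # One feature per run (0.0 if the run did not retrieve the doc) plus a bias term.
    return [run.get(doc, 0.0) for run in runs] + [1.0]


def solve(a, y):
    """Solve a x = y for a small square system by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [v] for row, v in zip(a, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(runs, qrels, ridge=1e-6):
    """Least-squares weights (one per run, plus a bias) predicting 0/1 relevance.

    runs: list of dicts doc_id -> score; qrels: validation dict doc_id -> 0/1 label.
    A tiny ridge term keeps the normal equations solvable.
    """
    X = [features(doc, runs) for doc in qrels]
    y = [float(qrels[doc]) for doc in qrels]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def merge(runs, weights):
    """Fuse several runs into one ranking using the fitted linear model."""
    docs = set().union(*runs)
    scores = {doc: sum(w * x for w, x in zip(weights, features(doc, runs)))
              for doc in docs}
    return sorted(scores, key=scores.get, reverse=True)
```

In practice each run would be passed through `min_max` before fitting and merging, so that scores from different retrieval models (e.g. KL and Okapi) are on a comparable scale.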