13,674 research outputs found

    Natural Language Processing for Technology Foresight Summarization and Simplification: the case of patents

    Technology foresight aims to anticipate possible developments, understand trends, and identify technologies of high impact. To this end, monitoring emerging technologies is crucial. Patents -- the legal documents that protect novel inventions -- can be a valuable source for technology monitoring. Millions of patent applications are filed yearly, with 3.4 million applications in 2021 alone. Patent documents are primarily textual documents and disclose innovative and potentially valuable inventions. However, their processing is currently underresearched. This is due to several reasons, including the high document complexity: patents are very lengthy and are written in an extremely hard-to-read language, which is a mix of technical and legal jargon. This thesis explores how Natural Language Processing -- the discipline that enables machines to process human language automatically -- can aid patent processing. Specifically, we focus on two tasks: patent summarization (i.e., we try to reduce the document length while preserving its core content) and patent simplification (i.e., we try to reduce the document's linguistic complexity while preserving its original core meaning). We found that older patent summarization approaches were not compared on shared benchmarks (thus making it hard to draw conclusions), and even the most recent abstractive dataset presents important issues that might make comparisons meaningless. We try to fill both gaps: we first document the issues related to the BigPatent dataset and then benchmark extractive, abstractive, and hybrid approaches in the patent domain. We also explore transferring summarization methods from the scientific paper domain with limited success. For the automatic text simplification task, we noticed a lack of simplified text and parallel corpora. We fill this gap by defining a method to generate a silver standard for patent simplification automatically. Lay human judges evaluated the simplified sentences in the corpus as grammatical, adequate, and simpler, and we show that it can be used to train a state-of-the-art simplification model. This thesis describes the first steps toward Natural Language Processing-aided patent summarization and simplification. We hope it will encourage more research on the topic, opening doors for a productive dialog between NLP researchers and domain experts.

    Applying the KISS principle for the CLEF-IP 2010 prior art candidate patent search task

    We present our experiments and results for the DCU CNGL participation in the CLEF-IP 2010 Candidate Patent Search Task. Our work applied standard information retrieval (IR) techniques to patent search. In addition, a very simple citation extraction method was applied to improve the results. This was our second consecutive participation in the CLEF-IP tasks. Our experiments in 2009 showed that many sophisticated approaches to IR do not improve the retrieval effectiveness for this task. For this reason we decided to apply only simple methods in 2010. These were demonstrated to be highly competitive with other participants. DCU submitted three runs for the Prior Art Candidate Search Task; two of these runs achieved the second and third ranks among the 25 runs submitted by nine different participants. Our best run achieved a MAP of 0.203, recall of 0.618, and PRES of 0.523.
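
    The MAP and recall figures quoted above are standard ranked-retrieval measures. The following is a minimal sketch of how they are usually computed, not the CLEF-IP evaluation code; the function names and the patent ids in the usage note are hypothetical, and PRES is not sketched here.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list for one query (topic)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that appear in the ranked list."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranked list, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

    For example, a ranked list `["p1", "p2", "p3", "p4"]` scored against the relevant set `{"p1", "p3", "p9"}` finds two of the three relevant documents, at ranks 1 and 3.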

    Extraterritorial Intellectual Property Enforcement in the European Union

    This paper was prepared for the 2011 ABILA International Law Weekend – West volume of the Southwestern Journal of International Law. It addresses extraterritorial enforcement of intellectual property rights in the European Union. The maximum length of the paper was set by the Journal. The problems associated with extraterritorial enforcement of intellectual property rights in the European Union (the “EU”) may be divided into three categories: enforcement of unitary EU-wide rights, enforcement of multiple national rights, and enforcement of rights based on one national law with extraterritorial effects on activities in other countries. Although these are three distinct categories of problems, they are interconnected; problems in one category may exacerbate problems in another category, and solutions developed in one category may contribute to the resolution of problems in another category. This article briefly reviews the three categories of problems and demonstrates the interrelatedness of solutions that have been developed or will have to be developed to address the problems.

    Special Libraries, April 1956

    Volume 47, Issue 4. https://scholarworks.sjsu.edu/sla_sl_1956/1003/thumbnail.jp

    International Harmonization of Patent Law: A Proposed Solution to the United States' First-to-File Debate

    As trade barriers diminish and global economies continue to expand, harmonization and enforcement of international patent protection become increasingly important. This note compares the U.S. system with those of other countries. It argues that the U.S. should harmonize with the rest of the world. Part I discusses the different systems for determining priority of invention and the recent movement towards harmonization of patent law. Part I also sets forth the recommendations of the 1992 Advisory Commission on Patent Law Reform relating to first-to-file. Part II presents the various conflicting arguments both in favor of and against adopting a first-to-file system. Part III argues that the United States should adopt a first-to-file system under the conditions specified in the 1992 Report From the Advisory Commission on Patent Law Reform. This Note concludes that the Commission Report presents a favorable solution to the first-to-file debate that will allow the United States to participate in and benefit from the forthcoming patent harmonization treaty.

    The role of handbooks in knowledge creation and diffusion: A case of science and technology studies

    Genre is considered to be an important element in scholarly communication and in the practice of scientific disciplines. However, scientometric studies have typically focused on a single genre, the journal article. The goal of this study is to understand the role that handbooks play in knowledge creation and diffusion and their relationship with the genre of journal articles, particularly in highly interdisciplinary and emergent social science and humanities disciplines. To shed light on these questions we focused on handbooks and journal articles published over the last four decades belonging to the research area of Science and Technology Studies (STS), broadly defined. To get a detailed picture we used the full text of five handbooks (500,000 words) and a well-defined set of 11,700 STS articles. We confirmed the methodological split of STS into qualitative and quantitative (scientometric) approaches. Even when the two traditions explore similar topics (e.g., science and gender) they approach them from different starting points. The change in cognitive foci in both handbooks and articles partially reflects the changing trends in STS research, often driven by technology. Using text similarity measures we found that, in the case of STS, handbooks play no special role in either focusing the research efforts or marking their decline. In general, they do not represent the summaries of research directions that have emerged since the previous edition of the handbook. Comment: Accepted for publication in Journal of Informetrics.

    Multi-word expression-sensitive word alignment

    This paper presents a new word alignment method which incorporates knowledge about Bilingual Multi-Word Expressions (BMWEs). Our method of word alignment first extracts such BMWEs in a bidirectional way for a given corpus and then starts conventional word alignment, considering the properties of BMWEs in their grouping as well as their alignment links. We give partial annotation of alignment links as prior knowledge to the word alignment process; by replacing the maximum likelihood estimate in the M-step of the IBM Models with the Maximum A Posteriori (MAP) estimate, prior knowledge about BMWEs is embedded in the prior in this MAP estimate. In our experiments, we saw an improvement of 0.77 BLEU points absolute in JP–EN. Except for one case, our method gave better results than the method using only BMWEs grouping. Even though this paper does not directly address the issues in Cross-Lingual Information Retrieval (CLIR), it discusses an approach of direct relevance to the field. This approach could be viewed as the opposite of current trends in CLIR on semantic space that incorporate a notion of order in the bag-of-words model (e.g. co-occurrences).
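
    The step of replacing the maximum likelihood estimate with a MAP estimate in the M-step can be illustrated with a small sketch. It assumes a Dirichlet prior expressed as pseudo-counts on word pairs; the abstract does not state the form of the prior the authors use, so the function `m_step`, the `prior` dictionary, and the example words are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source word, target word)


def m_step(expected: Dict[Pair, float],
           prior: Dict[Pair, float],
           use_map: bool) -> Dict[Pair, float]:
    """Re-estimate t(target | source) from expected counts of the E-step.

    MLE:  t = c / sum(c)
    MAP:  t = (c + pseudo) / sum(c + pseudo), the posterior mode under an
          assumed Dirichlet prior with parameters alpha = 1 + pseudo.
    Assumes every source word has a positive total count.
    """
    scores: Dict[Pair, float] = {}
    totals: Dict[str, float] = defaultdict(float)
    for (src, tgt), count in expected.items():
        # Pseudo-counts encode prior knowledge, e.g. known alignment links.
        bonus = prior.get((src, tgt), 0.0) if use_map else 0.0
        scores[(src, tgt)] = count + bonus
        totals[src] += count + bonus
    return {(src, tgt): s / totals[src] for (src, tgt), s in scores.items()}


# Hypothetical expected counts: the data alone cannot decide between the two.
expected = {("kaisha", "company"): 1.0, ("kaisha", "the"): 1.0}
# Hypothetical prior favouring one link, as a partial annotation might.
prior = {("kaisha", "company"): 2.0}

mle = m_step(expected, prior, use_map=False)
map_est = m_step(expected, prior, use_map=True)
```

    With these toy counts the maximum likelihood estimate splits the probability evenly between the two target words, while the MAP estimate shifts mass toward the pair favoured by the prior.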