
    A Supervised Learning Approach to Acronym Identification

    This paper addresses the task of finding acronym-definition pairs in text. Most of the previous work on the topic is about systems that involve manually generated rules or regular expressions. In this paper, we present a supervised learning approach to the acronym identification task. Our approach reduces the search space of the supervised learning system by putting some weak constraints on the kinds of acronym-definition pairs that can be identified. We obtain results comparable to hand-crafted systems that use stronger constraints. We describe our method for reducing the search space, the features used by our supervised learning system, and our experiments with various learning schemes.

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing. Comment: 59 pages

    Patent Clutter

    Patent claims are supposed to clearly and succinctly describe the patented invention, and only the patented invention. This Article hypothesizes that a substantial amount of language in patent claims is in fact not about the core invention, which may contribute to well-documented problems with patent claims. I analyze the claims of 40,000 patents and applications, and document the proliferation of “clutter”: language in patent claims that is not about the invention. Although claims are supposed to be exclusively about the invention, clutter appears across industries and makes up approximately 25% of claim language. Patent clutter may contribute to several major problems in patent law. Extensive clutter makes patent claims harder to search. Excessive language in patent claims may be the result of over-claiming (when patentees describe potential corollaries they do not possess), thereby making the patent so broad in scope as to be invalid. More generally, it strains the comprehensibility of patents and burdens the resources of patent examiners. After arguing that patent clutter may contribute to these various problems, this Article turns to reforms. Rejections based on prolixity, lack of enablement, and lack of written description can be crafted to dispose of the worst offenders, and better algorithms and different litigation rules can allow the patent system to adapt to (and even benefit from) the remaining uses of excess language. The Article additionally generates important theoretical insights. Claims are often thought of as entirely synonymous with the invention and all elements of the claim are thought to relate equally strongly to the invention. This Article suggests empirically that these assumptions do not hold in practice, and offers a framework for restructuring conceptions of the relationship between claims and the invention.