A Supervised Learning Approach to Acronym Identification
This paper addresses the task of finding acronym-definition pairs in text. Most of the previous work on the topic is about systems that involve manually generated rules or regular expressions. In this paper, we present a supervised learning approach to the acronym identification task. Our approach reduces the search space of the supervised learning system by putting some weak constraints on the kinds of acronym-definition pairs that can be identified. We obtain results comparable to hand-crafted systems that use stronger constraints. We describe our method for reducing the search space, the features used by our supervised learning system, and our experiments with various learning schemes.
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Comment: 59 pages
Patent Clutter
Patent claims are supposed to clearly and succinctly describe the patented invention, and only the patented invention. This Article hypothesizes that a substantial amount of language in patent claims is in fact not about the core invention, which may contribute to well-documented problems with patent claims. I analyze the claims of 40,000 patents and applications, and document the proliferation of “clutter”: language in patent claims that is not about the invention. Although claims are supposed to be exclusively about the invention, clutter appears across industries and makes up approximately 25% of claim language. Patent clutter may contribute to several major problems in patent law. Extensive clutter makes patent claims harder to search. Excessive language in patent claims may be the result of over-claiming, in which patentees describe potential corollaries they do not possess, thereby making the patent so broad in scope as to be invalid. More generally, it strains the comprehensibility of patents and burdens the resources of patent examiners. After arguing that patent clutter may contribute to these various problems, this Article turns to reforms. Rejections based on prolixity, lack of enablement, and lack of written description can be crafted to dispose of the worst offenders, and better algorithms and different litigation rules can allow the patent system to adapt to (and even benefit from) the remaining uses of excess language. The Article additionally generates important theoretical insights. Claims are often thought of as entirely synonymous with the invention, and all elements of the claim are thought to relate equally strongly to the invention. This Article suggests empirically that these assumptions do not hold in practice, and offers a framework for restructuring conceptions of the relationship between claims and the invention.