60 research outputs found

    Corpus-Driven Knowledge Acquisition for Discourse Analysis

    The availability of large on-line text corpora provides a natural and promising bridge between the worlds of natural language processing (NLP) and machine learning (ML). In recent years, the NLP community has been aggressively investigating statistical techniques to drive part-of-speech taggers, but application-specific text corpora can be used to drive knowledge acquisition at much higher levels as well. In this paper we will show how ML techniques can be used to support knowledge acquisition for information extraction systems. It is often very difficult to specify an explicit domain model for many information extraction applications, and it is always labor intensive to implement hand-coded heuristics for each new domain. We have discovered that it is nevertheless possible to use ML algorithms in order to capture knowledge that is only implicitly present in a representative text corpus. Our work addresses issues traditionally associated with discourse analysis and intersentential inference generation, and demonstrates the utility of ML algorithms at this higher level of language analysis. The benefits of our work address the portability and scalability of information extraction (IE) technologies. When hand-coded heuristics are used to manage discourse analysis in an information extraction system, months of programming effort are easily needed to port a successful IE system to a new domain. We will show how ML algorithms can reduce this
    Comment: 6 pages, AAAI-9

    Using Decision Trees for Coreference Resolution

    This paper describes RESOLVE, a system that uses decision trees to learn how to classify coreferent phrases in the domain of business joint ventures. An experiment is presented in which the performance of RESOLVE is compared to the performance of a manually engineered set of rules for the same task. The results show that decision trees achieve higher performance than the rules in two of three evaluation metrics developed for the coreference task. In addition to achieving better performance than the rules, RESOLVE provides a framework that facilitates the exploration of the types of knowledge that are useful for solving the coreference problem.
    Comment: 6 pages; LaTeX source; 1 uuencoded compressed EPS file (separate); uses ijcai95.sty, named.bst, epsf.tex; to appear in Proc. IJCAI '9

    CRYSTAL: Inducing a Conceptual Dictionary

    One of the central knowledge sources of an information extraction system is a dictionary of linguistic patterns that can be used to identify the conceptual content of a text. This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus. Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances. Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.
    Comment: 6 pages, Postscript, IJCAI-95 http://ciir.cs.umass.edu/info/psfiles/tepubs/tepubs.htm

    Cherenkov luminescence measurements with digital silicon photomultipliers: a feasibility study.

    Background: A feasibility study was done to assess the capability of digital silicon photomultipliers to measure the Cherenkov luminescence emitted by a β source. Cherenkov luminescence imaging (CLI) is possible with a charge coupled device (CCD) based technology, but a stand-alone technique for quantitative activity measurements based on Cherenkov luminescence has not yet been developed. Silicon photomultipliers (SiPMs) are photon counting devices with a fast impulse response and can potentially be used to quantify β-emitting radiotracer distributions by CLI.
    Methods: In this study, a Philips digital photon counting (PDPC) silicon photomultiplier detector was evaluated for measuring Cherenkov luminescence. The PDPC detector is a matrix of avalanche photodiodes, which were read one at a time in a dark count map (DCM) measurement mode (much like a CCD). This reduces the device active area but allows the information from a single avalanche photodiode to be preserved, which is not possible with analog SiPMs. An algorithm to reject the noisiest photodiodes and to correct the measured count rate for the dark current was developed.
    Results: The results show that, in DCM mode and at 10-13 °C, the PDPC has a dynamic response to different levels of Cherenkov luminescence emitted by a β source and transmitted through an opaque medium. This suggests the potential for this approach to provide quantitative activity measurements. Interestingly, the potential use of the PDPC in DCM mode for direct imaging of Cherenkov luminescence, as opposed to a scalar measurement device, was also apparent.
    Conclusions: We showed that a PDPC tile in DCM mode is able to detect and image a β source through its Cherenkov radiation emission. The detector's dynamic response to different levels of radiation suggests its potential quantitative capabilities, and the DCM mode allows imaging with a better spatial resolution than the conventional event-triggered mode. Finally, the same acquisition procedure and data processing could also be employed for other low-light-level applications, such as bioluminescence

    Beached bachelors: An extensive study on the largest recorded sperm whale Physeter macrocephalus mortality event in the North Sea

    Between the 8th January and the 25th February 2016, the largest sperm whale Physeter macrocephalus mortality event ever recorded in the North Sea occurred with 30 sperm whales stranding in five countries within six weeks. All sperm whales were immature males. Groups were stratified by size, with the smaller animals stranding in the Netherlands, and the largest in England. The majority (n = 27) of the stranded animals were necropsied and/or sampled, allowing for an international and comprehensive investigation into this mortality event. The animals were in fair to good nutritional condition and, aside from the pathologies caused by stranding, did not exhibit significant evidence of disease or trauma. Infectious agents were found, including various parasite species, several bacterial and fungal pathogens and a novel alphaherpesvirus. In nine of the sperm whales a variety of marine litter was found. However, none of these findings were considered to have been the primary cause of the stranding event. Potential anthropogenic and environmental factors that may have caused the sperm whales to enter the North Sea were assessed. Once sperm whales enter the North Sea and head south, the water becomes progressively shallower (<40 m), making this region a global hotspot for sperm whale strandings. We conclude that the reasons for sperm whales to enter the southern North Sea are the result of complex interactions of extrinsic environmental factors. As such, these large mortality events seldom have a single ultimate cause and it is only through multidisciplinary, collaborative approaches that potentially multifactorial large-scale stranding events can be effectively investigated

    A Performance Evaluation of Text Analysis Technologies

    This report describes the most recent and most sophisticated of these evaluations, the Third Message Understanding Conference (MUC-3

    Cognition, Computers, and Car Bombs: How Yale Prepared Me for the 90's

    early writings on artificial intelligence (Feigenbaum and Feldman 1963). It was here that I learned about a community of people who were trying to unravel the mysteries of human cognition by playing around with computers. This seemed a lot more interesting than Riemannian manifolds and Hausdorff spaces, or maybe I was just getting tired of all that time on the subway. One way or another, I decided to apply to a graduate program in computer science just in case there was some stronger connection between FORTRAN and human cognition than I had previously suspected. When Yale accepted me, I decided to throw all caution to the wind and trust the admissions committee. I packed up my basenji and set out for Yale in the summer of 1974 with a sense of grand adventure. I was moving toward light and truth, and my very first full screen text editor. As luck would have it, Professor Roger Schank, a specialist in artificial intelligence (AI) from Stanford, was also moving to Yale that same summer.