
    Information Extraction in Illicit Domains

    Extracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have 'long tails', and suffer from the problem of concept drift. In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. Our approach uses raw, unlabeled text from an initial corpus, and a few (12-120) seed annotations per domain-specific attribute, to learn robust IE models for unobserved pages and websites. Empirically, we demonstrate that our approach can outperform feature-centric Conditional Random Field baselines by over 18% F-Measure on five annotated real-world human trafficking datasets in both low-supervision and high-supervision settings. We also show that our approach is robust to concept drift and can be efficiently bootstrapped even in a serial computing environment. Comment: 10 pages, ACM WWW 201
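    The F-Measure used in comparisons like this one is, in its usual balanced form, the harmonic mean of precision and recall. The minimal Python sketch below shows that computation over hypothetical (page, value) extractions; it assumes exact-match scoring and the balanced F1 variant, neither of which the abstract specifies, and the function name `precision_recall_f1` is illustrative.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for extracted items vs. gold annotations.

    Assumes exact-match scoring: an extraction counts only if it equals a gold item.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical extractions of a "name" attribute, as (page_id, value) pairs.
gold = {("p1", "alice"), ("p2", "bella"), ("p3", "cara"), ("p4", "dana")}
predicted = {("p1", "alice"), ("p2", "bella"), ("p3", "carla")}

p, r, f = precision_recall_f1(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

    Here two of three predictions are correct and two of four gold values are found, so F1 = 2·2/(3+4) = 4/7.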

    Harnessing Remote Sensing to Accomplish Full Carbon Accounting: Workshop Report

    The workshop "Harnessing Remote Sensing to Accomplish Full Carbon Accounting" was held on December 9-11, 1999 at IIASA with the following objectives: (1) to promote the mutual interests of the remote sensing and carbon science communities by exchanging ideas about the requirements for carbon accounting and the products currently available from remote sensing land information systems; (2) to produce strategic recommendations on how to improve full carbon accounting (FCA) at different scales using remote sensing tools; and (3) to develop a framework for applying these recommendations in sub-global and national-level case studies. Although these ambitious targets were only partly met, three discussion group sessions described what is required to implement full carbon accounting, how remote sensing can assist this implementation, and how remote sensing can reduce the uncertainties related to FCA. This report summarizes the presentations, discussions and results of the workshop and outlines the next steps to be taken by IIASA.

    Harvesting Entities from the Web Using Unique Identifiers -- IBEX

    In this paper we study the prevalence of unique entity identifiers on the Web. Examples include ISBNs (for books), GTINs (for commercial products), DOIs (for documents), and email addresses. We show how these identifiers can be harvested systematically from Web pages, and how they can be associated with human-readable names for the entities at large scale. Starting with a simple extraction of identifiers and names from Web pages, we show how we can use the properties of unique identifiers to filter out noise and clean up the extraction result on the entire corpus. The end result is a database of millions of uniquely identified entities of different types, with an accuracy of 73-96% and very high coverage compared to existing knowledge bases. We use this database to compute novel statistics on the presence of products, people, and other entities on the Web. Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A. Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting Entities from the Web Using Unique Identifiers. WebDB workshop, 201
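    Check digits are one property of unique identifiers that can help filter noise of this kind. The sketch below is a hypothetical pipeline, not necessarily the IBEX procedure: it assumes ISBN-13 strings are matched with a simple regular expression, that a failed GTIN mod-10 check digit marks noise, that the page title serves as the candidate name, and that a majority vote per identifier resolves conflicting names. `ISBN13_PATTERN`, `gtin_check_digit_ok`, and `harvest_entities` are illustrative names, and the page data is invented.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: the 978/979 ISBN-13 prefix followed by 10 more digits,
# each optionally preceded by a hyphen or space.
ISBN13_PATTERN = re.compile(r"\b97[89](?:[- ]?\d){10}\b")


def gtin_check_digit_ok(code: str) -> bool:
    """Validate a GTIN-8/12/13/14 (an ISBN-13 is a GTIN-13) by its mod-10 check digit."""
    digits = [int(c) for c in code if c.isdigit()]
    if len(digits) not in (8, 12, 13, 14):
        return False
    *body, check = digits
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def harvest_entities(pages: list[tuple[str, str]]) -> dict[str, str]:
    """Map each valid ISBN-13 to the page title it co-occurs with most often.

    Using the page title as the candidate name is a simplifying assumption.
    """
    votes: dict[str, Counter] = defaultdict(Counter)
    for title, text in pages:
        for match in ISBN13_PATTERN.finditer(text):
            code = re.sub(r"\D", "", match.group())  # normalize: digits only
            if gtin_check_digit_ok(code):  # drop typos and random digit strings
                votes[code][title] += 1
    # Uniqueness: one identifier should denote one entity, so keep the majority name.
    return {code: names.most_common(1)[0][0] for code, names in votes.items()}


pages = [
    ("Example Book", "Buy now, ISBN 978-0-306-40615-7."),
    ("Example Book", "ISBN: 9780306406157"),
    ("Bargain listings", "see 978-0-306-40615-7 and 978-0-306-40615-3"),
]
print(harvest_entities(pages))  # {'9780306406157': 'Example Book'}
```

    The identifier with the wrong check digit (ending in 3) is discarded, and the noisy third page is outvoted by the two pages that agree on the name.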

    Nesting Behavior of Palila, as Assessed from Video Recordings

    We quantified the nesting behavior of the Palila (Loxioides bailleui), an endangered Hawaiian honeycreeper, by recording at nests during three breeding seasons with a black-and-white video camera connected to a videocassette recorder. A total of seven nests was observed. For daylight hours, we measured the percentage of time the female was on the nest (attendance), the length of the female's attendance bouts, the length of nest recesses, and adult provisioning rates. Comparisons were made among three stages of the 40-day nesting cycle: incubation (day 1–day 16), the early nestling stage (day 17–day 30, i.e., nestlings ≤ 14 days old), and the late nestling stage (day 31–day 40, i.e., nestlings > 14 days old). Of the seven nests observed, four fledged at least one nestling and three failed; one of the failed nests was filmed being depredated by a feral cat (Felis catus). Female nest attendance was near 82% during the incubation stage and decreased to 21% as nestlings aged. We did not detect a difference in attendance bout length between stages of the nesting cycle. Mean length of nest recesses increased from 4.5 min during the incubation stage to over 45 min during the late nestling stage. Mean number of nest recesses per hour ranged from 1.6 to 2.0. Adults delivered food to nestlings an average of 1.8 times per hour during the early nestling stage and 1.5 times per hour during the late nestling stage, and the provisioning rate did not change over time. Parental behavior characterized by video was similar in some respects to, but differed in key ways from, findings from blind observations. Results from this study will facilitate greater understanding of Palila reproductive strategies.

    Ranking deep web text collections for scalable information extraction

    Information extraction (IE) systems discover structured information from natural language text, enabling much richer querying and data mining than is possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive process, and hence improving its efficiency so that it scales to large volumes of text is of critical importance. State-of-the-art approaches for scaling the IE process focus on one text collection at a time. These approaches prioritize the extraction effort by learning keyword queries to identify the "useful" documents for the IE task at hand, namely, those that lead to the extraction of structured "tuples." These approaches, however, do not attempt to predict which text collections are useful for the IE task (and hence merit further processing) and which will not contribute any useful output (and hence should be ignored altogether, for efficiency). In this paper, we focus on an especially valuable family of text sources, the so-called deep web collections, whose (remote) contents are only accessible via querying. Specifically, we introduce and study techniques for ranking deep web collections for an IE task, to prioritize the extraction effort by focusing on collections with substantial numbers of useful documents for the task. We study both (adaptations of) state-of-the-art resource selection strategies for distributed information retrieval and IE-specific approaches. Our extensive experimental evaluation over realistic deep web collections, and for several different IE tasks, shows the merits and limitations of the alternative families of approaches, and provides a roadmap for addressing this critically important building block for efficient, scalable information extraction.
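    One simple way to rank collections for an IE task is to estimate how many useful documents each one holds. The sketch below is an illustrative baseline, not one of the paper's methods: it assumes each collection comes with a size estimate and a small document sample obtained by probe queries, treats the extractor as a black-box predicate, and scores each collection as estimated size times the fraction of sampled documents that yield tuples. `Collection`, `rank_collections`, `toy_extractor`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Collection:
    name: str
    estimated_size: int  # assumed to come from a separate size-estimation step
    sample: list[str]    # documents retrieved by probe queries (assumed given)


def rank_collections(
    collections: list[Collection], extracts_tuples: Callable[[str], bool]
) -> list[tuple[str, float]]:
    """Rank collections by the estimated number of useful documents they hold."""
    scores = []
    for c in collections:
        if not c.sample:
            scores.append((c.name, 0.0))  # no sample, so no evidence of useful documents
            continue
        hits = sum(1 for doc in c.sample if extracts_tuples(doc))
        # Expected useful documents = collection size * observed hit rate.
        scores.append((c.name, c.estimated_size * hits / len(c.sample)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def toy_extractor(doc: str) -> bool:
    """Hypothetical stand-in for an expensive IE system: does the doc yield a tuple?"""
    return "outbreak" in doc.lower()


collections = [
    Collection("recipes", 50_000, ["Pasta recipe", "Soup recipe"]),
    Collection("medical_journal", 2_000,
               ["Measles outbreak in Y", "Outbreak investigation methods", "Gene expression study"]),
    Collection("news_archive", 10_000,
               ["Cholera outbreak in X", "Election results", "Flu outbreak reported", "Match report"]),
]
for name, score in rank_collections(collections, toy_extractor):
    print(f"{name}: {score:.0f}")
```

    Scoring by expected count rather than by sample hit rate alone lets a large collection with a moderate hit rate outrank a small one with a higher rate, in line with the goal of focusing on collections with substantial numbers of useful documents.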

    Adeno-Associated Virus-Mediated Rescue of the Cognitive Defects in a Mouse Model for Angelman Syndrome

    Angelman syndrome (AS), a genetic disorder occurring in approximately one in every 15,000 births, is characterized by severe mental retardation, seizures, difficulty speaking and ataxia. The gene responsible for AS was identified as UBE3A, which encodes E6-AP, a ubiquitin ligase. A unique feature of this gene is that it is imprinted in a neuron-specific manner, so that in neurons only the maternally inherited copy is expressed. In the majority of AS cases, there is a mutation or deletion in the maternally inherited UBE3A gene, although other cases result from uniparental disomy or mismethylation of the maternal gene. While most human disorders characterized by severe mental retardation involve abnormalities in brain structure, no gross anatomical changes are associated with AS. However, we have determined that abnormal regulation of calcium/calmodulin-dependent protein kinase II (CaMKII) occurs in the maternal UBE3A deletion AS mouse model and is responsible for its major phenotypes. Specifically, αCaMKII phosphorylation is increased at the autophosphorylation sites Thr286 and Thr305/306, resulting in an overall decrease in CaMKII activity. CaMKII is not produced until after birth, indicating that the deficits associated with AS are not the result of developmental abnormalities. The present studies explore the potential to rescue the learning and memory deficits in the adult AS mouse model by using an adeno-associated virus (AAV) vector to increase neuronal UBE3A expression. These studies show that increasing E6-AP levels in the brain with an exogenous vector can improve the cognitive deficits associated with AS. Specifically, the associative learning deficit was ameliorated in treated AS mice compared with control AS mice, indicating that therapeutic intervention may be possible in older AS patients.

    The EAGLE concept - A vision of a future European Land Monitoring Framework

    Abstract. This paper describes the EAGLE concept, an object-oriented data model for land monitoring. It highlights the background situation in the field of land monitoring, identifies the team involved, explains the technical and strategic considerations behind the concept, describes the current status of the harmonization and the developments made, and outlines future activities and requirements. After the structure and content of the data model and matrix are explained, examples are given of how to use the matrix. Besides its possible function as a semantic translation tool between different classification systems, it can also help analyze class definitions to find semantic gaps, overlaps and inconsistencies, and can serve as a data model for new mapping initiatives. In the long term, the EAGLE concept aims to sketch a vision of a future integrated and harmonized European land monitoring system, designed to store all kinds of environmentally relevant information on the Earth's surface from both national and European data sources. As the concept is still under development, the first applications and test cases are under way. This paper also devotes a chapter to the relationship between the concept and remote sensing in general, as well as the relation between land monitoring and the principles of the Euro
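    To illustrate how a component-based matrix can support semantic translation and the detection of gaps and overlaps, the sketch below describes each class of two nomenclatures as a set of land cover components and attributes (a sparse form of a class-by-component matrix). The class names, the components, and the use of Jaccard similarity for translation are assumptions for illustration and are not taken from the EAGLE specification; `translate`, `gaps`, and `overlaps` are hypothetical helpers.

```python
from itertools import combinations

# A nomenclature maps each class to the components/attributes that define it.
Nomenclature = dict[str, set[str]]

# Hypothetical nomenclatures; names are illustrative only.
SYSTEM_A: Nomenclature = {
    "Forest": {"trees", "natural"},
    "Grassland": {"herbaceous", "natural"},
    "Urban fabric": {"buildings", "sealed surface"},
}
SYSTEM_B: Nomenclature = {
    "Woodland": {"trees", "shrubs", "natural"},
    "Pasture": {"herbaceous", "managed"},
    "Built-up": {"buildings", "sealed surface", "artificial"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of components two class descriptions have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translate(cls: str, source: Nomenclature, target: Nomenclature) -> str:
    """Return the target class whose component set best matches `cls`."""
    return max(target, key=lambda t: jaccard(source[cls], target[t]))


def gaps(source: Nomenclature, target: Nomenclature) -> set[str]:
    """Components used by `source` that no class in `target` can express."""
    return set().union(*source.values()) - set().union(*target.values())


def overlaps(system: Nomenclature) -> list[tuple[str, str, set[str]]]:
    """Class pairs within one nomenclature whose descriptions share elements."""
    return [(a, b, system[a] & system[b])
            for a, b in combinations(sorted(system), 2)
            if system[a] & system[b]]


for cls in SYSTEM_A:
    print(cls, "->", translate(cls, SYSTEM_A, SYSTEM_B))
print("gaps B->A:", sorted(gaps(SYSTEM_B, SYSTEM_A)))
print("overlaps in A:", overlaps(SYSTEM_A))
```

    Shared elements flagged by `overlaps` are only candidates for a manual check of the class definitions; here "Forest" and "Grassland" share the "natural" attribute but differ in their land cover components.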