Information Extraction in Illicit Domains
Extracting useful entities and attribute values from illicit domains such as
human trafficking is a challenging problem with the potential for widespread
social impact. Such domains employ atypical language models, have 'long tails'
and suffer from the problem of concept drift. In this paper, we propose a
lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
designed for such domains. Our approach uses raw, unlabeled text from an
initial corpus, and a few (12-120) seed annotations per domain-specific
attribute, to learn robust IE models for unobserved pages and websites.
Empirically, we demonstrate that our approach can outperform feature-centric
Conditional Random Field baselines by over 18% F-Measure on five annotated
sets of real-world human trafficking datasets in both low-supervision and
high-supervision settings. We also show that our approach is demonstrably
robust to concept drift, and can be efficiently bootstrapped even in a serial
computing environment.
Comment: 10 pages, ACM WWW 201
Harnessing Remote Sensing to Accomplish Full Carbon Accounting: Workshop Report
The workshop "Harnessing Remote Sensing to Accomplish Full Carbon Accounting" was held on December 9-11, 1999 at IIASA with the intention of meeting the following objectives:
(1) To promote the mutual interests of the remote sensing and carbon science communities by exchanging ideas regarding the requirements for carbon accounting and the currently available products derived from remote sensing land information systems;
(2) To produce strategic recommendations on how to improve full carbon accounting (FCA) at different scales with the use of remote sensing tools; and,
(3) To develop a Framework to Apply Recommendations for Sub-global and National-Level Case Studies.
Although these ambitious targets were only partly met, three discussion group sessions resulted in describing: what is required to implement full carbon accounting; how remote sensing can be used to assist this implementation; and how remote sensing can be used to reduce the uncertainties related to FCA.
This report summarizes the presentations, discussions and results of this workshop and outlines the next steps to be taken by IIASA.
Harvesting Entities from the Web Using Unique Identifiers -- IBEX
In this paper we study the prevalence of unique entity identifiers on the
Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs
(for documents), email addresses, and others. We show how these identifiers can
be harvested systematically from Web pages, and how they can be associated with
human-readable names for the entities at large scale.
Starting with a simple extraction of identifiers and names from Web pages, we
show how we can use the properties of unique identifiers to filter out noise
and clean up the extraction result on the entire corpus. The end result is a
database of millions of uniquely identified entities of different types, with
an accuracy of 73-96% and a very high coverage compared to existing knowledge
bases. We use this database to compute novel statistics on the presence of
products, people, and other entities on the Web.
Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A.
Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting
Entities from the Web Using Unique Identifiers. WebDB workshop, 201
Nesting Behavior of Palila, as Assessed from Video Recordings
We quantified nesting behavior of Palila (Loxioides bailleui), an endangered Hawaiian honeycreeper, by recording at nests during three breeding seasons using a black-and-white video camera connected to a videocassette recorder. A total of seven nests was observed. We measured the following factors for daylight hours: percentage of time the female was on the nest (attendance), length of attendance bouts by the female, length of nest recesses, and adult provisioning rates. Comparisons were made between three stages of the 40-day nesting cycle: incubation (day 1–day 16), early nestling stage (day 17–day 30 [i.e., nestlings ≤ 14 days old]), and late nestling stage (day 31–day 40 [i.e., nestlings > 14 days old]). Of seven nests observed, four fledged at least one nestling and three failed. One of these failed nests was filmed being depredated by a feral cat (Felis catus). Female nest attendance was near 82% during the incubation stage and decreased to 21% as nestlings aged. We did not detect a difference in attendance bout length between stages of the nesting cycle. Mean length of nest recesses increased from 4.5 min during the incubation stage to over 45 min during the late nestling stage. Mean number of nest recesses per hour ranged from 1.6 to 2.0. Food was delivered to nestlings by adults an average of 1.8 times per hour for the early nestling stage and 1.5 times per hour during the late nestling stage and did not change over time. Characterization of parental behavior by video had similarities to but also key differences from findings taken from blind observations. Results from this study will facilitate greater understanding of Palila reproductive strategies.
Ranking deep web text collections for scalable information extraction
Information extraction (IE) systems discover structured information from natural language text, to enable much richer querying and data mining than possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive process, and hence improving its efficiency, so that it scales over large volumes of text, is of critical importance. State-of-the-art approaches for scaling the IE process focus on one text collection at a time. These approaches prioritize the extraction effort by learning keyword queries to identify the "useful" documents for the IE task at hand, namely, those that lead to the extraction of structured "tuples." These approaches, however, do not attempt to predict which text collections are useful for the IE task (and hence merit further processing) and which ones will not contribute any useful output (and hence should be ignored altogether, for efficiency). In this paper, we focus on an especially valuable family of text sources, the so-called deep web collections, whose (remote) contents are only accessible via querying. Specifically, we introduce and study techniques for ranking deep web collections for an IE task, to prioritize the extraction effort by focusing on collections with substantial numbers of useful documents for the task. We study both (adaptations of) state-of-the-art resource selection strategies for distributed information retrieval, and IE-specific approaches. Our extensive experimental evaluation over realistic deep web collections, and for several different IE tasks, shows the merits and limitations of the alternative families of approaches, and provides a roadmap for addressing this critically important building block for efficient, scalable information extraction.
Adeno-Associated Virus-Mediated Rescue of the Cognitive Defects in a Mouse Model for Angelman Syndrome
Angelman syndrome (AS), a genetic disorder occurring in approximately one in every 15,000 births, is characterized by severe mental retardation, seizures, difficulty speaking and ataxia. The gene responsible for AS was discovered to be UBE3A, which encodes E6-AP, a ubiquitin ligase. A unique feature of this gene is that it undergoes maternal imprinting in a neuron-specific manner. In the majority of AS cases, there is a mutation or deletion in the maternally inherited UBE3A gene, although other cases are the result of uniparental disomy or mismethylation of the maternal gene. While most human disorders characterized by severe mental retardation involve abnormalities in brain structure, no gross anatomical changes are associated with AS. However, we have determined that abnormal calcium/calmodulin-dependent protein kinase II (CaMKII) regulation is seen in the maternal UBE3A deletion AS mouse model and is responsible for the major phenotypes. Specifically, there is increased αCaMKII phosphorylation at the autophosphorylation sites Thr286 and Thr305/306, resulting in an overall decrease in CaMKII activity. CaMKII is not produced until after birth, indicating that the deficits associated with AS are not the result of developmental abnormalities. The present studies are focused on exploring the potential to rescue the learning and memory deficits in the adult AS mouse model through the use of an adeno-associated virus (AAV) vector to increase neuronal UBE3A expression. These studies show that increasing the levels of E6-AP in the brain using an exogenous vector can improve the cognitive deficits associated with AS. Specifically, the associative learning deficit was ameliorated in the treated AS mice compared to the control AS mice, indicating that therapeutic intervention may be possible in older AS patients.
The EAGLE concept - A vision of a future European Land Monitoring Framework
This paper describes the EAGLE concept, an object-oriented data model for land monitoring. It highlights the background situation in the field of land monitoring, identifies the team involved, explains the technical and strategic considerations behind the concept, describes the current status of the harmonization and the developments made, and outlines the future activities and requirements. After the structure and the content of the data model and matrix are explained, examples are given on how to use the matrix. Besides its possible function as a semantic translation tool between different classification systems, it can also help to analyze class definitions to find semantic gaps, overlaps and inconsistencies, and can serve as a data model for new mapping initiatives. In the long term, the EAGLE concept aims at sketching a vision of a future integrated and harmonized European land monitoring system, which is designed to store all kinds of environmentally relevant information on the Earth's surface, coming from both national and European data sources. Although the concept is still under development, some first applications and test cases are under way. This paper also dedicates a chapter to the relationship between the concept and remote sensing in general, as well as the relation between land monitoring and the principles of the Euro