Search CORE

Self-supervised automated wrapper generation for weblog data extraction

Author: A. Laender
B. Adelberg
C. Kohlschütter
I. Muslea
N. Kushmerick
P. Geibel
R. Baumgartner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Data extraction from the web is notoriously hard. Of the types of resources available on the web, weblogs are becoming increasingly important due to the continued growth of the blogosphere, but they remain poorly explored. Past approaches to data extraction from weblogs have often required manual intervention and suffer from low scalability. This paper proposes a fully automated information extraction methodology based on web feeds and HTML processing. The approach includes a model for generating a wrapper that exploits web feeds to derive a set of extraction rules automatically. Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach for deriving a set of rules and automating the process of wrapper generation. An evaluation of the model on a dataset of 2,393 posts yields 92% accuracy, showing that the proposed technique enables robust extraction of weblog properties and can be applied across the blogosphere for applications such as improved information retrieval and more robust web preservation initiatives.
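As a rough illustration of matching feed values against HTML elements across several posts, the Python sketch below treats each element's tag path (tag name plus `class` attribute) as a candidate extraction rule, counts how many posts contain the feed value at that path, and keeps the path with the highest support. This is an assumption-laden stand-in, not the paper's model: the helper names (`PathTextCollector`, `derive_rule`), the exact-text matching, and the simple majority vote used in place of the paper's probabilistic rule derivation are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class PathTextCollector(HTMLParser):
    """Record each text node together with the tag path that encloses it."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}  # elements with no end tag

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as "tag" or "tag.class" labels
        self.texts = []   # (path, text) pairs, one per non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        cls = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self.stack:
            self.texts.append(("/".join(self.stack), text))


def normalize(s):
    return " ".join(s.split()).casefold()


def derive_rule(posts):
    """posts: list of (feed_value, post_html). Returns (best_path, support)."""
    votes = Counter()
    for feed_value, html in posts:
        parser = PathTextCollector()
        parser.feed(html)
        parser.close()
        target = normalize(feed_value)
        # Each path votes at most once per post.
        matches = {path for path, text in parser.texts if normalize(text) == target}
        votes.update(matches)
    if not votes:
        return None, 0.0
    path, count = votes.most_common(1)[0]
    return path, count / len(posts)


# Usage: the feed titles appear in an <h2> on every post, but only once in the sidebar.
posts = [
    ("First post", "<html><body><div class='sidebar'><a>First post</a></div>"
                   "<h2 class='entry-title'>First post</h2><p>Body one</p></body></html>"),
    ("Second post", "<html><body><div class='sidebar'><a>Other</a></div>"
                    "<h2 class='entry-title'>Second post</h2><p>Body two</p></body></html>"),
]
rule, support = derive_rule(posts)
print(rule, support)  # html/body/h2.entry-title 1.0
```

Voting across several posts is what separates the true title element from a sidebar link that happens to repeat the title on a single page.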

Durham Research Online

Crossref

UCL Discovery

Warwick Research Archives Portal Repository

Harvesting Entities from the Web Using Unique Identifiers -- IBEX

Author: Banko M.
Baumgartner R.
Crescenzi V.
Freitag D.
Nakashole N.
Probst K.
Putthividhya D.
Talaika A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/05/2015

In this paper we study the prevalence of unique entity identifiers on the Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs (for documents), email addresses, and others. We show how these identifiers can be harvested systematically from Web pages, and how they can be associated with human-readable names for the entities at large scale. Starting with a simple extraction of identifiers and names from Web pages, we show how we can use the properties of unique identifiers to filter out noise and clean up the extraction result on the entire corpus. The end result is a database of millions of uniquely identified entities of different types, with an accuracy of 73-96% and a very high coverage compared to existing knowledge bases. We use this database to compute novel statistics on the presence of products, people, and other entities on the Web. Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A. Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting Entities from the Web Using Unique Identifiers. WebDB workshop, 201
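One property of unique identifiers that can filter out extraction noise is a built-in check digit. The Python sketch below, which is not necessarily the filtering used in IBEX, pulls ISBN-13 candidates (prefix 978 or 979, optionally hyphenated or spaced) out of text with a regular expression and keeps only those that pass the standard EAN-13 checksum: digits are weighted alternately 1 and 3, and the weighted sum must be divisible by 10. The pattern and the function names `isbn13_valid` and `harvest_isbns` are illustrative assumptions.

```python
import re

# Illustrative candidate pattern: 978/979 followed by ten more digits,
# each optionally preceded by a hyphen or a space.
CANDIDATE = re.compile(r"\b97[89](?:[- ]?\d){10}\b")


def isbn13_valid(digits):
    """EAN-13 / ISBN-13 check: weights 1, 3, 1, 3, ...; sum must be divisible by 10."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def harvest_isbns(text):
    """Return the checksum-valid ISBN-13s found in text, with separators removed."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[- ]", "", match.group())
        if isbn13_valid(digits):
            found.append(digits)
    return found


# Usage: the second candidate has a wrong check digit and is discarded.
text = "Buy ISBN 978-0-306-40615-7 today; not 978-0-306-40615-8."
print(harvest_isbns(text))  # ['9780306406157']
```

A checksum only rejects malformed strings; associating each surviving identifier with a name would still need a separate step.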

arXiv.org e-Print Archive

CiteSeerX

Crossref

HIV as a Chronic Illness: Identity Incorporation and Learning

Author: Baumgartner Lisa M.
David Keegan N.
Publication venue: 'New Prairie Press'
Publication date: 11/08/2008

Abstract: The purpose of this session is twofold: (1) to review tentative findings of a study-in-progress concerning the identity incorporation process and learning of people living with HIV as a chronic illness and (2) to explore issues encountered in conducting research with the chronically ill.

Kansas State University

“HIV is Only One Part of Me”: HIV and Its Effect on Other Identities

Author: Baumgartner Lisa M.
David Keegan N.
Publication venue: 'New Prairie Press'
Publication date: 03/06/2010

The purpose of this study was to investigate the effect of the HIV identity on other identities. The spiritual and advocate identities increased in salience, whereas work and sexual identities decreased. Younger participants fretted about physical appearance. Older participants focused on health. There are implications for adult educators.

Kansas State University

PoZitively Transformative: The Transformative Learning of People Living with HIV

Author: Baumgartner Lisa M.
David Keegan N.
Publication venue: 'New Prairie Press'
Publication date: 28/05/2009

The purpose of this study was to investigate meaning making in People Living with HIV (PLWH) as a chronic illness. Findings confirm those of Courtenay, Merriam and Reeves (1998), who examined meaning making in PLWHAs when HIV/AIDS was a terminal illness. Contextual factors that mediate meaning making were uncovered.

Kansas State University

Effectiveness of Hindman's theorem for bounded sums

Author: A Rumyantsev
C Jockusch
CT Chong
H Towsner
JE Baumgartner
N Hindman
PA Cholak
RI Soare
S Simpson
WW Comfort
Publication date: 27/03/2016

We consider the strength and effective content of restricted versions of Hindman's Theorem in which the number of colors is specified and the length of the sums has a specified finite bound. Let $\mathsf{HT}^{\leq n}_k$ denote the assertion that for each $k$-coloring $c$ of $\mathbb{N}$ there is an infinite set $X \subseteq \mathbb{N}$ such that all sums $\sum_{x \in F} x$ for $F \subseteq X$ and $0 < |F| \leq n$ have the same color. We prove that there is a computable $2$-coloring $c$ of $\mathbb{N}$ such that there is no infinite computable set $X$ such that all nonempty sums of at most $2$ elements of $X$ have the same color. It follows that $\mathsf{HT}^{\leq 2}_2$ is not provable in $\mathsf{RCA}_0$, and in fact we show that it implies $\mathsf{SRT}^2_2$ over $\mathsf{RCA}_0$. We also show that there is a computable instance of $\mathsf{HT}^{\leq 3}_3$ with all solutions computing $0'$. The proof of this result shows that $\mathsf{HT}^{\leq 3}_3$ implies $\mathsf{ACA}_0$ over $\mathsf{RCA}_0$.
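To make the homogeneity condition in $\mathsf{HT}^{\leq n}_k$ concrete, here is a minimal Python sketch that checks it for a finite set $X$ and a given coloring. It is only an illustration under stated assumptions: the theorem concerns infinite sets, which no program can inspect in full, so the hypothetical helpers `bounded_sums` and `is_homogeneous` test a finite candidate only, and the parity coloring in the usage lines is an arbitrary example, not one from the paper.

```python
from itertools import combinations


def bounded_sums(xs, n):
    """All sums over F subset of X with 0 < |F| <= n (X treated as a finite set)."""
    elems = sorted(set(xs))
    return {sum(F) for size in range(1, n + 1) for F in combinations(elems, size)}


def is_homogeneous(xs, n, color):
    """True if every nonempty sum of at most n distinct elements gets one color."""
    return len({color(s) for s in bounded_sums(xs, n)}) <= 1


def parity(m):
    """An example 2-coloring of the naturals: 0 for even, 1 for odd."""
    return m % 2


# Usage
print(is_homogeneous([2, 4, 6, 8], 2, parity))  # True: every sum is even
print(is_homogeneous([1, 3], 2, parity))        # False: 1 and 3 are odd, 1 + 3 = 4 is even
```

For the parity coloring an infinite homogeneous set is easy to find (the even numbers); the paper's point is that for some computable colorings no computable $X$ works.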

arXiv.org e-Print Archive

Crossref

Consanguinity and rare mutations outside of MCCC genes underlie nonspecific phenotypes of MCCD.

Author: Barshop Bruce A
Baumgartner Matthias R
Frazer Kelly A
Hansen John-Bjarne
Jepsen Kristen
Shepard Peter J
Smith Erin N
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

Purpose: 3-Methylcrotonyl-CoA carboxylase deficiency (MCCD) is an autosomal recessive disorder of leucine catabolism that has a highly variable clinical phenotype, ranging from acute metabolic acidosis to nonspecific symptoms such as developmental delay, failure to thrive, hemiparesis, muscular hypotonia, and multiple sclerosis. Implementation of newborn screening for MCCD has broadened the range of phenotypic expression to include asymptomatic adults. The purpose of this study was to identify factors underlying the varying phenotypes of MCCD.

Methods: We performed exome sequencing on DNA from 33 cases and 108 healthy controls. We examined these data for associations between MCC mutational status, genetic ancestry, or consanguinity and the absence or presence/specificity of clinical symptoms in MCCD cases.

Results: We determined that individuals with nonspecific clinical phenotypes are highly inbred compared with asymptomatic cases and healthy controls. For 5 of these 10 individuals, we discovered a homozygous damaging mutation in a disease gene that is likely to underlie their nonspecific clinical phenotypes previously attributed to MCCD.

Conclusion: Our study shows that nonspecific phenotypes attributed to MCCD are associated with consanguinity and are likely not due to mutations in the MCC enzyme but result from rare homozygous mutations in other disease genes.

Genet Med 17(8), 660-667

PubMed Central

eScholarship - University of California

ZORA

A UIMA wrapper for the NCBO annotator

Author: Baumgartner
C. Jonquet
C. Roeder
Hunter
K. Verspoor
L. Hunter
N. H. Shah
W. A. Baumgartner
Publication venue: Oxford University Press
Publication date: 01/01/2010

Summary: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator, an ontology-based annotation service, to make it available as a component in UIMA workflows.

Crossref

PubMed Central

HAL Descartes

University of Melbourne Institutional Repository

Two-qutrit Entanglement Witnesses and Gell-Mann Matrices

Author: Baumgartner
Bertlmann
Doherty
Horodecki
Jafarizadeh
Klimov
Lewenstein
M. A. Jafarizadeh
N. Behzadi
Vianna
Woronowicz
Y. Akbari
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/02/2008

The Gell-Mann $\lambda$ matrices for the Lie algebra su(3) are the natural basis for the Hilbert space of Hermitian operators acting on the states of a three-level system (qutrit). Constructing entanglement witnesses (EWs) for two-qutrit states from these matrices may therefore be an interesting problem. In this paper, several two-qutrit EWs are constructed from the Gell-Mann matrices using the linear programming (LP) method, either exactly or approximately. The decomposability and non-decomposability of the constructed EWs are also discussed, and it is shown that the $\lambda$-diagonal EWs presented in this paper are all decomposable, but there exist non-decomposable ones among the $\lambda$-non-diagonal EWs. Comment: 25 page

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)