
    Self-supervised automated wrapper generation for weblog data extraction

    Data extraction from the web is notoriously hard. Among the types of resources available on the web, weblogs are becoming increasingly important due to the continued growth of the blogosphere, but they remain poorly explored. Past approaches to data extraction from weblogs have often involved manual intervention and suffer from low scalability. This paper proposes a fully automated information extraction methodology based on web feeds and HTML processing. The approach includes a model for wrapper generation that exploits web feeds to derive a set of extraction rules automatically. Instead of performing a pairwise comparison between posts, the model matches the values in the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach to derive the rules and automate wrapper generation. The model is evaluated on a dataset of 2,393 posts, and the results (92% accuracy) show that the proposed technique enables robust extraction of weblog properties and can be applied across the blogosphere for applications such as improved information retrieval and more robust web preservation initiatives.
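The idea of matching feed values against HTML elements can be illustrated with a minimal Python sketch. It assumes each training post is available as a pair of its feed entry (a dict of field values, here a hypothetical `title` and `body`) and its HTML page. A rule is represented as a tag-and-class path, and the fraction of training posts in which that path contains the feed value serves as the rule's estimated probability. The names `PathCollector`, `derive_rules` and `extract`, the path representation and the majority-vote scoring are illustrative assumptions, not the paper's actual model.

```python
from collections import Counter
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Records (tag path, normalized text) for every text node in a page."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as "tag" or "tag.class"
        self.nodes = []   # list of (path, text)

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        cls = dict(attrs).get("class")
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop up to and including the most recent matching tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.nodes.append(("/".join(self.stack), text))


def parse(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def derive_rules(posts, fields):
    """posts: list of (feed_entry dict, html str). Returns {field: (path, probability)}."""
    parsed = [(entry, parse(html)) for entry, html in posts]
    rules = {}
    for field in fields:
        votes = Counter()
        for entry, nodes in parsed:
            target = " ".join(entry[field].split())
            # Each path that holds the feed value votes at most once per post.
            for path in {p for p, t in nodes if t == target}:
                votes[path] += 1
        if votes:
            path, count = votes.most_common(1)[0]
            rules[field] = (path, count / len(posts))
    return rules


def extract(html, rules):
    """Apply derived rules to an unseen post: first text node at each rule's path."""
    nodes = parse(html)
    return {field: next((t for p, t in nodes if p == path), None)
            for field, (path, _) in rules.items()}
```

Because the title text may also appear elsewhere on a page (for example, in a navigation link), voting across several posts is what singles out the element that consistently carries the feed value.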

    Harvesting Entities from the Web Using Unique Identifiers -- IBEX

    In this paper we study the prevalence of unique entity identifiers on the Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs (for documents), email addresses, and others. We show how these identifiers can be harvested systematically from Web pages, and how they can be associated with human-readable names for the entities at large scale. Starting with a simple extraction of identifiers and names from Web pages, we show how we can use the properties of unique identifiers to filter out noise and clean up the extraction result on the entire corpus. The end result is a database of millions of uniquely identified entities of different types, with an accuracy of 73–96% and a very high coverage compared to existing knowledge bases. We use this database to compute novel statistics on the presence of products, people, and other entities on the Web.
    Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A. Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting Entities from the Web Using Unique Identifiers. WebDB workshop, 201
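One property that can separate genuine identifiers from noise is a check digit. The sketch below is an illustration, not the IBEX pipeline: it scans text for 13-digit candidates (optionally hyphen- or space-separated) and keeps only those that pass the standard EAN-13/GTIN-13 checksum, which ISBN-13 shares. Digits are weighted alternately 1 and 3, and the weighted total must be divisible by 10. The regular expression and the names `gtin13_valid` and `harvest` are hypothetical.

```python
import re

# Hypothetical candidate pattern: 13 digits, each optionally followed by one hyphen or space.
CANDIDATE = re.compile(r"\b(?:\d[- ]?){12}\d\b")


def gtin13_valid(digits: str) -> bool:
    """EAN-13/GTIN-13 (and ISBN-13) check: weights 1, 3, 1, 3, ...; total must be 0 mod 10."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def harvest(text: str) -> list:
    """Return checksum-valid 13-digit identifiers found in text, separators removed."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[- ]", "", match.group())
        if gtin13_valid(digits):
            found.append(digits)
    return found


print(harvest("ISBN 978-0-306-40615-7, not 978-0-306-40615-8"))  # ['9780306406157']
```

A checksum only removes malformed strings; associating each surviving identifier with a single human-readable name would still require corpus-level evidence.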

    HIV as a Chronic Illness: Identity Incorporation and Learning

    Abstract: The purpose of this session is twofold: (1) to review tentative findings of a study-in-progress concerning the identity incorporation process and learning of people living with HIV as a chronic illness, and (2) to explore issues encountered in conducting research with the chronically ill.

    “HIV is Only One Part of Me”: HIV and Its Effect on Other Identities

    The purpose of this study was to investigate the effect of the HIV identity on other identities. The spiritual and advocate identities increased in salience, whereas work and sexual identities decreased. Younger participants fretted about physical appearance; older participants focused on health. There are implications for adult educators.

    PoZitively Transformative: The Transformative Learning of People Living with HIV

    The purpose of this study was to investigate meaning making in People Living with HIV (PLWH) now that HIV is a chronic illness. Findings confirm those of Courtenay, Merriam and Reeves (1998), who examined meaning making in people living with HIV/AIDS (PLWHAs) when HIV/AIDS was a terminal illness. Contextual factors that mediate meaning making were uncovered.

    Effectiveness of Hindman's theorem for bounded sums

    We consider the strength and effective content of restricted versions of Hindman's Theorem in which the number of colors is specified and the length of the sums has a specified finite bound. Let $\mathsf{HT}^{\leq n}_k$ denote the assertion that for each $k$-coloring $c$ of $\mathbb{N}$ there is an infinite set $X \subseteq \mathbb{N}$ such that all sums $\sum_{x \in F} x$ for $F \subseteq X$ and $0 < |F| \leq n$ have the same color. We prove that there is a computable $2$-coloring $c$ of $\mathbb{N}$ such that there is no infinite computable set $X$ such that all nonempty sums of at most $2$ elements of $X$ have the same color. It follows that $\mathsf{HT}^{\leq 2}_2$ is not provable in $\mathsf{RCA}_0$, and in fact we show that it implies $\mathsf{SRT}^2_2$ in $\mathsf{RCA}_0$. We also show that there is a computable instance of $\mathsf{HT}^{\leq 3}_3$ with all solutions computing $0'$. The proof of this result shows that $\mathsf{HT}^{\leq 3}_3$ implies $\mathsf{ACA}_0$ in $\mathsf{RCA}_0$.
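For a finite candidate set, the condition in $\mathsf{HT}^{\leq n}_k$ (all sums over nonempty $F \subseteq X$ with $|F| \leq n$ share one color) can be checked directly. The sketch below only illustrates the definition: the theorem concerns infinite sets $X$ and their computability, which no finite check can decide. The function names and the parity coloring used in the example are hypothetical.

```python
from itertools import combinations
from typing import Callable, Iterable, Set


def bounded_sums(xs: Iterable[int], n: int) -> Set[int]:
    """All sums of the elements of F, for nonempty F subset of xs with |F| <= n."""
    elems = sorted(set(xs))
    return {sum(f) for size in range(1, n + 1) for f in combinations(elems, size)}


def monochromatic(xs: Iterable[int], n: int, color: Callable[[int], int]) -> bool:
    """True if every bounded sum over xs receives the same color."""
    return len({color(s) for s in bounded_sums(xs, n)}) <= 1


parity = lambda m: m % 2  # a hypothetical 2-coloring of N
print(monochromatic([2, 4, 6, 8], 3, parity))  # True: sums of evens stay even
print(monochromatic([1, 3], 1, parity))        # True: singletons only
print(monochromatic([1, 3], 2, parity))        # False: 1 + 3 = 4 changes color
```

The last two calls show why the bound $n$ matters: the same set can satisfy the condition for $n = 1$ and fail it for $n = 2$.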

    Consanguinity and rare mutations outside of MCCC genes underlie nonspecific phenotypes of MCCD.

    Purpose: 3-Methylcrotonyl-CoA carboxylase deficiency (MCCD) is an autosomal recessive disorder of leucine catabolism with a highly variable clinical phenotype, ranging from acute metabolic acidosis to nonspecific symptoms such as developmental delay, failure to thrive, hemiparesis, muscular hypotonia, and multiple sclerosis. Implementation of newborn screening for MCCD has broadened the range of phenotypic expression to include asymptomatic adults. The purpose of this study was to identify factors underlying the varying phenotypes of MCCD.
    Methods: We performed exome sequencing on DNA from 33 cases and 108 healthy controls. We examined these data for associations between the absence or presence/specificity of clinical symptoms in MCCD cases and each of three factors: MCC mutational status, genetic ancestry, and consanguinity.
    Results: We determined that individuals with nonspecific clinical phenotypes are highly inbred compared with asymptomatic cases and healthy controls. For 5 of these 10 individuals, we discovered a homozygous damaging mutation in a disease gene that is likely to underlie the nonspecific clinical phenotypes previously attributed to MCCD.
    Conclusion: Our study shows that nonspecific phenotypes attributed to MCCD are associated with consanguinity and are likely due not to mutations in the MCC enzyme but to rare homozygous mutations in other disease genes.
    Genet Med 17(8), 660–667

    A UIMA wrapper for the NCBO annotator

    Summary: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator, an ontology-based annotation service, to make it available as a component in UIMA workflows.

    Two-qutrit Entanglement Witnesses and Gell-Mann Matrices

    The Gell-Mann $\lambda$ matrices for the Lie algebra su(3) are the natural basis for the Hilbert space of Hermitian operators acting on the states of a three-level system (qutrit). The construction of entanglement witnesses (EWs) for two-qutrit states from these matrices may therefore be an interesting problem. In this paper, several two-qutrit EWs are constructed from the Gell-Mann matrices using the linear programming (LP) method, exactly or approximately. The decomposability and non-decomposability of the constructed EWs are also discussed: the $\lambda$-diagonal EWs presented in this paper are all decomposable, but there exist non-decomposable ones among the $\lambda$-non-diagonal EWs.
    Comment: 25 page
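The basis property of the Gell-Mann matrices can be checked numerically. The plain-Python sketch below (no external libraries) lists the eight matrices in the standard ordering and relies on the identities $\mathrm{tr}(\lambda_a) = 0$ and $\mathrm{tr}(\lambda_a \lambda_b) = 2\delta_{ab}$. With these it decomposes a Hermitian $3 \times 3$ matrix as $h = r_0 I + \sum_a r_a \lambda_a$, where $r_0 = \mathrm{tr}(h)/3$ and $r_a = \mathrm{tr}(h \lambda_a)/2$; the identity is needed alongside the eight traceless $\lambda_a$. It also forms tensor products $\lambda_a \otimes \lambda_b$, the building blocks of two-qutrit operators. It does not reproduce the paper's EW construction or LP method, and the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(3)
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# The eight Gell-Mann matrices lambda_1 ... lambda_8 in the standard ordering.
GELL_MANN = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[S, 0, 0], [0, S, 0], [0, 0, -2 * S]],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def kron(a, b):
    """Kronecker product a (x) b, e.g. a two-qutrit 9 x 9 operator lambda_a (x) lambda_b."""
    n, m = len(b), len(b[0])
    return [[a[i // n][j // m] * b[i % n][j % m] for j in range(len(a[0]) * m)]
            for i in range(len(a) * n)]


def decompose(h):
    """Real coefficients [r0, r1, ..., r8] with h = r0*I + sum_a r_a*lambda_a (h Hermitian)."""
    r0 = complex(trace(h)).real / 3
    return [r0] + [complex(trace(matmul(h, lam))).real / 2 for lam in GELL_MANN]


def recompose(coeffs):
    """Inverse of decompose: rebuild the 3 x 3 matrix from its nine coefficients."""
    r0, rest = coeffs[0], coeffs[1:]
    return [[r0 * IDENTITY[i][j] + sum(r * lam[i][j] for r, lam in zip(rest, GELL_MANN))
             for j in range(3)] for i in range(3)]


h = [[2, 1 - 1j, 0], [1 + 1j, -1, 3j], [0, -3j, 0.5]]  # an arbitrary Hermitian example
rebuilt = recompose(decompose(h))
print(max(abs(h[i][j] - rebuilt[i][j]) for i in range(3) for j in range(3)))
```

Because the coefficients come from traces against an orthogonal basis, no linear system has to be solved. A two-qutrit operator can be expanded the same way using the $81$ products of $\{I, \lambda_1, \dots, \lambda_8\}$ with themselves.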