Self-supervised automated wrapper generation for weblog data extraction
Data extraction from the web is notoriously hard. Of the types of resources available on the web, weblogs are becoming increasingly important due to the continued growth of the blogosphere, but remain poorly explored. Past approaches to data extraction from weblogs have often involved manual intervention and suffer from low scalability. This paper proposes a fully automated information extraction methodology based on the use of web feeds and processing of HTML. The approach includes a model for generating a wrapper that exploits web feeds to derive a set of extraction rules automatically. Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach for deriving a set of rules and automating the process of wrapper generation. An evaluation of the model is conducted on a dataset of 2,393 posts and the results (92% accuracy) show that the proposed technique enables robust extraction of weblog properties and can be applied across the blogosphere for applications such as improved information retrieval and more robust web preservation initiatives.
Asymptotic Entanglement and Lindblad Dynamics: a Perturbative Approach
We consider an open bipartite quantum system with dissipative Lindblad type
dynamics. In order to study the entanglement of the stationary states, we
develop a perturbative approach and apply it to the physically significant case
when a purely dissipative perturbation is added to the unperturbed generator
which by itself would produce reversible unitary dynamics. Comment: 15 pages
Bacterial meningitis in older neonates
During a five-year period, 24 patients (age range, 2 to 6 weeks) were diagnosed with and treated for bacterial meningitis. Organisms recovered from the CSF included group B Streptococcus (n = 6), Escherichia coli (n = 5), Listeria monocytogenes (n = 5), Hemophilus influenzae (n = 4), Streptococcus pneumoniae (n = 2), and group D and group A Streptococcus (one each). Initial antimicrobial therapy must include antibiotics that are effective across this spectrum of potential pathogens. Symptoms and signs were often subtle. Six children (25%) experienced major neurologic residua, including five patients (21%) in whom hydrocephalus developed. Ultrasound examination of the head at the end of therapy was an effective technique for early assessment of neurologic sequelae.
Hierarchic Superposition Revisited
Many applications of automated deduction require reasoning in first-order
logic modulo background theories, in particular some form of integer
arithmetic. A major unsolved research challenge is to design theorem provers
that are "reasonably complete" even in the presence of free function symbols
ranging into a background theory sort. The hierarchic superposition calculus of
Bachmair, Ganzinger, and Waldmann already supports such symbols, but, as we
demonstrate, not optimally. This paper aims to rectify the situation by
introducing a novel form of clause abstraction, a core component in the
hierarchic superposition calculus for transforming clauses into a form needed
for internal operation. We argue for the benefits of the resulting calculus and
provide two new completeness results: one for the fragment where all
background-sorted terms are ground and another one for a special case of linear
(integer or rational) arithmetic as a background theory.
Ethyl Glucuronide in Scalp and Non-head Hair: An Intra-individual Comparison
Aims: Analysis of ethyl glucuronide (EtG), a minor metabolite of ethanol, is a valid tool for the assessment of social and chronic excessive alcohol consumption. Standardized analysis of EtG is usually done in head hair. As head hair cannot always be provided, alternative hair matrices are of increasing interest. Therefore, a study was performed that compared the intra-individual EtG concentrations in scalp hair and non-head hair (chest, arm, leg and axillary hair). Methods: Hair samples were collected from 68 subjects undergoing an expert assessment for fitness to drive. Aqueous extracts of the hair matrix were cleaned by solid-phase extraction, using an Oasis MAX column. EtG was first derivatized with perfluoropentanoic anhydride and then quantified by GC-MS/MS in negative chemical ionization mode, using EtG-d5 as internal standard. Results: For categorizing drinking behaviour, the two EtG cut-off values recommended by the Society of Hair Testing were applied to all hair types. For chest, arm and leg hair, correct classification ratios were >83%. This corresponds to sensitivity values >78% and specificities >75%. Together with φ coefficients (rφ) > 0.7, such values indicate a high correlation between the categorization of drinking behaviour based on these body hair EtG concentrations and the categorization based on scalp hair EtG values. However, it must be taken into consideration that the time frame represented by non-head hair may extend considerably further back. Conclusions: These results indicate that chest, arm and leg hair can be a valid alternative for assessing the drinking behaviour of a subject if head hair is not available, whereas axillary hair is not suitable as an alternative matrix.
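For readers unfamiliar with the agreement measures quoted in this abstract, the standard textbook definitions are given below. This is a reference sketch, not the authors' stated computation: the symbols TP, FP, TN and FN are assumptions introduced here, counting subjects whose body-hair category agrees or disagrees with the scalp-hair category taken as the reference.

```latex
\begin{align*}
\text{correct classification ratio} &= \frac{TP + TN}{TP + FP + TN + FN} \\
\text{sensitivity} &= \frac{TP}{TP + FN} \\
\text{specificity} &= \frac{TN}{TN + FP} \\
r_{\varphi} &= \frac{TP \cdot TN - FP \cdot FN}
                   {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{align*}
```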
Harvesting Entities from the Web Using Unique Identifiers -- IBEX
In this paper we study the prevalence of unique entity identifiers on the
Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs
(for documents), email addresses, and others. We show how these identifiers can
be harvested systematically from Web pages, and how they can be associated with
human-readable names for the entities at large scale.
Starting with a simple extraction of identifiers and names from Web pages, we
show how we can use the properties of unique identifiers to filter out noise
and clean up the extraction result on the entire corpus. The end result is a
database of millions of uniquely identified entities of different types, with
an accuracy of 73--96% and a very high coverage compared to existing knowledge
bases. We use this database to compute novel statistics on the presence of
products, people, and other entities on the Web. Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A.
Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting
Entities from the Web Using Unique Identifiers. WebDB workshop, 201
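One property of unique identifiers that can help filter extraction noise is a built-in check digit. The abstract does not say which properties IBEX relies on, so the following is only a minimal sketch of the idea using the publicly documented ISBN-13 checksum; the function names `is_valid_isbn13` and `harvest_isbn13` and the candidate regular expression are hypothetical and not taken from the paper.

```python
import re
from typing import List

# Hypothetical candidate pattern: "978" or "979" followed by ten more digits,
# each optionally preceded by a hyphen or space.
CANDIDATE = re.compile(r"\b97[89](?:[- ]?\d){10}\b")


def is_valid_isbn13(candidate: str) -> bool:
    """Return True if the string holds 13 digits with a valid ISBN-13 check digit."""
    digits = re.sub(r"[\s-]", "", candidate)
    if not re.fullmatch(r"\d{13}", digits):
        return False
    # ISBN-13 weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def harvest_isbn13(text: str) -> List[str]:
    """Extract ISBN-13 candidates from text and keep only those passing the checksum."""
    found = []
    for match in CANDIDATE.finditer(text):
        normalized = re.sub(r"[\s-]", "", match.group())
        if is_valid_isbn13(normalized):
            found.append(normalized)
    return found


if __name__ == "__main__":
    sample = "ISBN 978-0-306-40615-7 and bogus 978-0-306-40615-8"
    print(harvest_isbn13(sample))
```

The checksum discards strings that merely look like identifiers, which is one inexpensive way to clean a large extraction result before attaching names to entities.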
Consanguinity and rare mutations outside of MCCC genes underlie nonspecific phenotypes of MCCD.
Purpose: 3-Methylcrotonyl-CoA carboxylase deficiency (MCCD) is an autosomal recessive disorder of leucine catabolism that has a highly variable clinical phenotype, ranging from acute metabolic acidosis to nonspecific symptoms such as developmental delay, failure to thrive, hemiparesis, muscular hypotonia, and multiple sclerosis. Implementation of newborn screening for MCCD has resulted in broadening the range of phenotypic expression to include asymptomatic adults. The purpose of this study was to identify factors underlying the varying phenotypes of MCCD. Methods: We performed exome sequencing on DNA from 33 cases and 108 healthy controls. We examined these data for associations between MCC mutational status, genetic ancestry, or consanguinity and the absence or presence/specificity of clinical symptoms in MCCD cases. Results: We determined that individuals with nonspecific clinical phenotypes are highly inbred compared with cases that are asymptomatic and healthy controls. For 5 of these 10 individuals, we discovered a homozygous damaging mutation in a disease gene that is likely to underlie their nonspecific clinical phenotypes previously attributed to MCCD. Conclusion: Our study shows that nonspecific phenotypes attributed to MCCD are associated with consanguinity and are likely not due to mutations in the MCC enzyme but result from rare homozygous mutations in other disease genes. Genet Med 17(8), 660-667
Electoral turnover has very little effect on the spending habits of Western democracies
Do new electoral brooms sweep clean the economic policies of the parties that went before? In new research that examines how incoming Western governments set their spending priorities, Derek A. Epp, John Lovett, and Frank R. Baumgartner find that budgets tend to be set with little regard to a government’s ideology, be it left or right. They argue that when setting budgets, incoming policymakers are constrained by social, economic and international realities that are largely beyond their control. This means that budgets are set consistently and inconsistently with what went before at roughly the same rate; left-wing parties do not necessarily favor “big government”, nor do right-wing parties always seek to reduce government spending.