3,756 research outputs found

    Self-supervised automated wrapper generation for weblog data extraction

    Data extraction from the web is notoriously hard. Of the types of resources available on the web, weblogs are becoming increasingly important due to the continued growth of the blogosphere, but remain poorly explored. Past approaches to data extraction from weblogs have often involved manual intervention and suffer from low scalability. This paper proposes a fully automated information extraction methodology based on the use of web feeds and processing of HTML. The approach includes a model for generating a wrapper that exploits web feeds for deriving a set of extraction rules automatically. Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach for deriving a set of rules and automating the process of wrapper generation. An evaluation of the model is conducted on a dataset of 2,393 posts and the results (92% accuracy) show that the proposed technique enables robust extraction of weblog properties and can be applied across the blogosphere for applications such as improved information retrieval and more robust web preservation initiatives
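The matching step described in this abstract, where feed values are compared against HTML elements across several posts, can be sketched as follows. This is a minimal illustration and not the authors' model: the path notation (`tag.class` joined by `/`), the names `PathTextCollector` and `derive_rule`, and the use of plain relative frequency as the score are all assumptions made for the example.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (hypothetical minimal set for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathTextCollector(HTMLParser):
    """Collects (element path, text) pairs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, e.g. ["html", "body", "h1.title"]
        self.pairs = []  # (path, text) for every non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class")
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.pairs.append(("/".join(self.stack), text))


def derive_rule(posts):
    """Pick the element path that most often holds the feed value.

    posts: list of (html, feed_value) pairs, one per weblog post.
    Returns (path, share of posts in which that path matched), or (None, 0.0).
    """
    votes = Counter()
    for html, feed_value in posts:
        collector = PathTextCollector()
        collector.feed(html)
        target = feed_value.strip()
        # Count each path at most once per post.
        votes.update({path for path, text in collector.pairs if text == target})
    if not votes:
        return None, 0.0
    path, count = votes.most_common(1)[0]
    return path, count / len(posts)
```

A path that matches the feed value in most posts is a plausible extraction rule for that property; a path that matches only by coincidence in one post receives a low score.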

    Harvesting Entities from the Web Using Unique Identifiers -- IBEX

    In this paper we study the prevalence of unique entity identifiers on the Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs (for documents), email addresses, and others. We show how these identifiers can be harvested systematically from Web pages, and how they can be associated with human-readable names for the entities at large scale. Starting with a simple extraction of identifiers and names from Web pages, we show how we can use the properties of unique identifiers to filter out noise and clean up the extraction result on the entire corpus. The end result is a database of millions of uniquely identified entities of different types, with an accuracy of 73–96% and a very high coverage compared to existing knowledge bases. We use this database to compute novel statistics on the presence of products, people, and other entities on the Web. Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A. Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting Entities from the Web Using Unique Identifiers. WebDB workshop, 201
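One property of identifiers such as ISBNs that lends itself to noise filtering is the check digit. The sketch below validates ISBN-10 and ISBN-13 candidates using the standard checksum rules. Whether the report uses exactly this filter is an assumption, and the function names are hypothetical.

```python
import re


def is_valid_isbn13(candidate: str) -> bool:
    """ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    digits = re.sub(r"[\s-]", "", candidate)
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_isbn10(candidate: str) -> bool:
    """ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11."""
    chars = re.sub(r"[\s-]", "", candidate).upper()
    if not re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        return False
    values = [10 if c == "X" else int(c) for c in chars]  # "X" stands for 10
    total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
    return total % 11 == 0


def filter_isbn_candidates(candidates):
    """Keep only strings that pass one of the two checksum tests."""
    return [c for c in candidates if is_valid_isbn13(c) or is_valid_isbn10(c)]
```

A random 13-digit string passes the ISBN-13 test only about one time in ten, so the checksum removes most digit sequences that merely look like identifiers.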

    Consanguinity and rare mutations outside of MCCC genes underlie nonspecific phenotypes of MCCD.

    Purpose: 3-Methylcrotonyl-CoA carboxylase deficiency (MCCD) is an autosomal recessive disorder of leucine catabolism that has a highly variable clinical phenotype, ranging from acute metabolic acidosis to nonspecific symptoms such as developmental delay, failure to thrive, hemiparesis, muscular hypotonia, and multiple sclerosis. Implementation of newborn screening for MCCD has resulted in broadening the range of phenotypic expression to include asymptomatic adults. The purpose of this study was to identify factors underlying the varying phenotypes of MCCD. Methods: We performed exome sequencing on DNA from 33 cases and 108 healthy controls. We examined these data for associations between either MCC mutational status, genetic ancestry, or consanguinity and the absence or presence/specificity of clinical symptoms in MCCD cases. Results: We determined that individuals with nonspecific clinical phenotypes are highly inbred compared with cases that are asymptomatic and healthy controls. For 5 of these 10 individuals, we discovered a homozygous damaging mutation in a disease gene that is likely to underlie their nonspecific clinical phenotypes previously attributed to MCCD. Conclusion: Our study shows that nonspecific phenotypes attributed to MCCD are associated with consanguinity and are likely not due to mutations in the MCC enzyme but result from rare homozygous mutations in other disease genes. Genet Med 17 8, 660-667

    Elevated serum biotinidase activity in hepatic glycogen storage disorders-A convenient biomarker

    Summary: An elevated serum biotinidase activity in patients with glycogen storage disease (GSD) type Ia has been reported previously. The aim of this work was to investigate the specificity of the phenomenon and thus we expanded the study to other types of hepatic GSDs. Serum biotinidase activity was measured in a total of 68 GSD patients and was compared with that of healthy controls (8.7 ± 1.0; range 7.0-10.6 mU/ml; n=6). We found an increased biotinidase activity in patients with GSD Ia (17.7 ± 3.9; range 11.4-24.8; n=21), GSD I non-a (20.9 ± 5.6; range 14.6-26.0; n=4), GSD III (12.5 ± 3.6; range 7.8-19.1; n=3), GSD VI (15.4 ± 2.0; range 14.1-17.7; n=) and GSD IX (14.0 ± 3.8; range 7.5-21.6; n=22). The sensitivity of this test was 100% for patients with GSD Ia, GSD I non-a and GSD VI, 62% for GSD III, and 77% for GSD IX, indicating reduced sensitivity for GSD III and GSD IX, respectively. In addition, we found elevated biotinidase activity in all sera from 5 patients with Fanconi-Bickel Syndrome (15.3 ± 3.7; range 11.0-19.4). Taken together, we propose serum biotinidase as a diagnostic biomarker for hepatic glycogen storage disorder

    Intelligent Self-Repairable Web Wrappers

    The amount of information available on the Web grows at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust Web data mining algorithms that can automatically cope with possible malfunctions or failures. On the other, the literature lacks solutions for the maintenance of these systems. Procedures that extract Web data may be tightly coupled to the structure of the data source itself; thus, malfunctions or the acquisition of corrupted data could be caused, for example, by structural modifications made to data sources by their owners. Nowadays, verification of data integrity and maintenance are mostly managed manually, in order to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to create procedures able to extract data from Web sources, the so-called Web wrappers, which can cope with malfunctions caused by modifications of the structure of the data source, and can automatically repair themselves.

    Independent Assessment of the 2018–19 fish deaths in the lower Darling

    Three significant fish death events occurred in the Darling River near Menindee between December 2018 and January 2019. The three events took place within two adjacent weir pools in a 30 km reach of river between Texas Downs Station and Weir 32 (DPI NSW Fisheries, 2019). The main native fish species involved included Murray Cod, Silver Perch, Golden Perch and Bony Herring, with mortality estimates in the range of hundreds of thousands to over a million fish. Though post-event fish population sampling is yet to be conducted, we expect that these mortalities will impact populations in the lower Darling River, and perhaps beyond, for many years. These events constitute a serious ecological shock to the lower Darling and reverse positive ecological outcomes that had accrued from environmental watering programs. We have determined that the fish death events were primarily caused by local hydrological and climatic conditions (Figure 1-1). The extreme hot and dry climate during 2018, extending into 2019, shaped the conditions that saw a large fish biomass, which had flourished since favourable spawning conditions in 2016, isolated in the weir pools around Menindee, with no means of escaping upstream or downstream. Those adverse climate conditions also shaped the subsequent algal bloom development and the strong and persistent thermal stratification of the weir pools, which created hypoxic conditions in the bottom waters of the pools. All that was needed for this to have a fatal impact on the fish was a trigger for the weir pool waters to become destratified and deprive the fish of oxygen. That trigger duly arrived with a series of sudden cool changes in the weather, with temperature drops and wind action initiating the turnover of the weir pool waters. This sudden depletion of oxygen, combined with the already high water and air temperatures, would have offered the large biomass of stressed fish no means of escape.
For each fish death event, the weir pool in which the fish were trapped was bordered downstream by an impenetrable barrier (a weir) and upstream by a dry channel. Ultimately, it was the rapid transition from very favourable conditions to very adverse ones that resulted in such high numbers of fish deaths. We have also determined that the fish death events were shaped by a broader climatic, hydrologic and basin management context that placed the lower Darling River at risk of such fish deaths. The preceding six years (since 2012) had seen two high flow events that had delivered water into Menindee Lakes (2012 and 2016) and offered opportunities for substantial fish breeding and recruitment. Fish populations were further enhanced by the judicious use of environmental water. The end result was a considerable biomass of fish within the Menindee Lakes, post 2016. Outside of these high flow events there were minimal flows in the Darling River below Bourke. This period was preceded by the Millennium drought (2000-2010), during which time flows across the entire northern Murray–Darling Basin were reduced. All of the hydroclimatic evidence available indicates that the years since 2000 have been some of the driest on record, in terms of inflows into major upstream storages, combined with an increased number of extreme heat days, which would have had a major impact on water quality in remnant pools. Soon after the events, Basin government officials met and developed an action plan to respond to the crisis. Immediate actions underway include additional water quality monitoring in the lower Darling, the use of aerators and targeted fish relocations. These immediate actions are welcome; however, the current situation remains critical – without significant inflows, further deaths of surviving fish may be expected.
We consider that priorities and actions in the short-term should focus on anticipating a repeat of ‘worst-case scenario’ outcomes with responses focussed at the site scale. In addition, the Minister for Agriculture and Water Resources announced a Native Fish Management and Recovery Strategy to help manage and recover fish populations across the Basin. We consider that this provides a good opportunity to enhance native fish management and support native fish population recovery and should be developed and implemented through a genuine collaboration between governments, communities, and Traditional Owners. The strategy needs to build on existing and lapsed native fish programs across the Basin. Through our investigations, it became evident to us that, over the long-term, the extant water access arrangements in the northern Basin, as well as limitations in the river models used to plan water sharing, place the lower Darling River at a higher risk of conditions that can lead to fish deaths during droughts than has previously been anticipated. Given that we are witnessing an increasing frequency of low inflow sequences in the northern Basin, this presents a serious problem for safeguarding fish populations, and populations of other resident biota, during drought in the lower Darling. We have identified that changes to Barwon–Darling water access arrangements made by NSW just prior to the commencement of the Basin Plan in 2012 have enhanced the ability of irrigators to access water during low flow periods and during the first flow event immediately after a cease-to-flow period. Further, it appears that the river models used to develop water sharing arrangements have a tendency to overestimate streamflows during dry sequences, and hence underestimate the impacts of extractions during dry times

    Intermediate Element Abundances in Galaxy Clusters

    We present the average abundances of the intermediate elements obtained by performing a stacked analysis of all the galaxy clusters in the archive of the X-ray telescope ASCA. We determine the abundances of Fe, Si, S, and Ni as a function of cluster temperature (mass) from 1–10 keV, and place strong upper limits on the abundances of Ca and Ar. In general, Si and Ni are overabundant with respect to Fe, while Ar and Ca are very underabundant. The discrepancy between the abundances of Si, S, Ar, and Ca indicates that the alpha-elements do not behave homogeneously as a single group. We show that the abundances of the most well-determined elements Fe, Si, and S in conjunction with recent theoretical supernovae yields do not give a consistent solution for the fraction of material produced by Type Ia and Type II supernovae at any temperature or mass. The general trend is for higher temperature clusters to have more of their metals produced in Type II supernovae than in Type Ias. The inconsistency of our results with abundances in the Milky Way indicates that spiral galaxies are not the dominant metal contributors to the intracluster medium (ICM). The pattern of elemental abundances requires an additional source of metals beyond standard SNIa and SNII enrichment. The properties of this new source are well matched to those of Type II supernovae with very massive, metal-poor progenitor stars. These results are consistent with a significant fraction of the ICM metals produced by an early generation of population III stars. Comment: 18 pages, 11 figures, 7 tables. Submitted to Ap

    Demonstration of Cross-Reactive Antibodies to Smooth Gram-Negative Bacteria in Antiserum to Escherichia coli J5

    We investigated the discrepancy between the broad cross-protection against gram-negative infections afforded by antiserum to Escherichia coli J5 and its apparently narrow cross-reactivity in vitro. Rabbits immunized with J5 bacteria produced antibodies to both the J5 lipopolysaccharide (LPS; titer by ELISA, 1:60,000) and LPS from the Re mutant of Salmonella minnesota (i.e., to the ketodeoxyoctonate [KDO] and lipid A determinants; titer, 1:3,200). In highly diluted antiserum, titers of antibody to J5 LPS were reduced by 28%-41% after adsorption with seven strains of smooth gram-negative bacteria and by only 4% after adsorption with the Re mutant. Smooth gram-negative bacteria adsorbed virtually all antibody to Re LPS. Therefore, rabbit antiserum to J5 contains type-specific antibodies to core determinants distal to KDO that can obscure highly cross-reactive antibodies to lipid A-KDO in vitro. Cross-reactive antibodies are demonstrable by adsorption with whole bacteria at limiting concentrations of antibod