    Wrapper Maintenance: A Machine Learning Approach

    The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most previous research has focused on the quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper can be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes and made 16 mistakes, resulting in a precision of 0.73 and a recall of 0.95. We validated the reinduction algorithm on ten Web sources and successfully reinduced the wrappers, obtaining precision and recall values of 0.90 and 0.80, respectively, on the data extraction task.
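    The verification idea can be illustrated with a deliberately simplified sketch, assuming that "structural information" means sequences of coarse token types (numbers, capitalized words, punctuation) learned from correctly extracted values. The token classes, the helper names (`signature`, `learn_patterns`, `verify`) and the `min_support`/`threshold` parameters are hypothetical choices for illustration, not the authors' algorithm, which the abstract does not describe in detail.

```python
import re
from collections import Counter

# Hypothetical coarse token classes, checked in order; not the paper's hierarchy.
TOKEN_CLASSES = [
    ("NUMBER", re.compile(r"^\d+$")),
    ("ALLCAPS", re.compile(r"^[A-Z]{2,}$")),
    ("CAPITALIZED", re.compile(r"^[A-Z][a-z]+$")),
    ("LOWER", re.compile(r"^[a-z]+$")),
    ("PUNCT", re.compile(r"^[^\w\s]$")),
]


def signature(value: str) -> tuple[str, ...]:
    """Map each token of a value to the first token class whose regex matches."""
    tokens = re.findall(r"\w+|[^\w\s]", value)  # word runs and single symbols
    return tuple(
        next((name for name, rx in TOKEN_CLASSES if rx.match(tok)), "OTHER")
        for tok in tokens
    )


def learn_patterns(examples: list[str], min_support: float = 0.2) -> set[tuple[str, ...]]:
    """Keep signatures that cover at least `min_support` of the positive examples."""
    counts = Counter(signature(e) for e in examples)
    return {sig for sig, c in counts.items() if c / len(examples) >= min_support}


def verify(patterns: set[tuple[str, ...]], extracted: list[str], threshold: float = 0.8) -> bool:
    """Return True if enough newly extracted values fit a learned pattern.

    False suggests the wrapper is returning wrong data, e.g. after a layout change.
    """
    if not extracted:
        return False  # extracting nothing at all is treated as a failure
    hits = sum(signature(v) in patterns for v in extracted)
    return hits / len(extracted) >= threshold
```

    For example, patterns learned from prices such as `$12.99` accept `$9.99` but reject `Out of stock`, the kind of shift that would signal a changed page layout.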

    Latent Space Model for Multi-Modal Social Data

    With the emergence of social networking services, researchers enjoy the increasing availability of large-scale heterogeneous datasets capturing online user interactions and behaviors. Traditional analysis of techno-social systems data has focused mainly on describing either the dynamics of social interactions or the attributes and behaviors of the users. However, overwhelming empirical evidence suggests that the two dimensions affect one another, and therefore they should be jointly modeled and analyzed in a multi-modal framework. The benefits of such an approach include the ability to build better predictive models, leveraging social network information as well as user behavioral signals. To this end, we propose the Constrained Latent Space Model (CLSM), a generalized framework that combines Mixed Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA), incorporating a constraint that forces the latent space to concurrently describe the multiple data modalities. We derive an efficient inference algorithm based on Variational Expectation Maximization whose computational cost is linear in the size of the network, making it feasible to analyze massive social datasets. We validate the proposed framework on two problems: prediction of social interactions from user attributes and behaviors, and behavior prediction exploiting network information. We perform experiments with a variety of multi-modal social systems, spanning location-based social networks (Gowalla), social media services (Instagram, Orkut), e-commerce and review sites (Amazon, Ciao), and citation networks (Cora). The results indicate significant improvement in prediction accuracy over state-of-the-art methods and demonstrate the flexibility of the proposed approach for addressing a variety of learning problems commonly occurring with multi-modal social data.
    Comment: 12 pages, 7 figures, 2 tables
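    The abstract does not state the exact form of the constraint, but one way to picture a shared latent space is a generative sketch in which each user's mixed-membership vector drives both link formation (as in MMSB) and attribute or behavior tokens (as in LDA). The function names, the Dirichlet hyperparameters `alpha` and `eta`, and the block-matrix values are assumptions for illustration; inference by variational EM is not shown.

```python
import random


def dirichlet(alpha: list[float], rng: random.Random) -> list[float]:
    """Sample a Dirichlet vector by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs: list[float], rng: random.Random) -> int:
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(num_users: int, num_groups: int, vocab_size: int, words_per_user: int,
             alpha: float = 0.5, eta: float = 0.1, seed: int = 0):
    """Hypothetical joint generative sketch: MMSB-style links + LDA-style tokens."""
    rng = random.Random(seed)
    # Assumption: the shared-latent-space constraint is modeled by letting one
    # membership vector theta[u] generate BOTH the user's links and tokens.
    theta = [dirichlet([alpha] * num_groups, rng) for _ in range(num_users)]
    # MMSB part: block[g][h] = link probability between roles g and h
    # (illustrative values: dense within groups, sparse across them).
    block = [[rng.uniform(0.6, 0.9) if g == h else rng.uniform(0.0, 0.1)
              for h in range(num_groups)] for g in range(num_groups)]
    # LDA part: one distribution over attribute/behavior tokens per group.
    topics = [dirichlet([eta] * vocab_size, rng) for _ in range(num_groups)]

    links = {}
    for u in range(num_users):
        for v in range(num_users):
            if u != v:
                z_u = categorical(theta[u], rng)  # role u adopts toward v
                z_v = categorical(theta[v], rng)  # role v adopts toward u
                links[(u, v)] = rng.random() < block[z_u][z_v]

    tokens = []
    for u in range(num_users):
        # Each token's group is drawn from the same theta[u] used for links.
        tokens.append([categorical(topics[categorical(theta[u], rng)], rng)
                       for _ in range(words_per_user)])
    return theta, links, tokens
```

    Because `theta[u]` appears in both loops, evidence from a user's links and from their tokens would inform the same latent vector during inference, which is the intuition behind modeling the two modalities jointly.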

    Intelligent Self-Repairable Web Wrappers

    The amount of information available on the Web is growing at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust Web data mining algorithms that can automatically cope with malfunctions or failures. On the other hand, the literature offers few solutions for the maintenance of these systems. Procedures that extract Web data may be tightly coupled to the structure of the data source itself; thus, malfunctions or the acquisition of corrupted data can be caused, for example, by structural modifications made to data sources by their owners. Today, data-integrity verification and maintenance are mostly handled manually to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to building procedures that extract data from Web sources (so-called Web wrappers), can cope with malfunctions caused by modifications to the structure of the data source, and can automatically repair themselves.

    Interaction Between Autonomic Tone and the Negative Chronotropic Effect of Adenosine in Humans

    Peer Reviewed http://deepblue.lib.umich.edu/bitstream/2027.42/72287/1/j.1540-8159.1999.tb00412.x.pd

    Coronary artery endothelial dysfunction is positively correlated with low density lipoprotein and inversely correlated with high density lipoprotein subclass particles measured by nuclear magnetic resonance spectroscopy.

    OBJECTIVE: The association between cholesterol and endothelial dysfunction remains controversial. We tested the hypothesis that lipoprotein subclasses are associated with coronary endothelial dysfunction. METHODS AND RESULTS: Coronary endothelial function was assessed in 490 patients between November 1993 and February 2007. Fasting lipids and nuclear magnetic resonance (NMR) lipoprotein particle subclasses were measured. There were 325 females and 165 males with a mean age of 49.8 ± 11.6 years. Coronary endothelial dysfunction (epicardial constriction > 20% or increase in coronary blood flow < 50% in response to intracoronary acetylcholine) was diagnosed in 273 patients, the majority of whom (64.5%) had microvascular dysfunction. Total cholesterol and LDL-C (low density lipoprotein cholesterol) were not associated with endothelial dysfunction. One-way analysis and multivariate methods adjusting for age, gender, diabetes, hypertension and lipid-lowering agent use were used to determine the correlation between lipoprotein subclasses and coronary endothelial dysfunction. Epicardial endothelial dysfunction was significantly correlated with total LDLp (LDL particles) (p = 0.03) and small LDLp (p < 0.01) and inversely correlated with total and large HDLp (high density lipoprotein particles) (p < 0.01). CONCLUSIONS: Epicardial, but not microvascular, coronary endothelial dysfunction was associated directly with LDL particles and inversely with HDL particles, suggesting a location-dependent impact of lipoprotein particles on the coronary circulation.

    On Non-Abelian Symplectic Cutting

    We discuss symplectic cutting for Hamiltonian actions of non-Abelian compact groups. By using a degeneration based on the Vinberg monoid, we give, in good cases, a global quotient description of a surgery construction introduced by Woodward and Meinrenken, and show that it can be interpreted in algebro-geometric terms. A key ingredient is the 'universal cut' of the cotangent bundle of the group itself, which is identified with a moduli space of framed bundles on chains of projective lines recently introduced by the authors.
    Comment: Various edits made, to appear in Transformation Groups. 28 pages, 8 figures
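    As background for readers new to the topic, the Abelian construction that this paper generalizes is Lerman's symplectic cut for a Hamiltonian circle action. The summary below uses one common sign convention (other sources flip the sign of $\tfrac{1}{2}|z|^2$ or of the action on $\mathbb{C}$) and assumes that $S^1$ acts freely on $\mu^{-1}(\varepsilon)$, so that the cut is smooth; it is not the authors' non-Abelian construction.

```latex
% (M, \omega) with a Hamiltonian S^1-action and moment map \mu : M \to \mathbb{R}.
\begin{align*}
  &\text{On } M \times \mathbb{C}:\quad
    \lambda \cdot (m, z) = (\lambda \cdot m,\ \lambda^{-1} z), \qquad
    \Phi(m, z) = \mu(m) + \tfrac{1}{2}|z|^{2}, \\
  &M_{\le \varepsilon} := \Phi^{-1}(\varepsilon)/S^{1}
    \;\cong\; \{\mu < \varepsilon\} \,\sqcup\, \mu^{-1}(\varepsilon)/S^{1}.
\end{align*}
```

    The open region $\{\mu < \varepsilon\}$ embeds symplectically in $M_{\le\varepsilon}$, while the boundary level set $\mu^{-1}(\varepsilon)$ is collapsed to the reduced space $\mu^{-1}(\varepsilon)/S^1$.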

    A randomised controlled trial of a psychoeducational intervention for women at increased risk of breast cancer

    This study aimed to compare the impact of two versions of a psychoeducational written intervention on cancer worry and objective knowledge of breast cancer risk-related topics in women who had been living with an increased risk of familial breast cancer for several years. Participants were randomised to three conditions: scientific and psychosocial information pack (Group 1), scientific information pack only (Group 2) or standard care control (Group 3). They completed postal questionnaires at baseline (n = 163) and at 4 weeks (n = 151). As predicted, there was a significant decrease in cancer worry for Group 1, but not Group 2. Objective knowledge significantly improved for both Group 1 and Group 2 as expected, but not Group 3. However, there was an unpredicted decline in cancer worry for Group 3. This study supports the value of a scientific and psychosocial information pack in providing up-to-date information related to familial risk of breast cancer for long-term attendees of a familial breast cancer clinic. Further research is warranted to determine how the information pack could be incorporated into the existing clinical service, thus providing these women with the type of ongoing psychosocial support that many familial breast cancer clinics are currently lacking.

    Zonotopes and four-dimensional superconformal field theories

    The a-maximization technique proposed by Intriligator and Wecht allows one to determine the exact R-charges and scaling dimensions of the chiral operators of four-dimensional superconformal field theories. The existence and uniqueness of the solution, however, have not been addressed in a general setting. In this paper, we show that the a-function always has a unique critical point, which is also a global maximum, for a large class of quiver gauge theories specified by toric diagrams. Our proof is based on the observation that the a-function is given by the volume of a three-dimensional polytope called a "zonotope", and uniqueness essentially follows from the Brunn-Minkowski inequality for the volume of convex bodies. We also establish a universal upper bound for the exact R-charges and the monotonicity of the a-function, in the sense that the a-function decreases whenever the toric diagram shrinks. The relationship between a-maximization and volume minimization is also discussed.
    Comment: 29 pages, 15 figures, reference added, typos corrected, version published in JHEP
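    To make a-maximization concrete (a standard textbook example, not the paper's zonotope argument), consider the simplest toric case: $\mathcal{N}=4$ SU(N) super-Yang-Mills viewed as an $\mathcal{N}=1$ quiver theory, whose toric diagram is that of $\mathbb{C}^3$. The sketch below maximizes the trial function $a = \frac{3}{32}(3\,\mathrm{Tr}\,R^3 - \mathrm{Tr}\,R)$ by brute-force grid search under the superpotential constraint $R_1 + R_2 + R_3 = 2$. The grid search, the function names, and the choice $N = 2$ are illustrative assumptions; the overall factor $N^2 - 1$ does not change the maximizer.

```python
def trial_a(r1: float, r2: float, r3: float, n_colors: int = 2) -> float:
    """Trial a = (3/32)(3 Tr R^3 - Tr R) for SU(N) N=4 SYM written as an N=1 theory.

    Fermions: N^2 - 1 gauginos with R-charge 1, plus the fermions of three
    adjoint chiral multiplets X, Y, Z with R-charges r_i - 1.
    """
    dim = n_colors ** 2 - 1
    fermion_charges = [1.0, r1 - 1.0, r2 - 1.0, r3 - 1.0]
    tr_r3 = dim * sum(q ** 3 for q in fermion_charges)
    tr_r = dim * sum(fermion_charges)
    return 3.0 / 32.0 * (3.0 * tr_r3 - tr_r)


def maximize_a(steps: int = 300):
    """Grid search over 0 < r_i < 2 with r1 + r2 + r3 = 2 (from W = Tr X[Y, Z])."""
    best = None
    for i in range(1, steps):
        for j in range(1, steps - i):  # keeps r3 strictly positive
            r1 = 2.0 * i / steps
            r2 = 2.0 * j / steps
            r3 = 2.0 - r1 - r2
            value = trial_a(r1, r2, r3)
            if best is None or value > best[0]:
                best = (value, r1, r2, r3)
    return best
```

    The search lands on $R_1 = R_2 = R_3 = 2/3$ with $a = 3/4$, matching the known value $a = (N^2-1)/4$ for $\mathcal{N}=4$ super-Yang-Mills at $N = 2$; the single interior maximum is consistent with the uniqueness result described in this abstract.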

    Low prevalence, quasi-stationarity and power-law distribution in a model of spreading

    Understanding how contagions (information, infections, etc.) spread on complex networks is important from both practical and theoretical points of view. Considerable work has been done in this regard over the past decade or so. However, most models are limited in scope and, as a result, capture only general features of spreading phenomena. Here, we propose and study a model of spreading that takes into account the strength or quality of contagions as well as the local (probabilistic) dynamics occurring at various nodes. Transmission occurs only after the local agent has evaluated the quality-based fitness of the contagion. The model exhibits quality-dependent exponential time scales at early times, leading to a slowly evolving quasi-stationary state. Low prevalence is observed for a wide range of contagion quality, for arbitrarily large networks. We also investigate the activity of nodes and find a power-law distribution with a robust exponent independent of network topology. Our results are consistent with recent empirical observations.
    Comment: 7 pages, 8 figures. (Submitted)
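    A toy version of such a model can be simulated in a few lines, assuming an Erdős-Rényi contact network, a hypothetical fitness rule in which a node adopts the contagion with probability equal to its quality, and SIS-style loss of interest at a fixed rate. These rules, the function names, and all parameter values are stand-ins for the paper's local dynamics, which the abstract does not specify, and the sketch makes no attempt to reproduce its quasi-stationary or power-law results.

```python
import random


def random_graph(n: int, p: float, rng: random.Random) -> list[list[int]]:
    """Erdos-Renyi G(n, p) graph as adjacency lists."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def simulate(n: int = 500, p: float = 0.02, quality: float = 0.3,
             recovery: float = 0.2, steps: int = 200, seed: int = 1):
    """Return per-step prevalence and per-node adoption counts (activity)."""
    rng = random.Random(seed)
    adj = random_graph(n, p, rng)
    active = set(rng.sample(range(n), 5))  # initial seeds
    activity = [0] * n
    for node in active:
        activity[node] += 1
    prevalence = []
    for _ in range(steps):
        newly_active = set()
        for u in active:
            for v in adj[u]:
                # Hypothetical fitness rule: the local agent accepts the
                # contagion with probability equal to its quality.
                if v not in active and rng.random() < quality:
                    newly_active.add(v)
        # Active nodes drop the contagion with fixed probability (SIS-style).
        survivors = {u for u in active if rng.random() >= recovery}
        for v in newly_active:
            activity[v] += 1
        active = survivors | newly_active
        prevalence.append(len(active) / n)
    return prevalence, activity
```

    The `activity` list records how often each node adopted the contagion, which is the kind of per-node quantity whose distribution the abstract reports.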

    $L^{2}$-restriction bounds for eigenfunctions along curves in the quantum completely integrable case

    We show that for a quantum completely integrable system in two dimensions, the $L^{2}$-normalized joint eigenfunctions $\phi_{j}^{\hbar}$ of the commuting semiclassical pseudodifferential operators satisfy restriction bounds of the form $\int_{\gamma} |\phi_{j}^{\hbar}|^2 \, ds = \mathcal{O}(|\log \hbar|)$ for generic curves $\gamma$ on the surface. We also prove that the maximal restriction bounds of Burq-Gerard-Tzvetkov are always attained for certain exceptional subsequences of eigenfunctions.
    Comment: Corrected some typos and added some more detail in section
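    For comparison, in the usual semiclassical normalization (eigenvalues of $-\hbar^2\Delta$ of order one, so frequency $\lambda \sim \hbar^{-1}$), the Burq-Gerard-Tzvetkov estimate $\|\phi\|_{L^2(\gamma)} = \mathcal{O}(\lambda^{1/4})$ for curves on a compact surface becomes, after squaring, the general bound in the first line below. This assumes the eigenfunctions fall within the Laplace eigenfunction setting of that estimate; the second line restates the generic-curve result of this abstract.

```latex
% Normalization: \|\phi_j^{\hbar}\|_{L^2(M)} = 1, with \lambda \sim \hbar^{-1}.
\begin{align*}
  \text{general curves (Burq--G\'erard--Tzvetkov):}\quad
    &\int_{\gamma} |\phi_j^{\hbar}|^{2}\, ds = \mathcal{O}\big(\hbar^{-1/2}\big), \\
  \text{quantum completely integrable case, generic } \gamma:\quad
    &\int_{\gamma} |\phi_j^{\hbar}|^{2}\, ds = \mathcal{O}\big(|\log \hbar|\big).
\end{align*}
```

    The logarithmic bound is thus a large improvement over the general $\hbar^{-1/2}$ rate, while the second result says the $\hbar^{-1/2}$ rate is still saturated along exceptional subsequences.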