Wrapper Maintenance: A Machine Learning Approach
The proliferation of online information sources has led to an increased use
of wrappers for extracting data from Web sources. While most of the previous
research has focused on quick and efficient generation of wrappers, the
development of tools for wrapper maintenance has received less attention. This
is an important research problem because Web sources often change in ways that
prevent the wrappers from extracting data correctly. We present an efficient
algorithm that learns structural information about data from positive examples
alone. We describe how this information can be used for two wrapper maintenance
applications: wrapper verification and reinduction. The wrapper verification
system detects when a wrapper is not extracting correct data, usually because
the Web source has changed its format. The reinduction algorithm automatically
recovers from changes in the Web source by identifying data on Web pages so
that a new wrapper may be generated for this source. To validate our approach,
we monitored 27 wrappers over a period of a year. The verification algorithm
correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes,
resulting in precision of 0.73 and recall of 0.95. We validated the reinduction
algorithm on ten Web sources. We were able to successfully reinduce the
wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data
extraction task.
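As a quick check of the reported verification figures, the recall follows directly from the counts given in the abstract. This is a minimal sketch; the function name `recall` is illustrative and not taken from the paper. The abstract does not break the 16 mistakes down into false alarms and misses, so precision is not recomputed here.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual wrapper changes that were detected."""
    return true_positives / (true_positives + false_negatives)


# Counts taken from the abstract: 35 of the 37 wrapper changes were detected.
detected = 35
missed = 37 - detected

print(round(recall(detected, missed), 2))  # 0.95
```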
Latent Space Model for Multi-Modal Social Data
With the emergence of social networking services, researchers enjoy the
increasing availability of large-scale heterogeneous datasets capturing online
user interactions and behaviors. Traditional analysis of techno-social systems
data has focused mainly on describing either the dynamics of social
interactions, or the attributes and behaviors of the users. However,
overwhelming empirical evidence suggests that the two dimensions affect one
another, and therefore they should be jointly modeled and analyzed in a
multi-modal framework. The benefits of such an approach include the ability to
build better predictive models, leveraging social network information as well
as user behavioral signals. To this purpose, here we propose the Constrained
Latent Space Model (CLSM), a generalized framework that combines Mixed
Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA)
incorporating a constraint that forces the latent space to concurrently
describe the multiple data modalities. We derive an efficient inference
algorithm based on Variational Expectation Maximization that has a
computational cost linear in the size of the network, thus making it feasible
to analyze massive social datasets. We validate the proposed framework on two
problems: prediction of social interactions from user attributes and behaviors,
and behavior prediction exploiting network information. We perform experiments
with a variety of multi-modal social systems, spanning location-based social
networks (Gowalla), social media services (Instagram, Orkut), e-commerce and
review sites (Amazon, Ciao), and finally citation networks (Cora). The results
indicate significant improvement in prediction accuracy over state-of-the-art
methods, and demonstrate the flexibility of the proposed approach for
addressing a variety of different learning problems commonly occurring with
multi-modal social data.
Comment: 12 pages, 7 figures, 2 tables
Intelligent Self-Repairable Web Wrappers
The amount of information available on the Web grows at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust Web data mining algorithms that can automatically cope with possible malfunctions or failures. On the other, the literature lacks solutions for the maintenance of these systems. Procedures that extract Web data may be tightly coupled to the structure of the data source itself; thus, malfunctions or the acquisition of corrupted data could be caused, for example, by structural modifications made to data sources by their owners. At present, verification of data integrity and maintenance are mostly managed manually, in order to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to creating procedures that extract data from Web sources (so-called Web wrappers), which can cope with malfunctions caused by modifications to the structure of the data source and can automatically repair themselves.
Interaction Between Autonomic Tone and the Negative Chronotropic Effect of Adenosine in Humans
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/72287/1/j.1540-8159.1999.tb00412.x.pd
Coronary artery endothelial dysfunction is positively correlated with low density lipoprotein and inversely correlated with high density lipoprotein subclass particles measured by nuclear magnetic resonance spectroscopy.
OBJECTIVE: The association between cholesterol and endothelial dysfunction remains controversial. We tested the hypothesis that lipoprotein subclasses are associated with coronary endothelial dysfunction.
METHODS AND RESULTS: Coronary endothelial function was assessed in 490 patients between November 1993 and February 2007. Fasting lipids and nuclear magnetic resonance (NMR) lipoprotein particle subclasses were measured. There were 325 females and 165 males with a mean age of 49.8 ± 11.6 years. Coronary endothelial dysfunction (epicardial constriction > 20% or increase in coronary blood flow < 50% in response to intracoronary acetylcholine) was diagnosed in 273 patients, the majority of whom (64.5%) had microvascular dysfunction. Total cholesterol and LDL-C (low density lipoprotein cholesterol) were not associated with endothelial dysfunction. One-way analysis and multivariate methods adjusting for age, gender, diabetes, hypertension and lipid-lowering agent use were used to determine the correlation between lipoprotein subclasses and coronary endothelial dysfunction. Epicardial endothelial dysfunction was significantly correlated with total (p=0.03) and small LDLp (LDL particles) (p<0.01) and inversely correlated with total and large HDLp (high density lipoprotein particles) (p<0.01).
CONCLUSIONS: Epicardial, but not microvascular, coronary endothelial dysfunction was associated directly with LDL particles and inversely with HDL particles, suggesting location-dependent impact of lipoprotein particles on the coronary circulation.
On Non-Abelian Symplectic Cutting
We discuss symplectic cutting for Hamiltonian actions of non-Abelian compact
groups. By using a degeneration based on the Vinberg monoid we give, in good
cases, a global quotient description of a surgery construction introduced by
Woodward and Meinrenken, and show it can be interpreted in algebro-geometric
terms. A key ingredient is the `universal cut' of the cotangent bundle of the
group itself, which is identified with a moduli space of framed bundles on
chains of projective lines recently introduced by the authors.
Comment: Various edits made, to appear in Transformation Groups. 28 pages, 8 figures
A randomised controlled trial of a psychoeducational intervention for women at increased risk of breast cancer
This study aimed to compare the impact of two versions of a psychoeducational written intervention on cancer worry and objective knowledge of breast cancer risk-related topics in women who had been living with an increased risk of familial breast cancer for several years. Participants were randomised to three conditions: scientific and psychosocial information pack (Group 1), scientific information pack only (Group 2) or standard care control (Group 3). They completed postal questionnaires at baseline (n = 163) and 4 weeks (n = 151). As predicted, there was a significant decrease in cancer worry for Group 1, but not Group 2. Objective knowledge significantly improved for both Group 1 and Group 2 as expected, but not Group 3. However, there was an unpredicted decline in cancer worry for Group 3. This study supports the value of a scientific and psychosocial information pack in providing up-to-date information related to familial risk of breast cancer for long-term attendees of a familial breast cancer clinic. Further research is warranted to determine how the information pack could be incorporated into the existing clinical service, thus providing these women with the type of ongoing psychosocial support that many familial breast cancer clinics are currently lacking.
Zonotopes and four-dimensional superconformal field theories
The a-maximization technique proposed by Intriligator and Wecht allows us to
determine the exact R-charges and scaling dimensions of the chiral operators of
four-dimensional superconformal field theories. The problem of existence and
uniqueness of the solution, however, has not been addressed in a general setting.
In this paper, it is shown that the a-function always has a unique critical
point which is also a global maximum for a large class of quiver gauge theories
specified by toric diagrams. Our proof is based on the observation that the
a-function is given by the volume of a three-dimensional polytope called a
"zonotope", and the uniqueness essentially follows from the Brunn-Minkowski
inequality for the volume of convex bodies. We also show a universal upper
bound for the exact R-charges, and the monotonicity of the a-function in the sense
that the a-function decreases whenever the toric diagram shrinks. The relationship
between a-maximization and volume-minimization is also discussed.
Comment: 29 pages, 15 figures, reference added, typos corrected, version published in JHEP
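For orientation, the trial a-function that a-maximization extremizes and the Brunn-Minkowski inequality invoked in the proof are usually written as below. These are the standard textbook forms, not formulas quoted from this paper; the symbols $R$ (trial R-charge), $K$, $L$ (nonempty compact convex sets in $\mathbb{R}^{n}$) and $n$ are notation assumed here.

```latex
% Trial a-function, maximized over anomaly-free trial R-charges
\[
  a(R) = \frac{3}{32}\left( 3\,\operatorname{Tr} R^{3} - \operatorname{Tr} R \right)
\]

% Brunn-Minkowski inequality for the Minkowski sum K + L in R^n
\[
  \operatorname{vol}(K + L)^{1/n} \;\ge\; \operatorname{vol}(K)^{1/n} + \operatorname{vol}(L)^{1/n}
\]
```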
Low prevalence, quasi-stationarity and power-law distribution in a model of spreading
Understanding how contagions (information, infections, etc.) spread on
complex networks is important from both a practical and a theoretical point
of view. Considerable work has been done in this regard in the past decade or
so. However, most models are limited in their scope and as a result only
capture general features of spreading phenomena. Here, we propose and study a
model of spreading which takes into account the strength or quality of
contagions as well as the local (probabilistic) dynamics occurring at various
nodes. Transmission occurs only after the quality-based fitness of the
contagion has been evaluated by the local agent. The model exhibits
quality-dependent exponential time scales at early times leading to a slowly
evolving quasi-stationary state. Low prevalence is seen for a wide range of
contagion quality for arbitrarily large networks. We also investigate the
activity of nodes and find a power-law distribution with a robust exponent
independent of network topology. Our results are consistent with recent
empirical observations.
Comment: 7 pages, 8 figures. (Submitted
L^{2}-restriction bounds for eigenfunctions along curves in the quantum completely integrable case
We show that for a quantum completely integrable system in two dimensions, the
L^{2}-normalized joint eigenfunctions of the commuting semiclassical
pseudodifferential operators satisfy restriction bounds of the form for generic
curves on the surface. We also prove that the maximal restriction
bounds of Burq-Gerard-Tzvetkov are always attained for certain exceptional
subsequences of eigenfunctions.
Comment: Corrected some typos and added some more detail in section