4,947 research outputs found

Wrapper Maintenance: A Machine Learning Approach

Author: Knoblock C. A.
Lerman K.
Minton S. N.
Publication venue: 'AI Access Foundation'
Publication date: 23/06/2011

The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.
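The verification figures in this abstract can be checked with a short calculation. The sketch below is illustrative only: the recall follows directly from the 35 of 37 detected changes, but the abstract does not say how the 16 mistakes divide into false alarms and misses, so the false-alarm count of 13 is a hypothetical value chosen because it reproduces the reported precision.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of raised alarms that correspond to real wrapper changes."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real wrapper changes that were detected."""
    return true_positives / (true_positives + false_negatives)


# Stated in the abstract: 35 of the 37 wrapper changes were discovered.
detected_changes = 35
missed_changes = 37 - detected_changes

# ASSUMPTION: not stated in the abstract. 13 false alarms is a hypothetical
# count that happens to match the reported precision of 0.73.
assumed_false_alarms = 13

verification_recall = recall(detected_changes, missed_changes)
verification_precision = precision(detected_changes, assumed_false_alarms)

print(f"recall    = {verification_recall:.2f}")
print(f"precision = {verification_precision:.2f}")
```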

arXiv.org e-Print Archive

Crossref

Equivariant differential characters and symplectic reduction

Author: A. Weinstein
Anton Malkin
E. Lerman
Eugene Lerman
J. Marsden
K. Gomi
M. Hopkins
R. Bos
V. Guillemin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

We describe equivariant differential characters (classifying equivariant circle bundles with connections), their prequantization, and reduction.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Robust computation of linear models by convex relaxation

Author: Lerman Gilad
McCoy Michael
Tropp Joel A.
Zhang Teng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/08/2014

Consider a dataset of vector-valued observations that consists of noisy inliers, which are explained well by a low-dimensional subspace, along with some number of outliers. This work describes a convex optimization problem, called REAPER, that can reliably fit a low-dimensional model to this type of data. This approach parameterizes linear subspaces using orthogonal projectors, and it uses a relaxation of the set of orthogonal projectors to reach the convex formulation. The paper provides an efficient algorithm for solving the REAPER problem, and it documents numerical experiments which confirm that REAPER can dependably find linear structure in synthetic and natural data. In addition, when the inliers lie near a low-dimensional subspace, there is a rigorous theory that describes when REAPER can approximate this subspace. Comment: Formerly titled "Robust computation of linear models, or How to find a needle in a haystack"
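The relaxation described in this abstract can be written out as follows. This is a sketch of the kind of program the abstract describes, not a quotation from the paper; the symbols $x_i$ (observations), $N$ (number of observations), $d$ (target subspace dimension), and $P$ (the relaxed projector) are notation assumed here for illustration.

```latex
% Requires amsmath. Rank-d orthogonal projectors satisfy P^2 = P and tr(P) = d;
% the constraint set below is their convex hull, which makes the problem convex.
\begin{aligned}
\text{minimize}\quad   & \sum_{i=1}^{N} \lVert x_i - P x_i \rVert_2 \\
\text{subject to}\quad & 0 \preceq P \preceq I, \qquad \operatorname{tr}(P) = d
\end{aligned}
```

The objective sums unsquared residual norms, so a single distant outlier contributes linearly rather than quadratically, which is the usual motivation for this kind of robust fit.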

arXiv.org e-Print Archive

Crossref

Caltech Authors

VIP: Incorporating Human Cognitive Biases in a Probabilistic Model of Retweeting

Author: A Chaudhry
BA Huberman
K Lerman
NJ Blunch
R Salakhutdinov
Publication date: 02/02/2015

Information spread in social media depends on a number of factors, including how the site displays information, how users navigate it to find items of interest, users' tastes, and the 'virality' of information, i.e., its propensity to be adopted, or retweeted, upon exposure. Probabilistic models can learn users' tastes from the history of their item adoptions and recommend new items to users. However, current models ignore cognitive biases that are known to affect behavior. Specifically, people pay more attention to items at the top of a list than those in lower positions. As a consequence, items near the top of a user's social media stream have higher visibility, and are more likely to be seen and adopted, than those appearing below. Another bias is due to the item's fitness: some items have a high propensity to spread upon exposure regardless of the interests of adopting users. We propose a probabilistic model that incorporates human cognitive biases and personal relevance in the generative model of information spread. We use the model to predict how messages containing URLs spread on Twitter. Our work shows that models of user behavior that account for cognitive factors can better describe and predict user behavior in social media. Comment: SBP 201

arXiv.org e-Print Archive

Crossref

Variational Data Assimilation via Sparse Regularization

Author: Ebtehaj A. M.
Foufoula-Georgiou E.
Lerman G.
Zupanski M.
Publication venue: 'Co-Action Publishing'
Publication date: 01/01/2013

This paper studies the role of sparse regularization in a properly chosen basis for variational data assimilation (VDA) problems. Specifically, it focuses on data assimilation of noisy and down-sampled observations while the state variable of interest exhibits sparsity in the real or transformed domain. We show that in the presence of sparsity, the $\ell_{1}$-norm regularization produces more accurate and stable solutions than the classic data assimilation methods. To motivate further developments of the proposed methodology, assimilation experiments are conducted in the wavelet and spectral domain using the linear advection-diffusion equation.
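One common way to write an $\ell_{1}$-regularized variational cost of the kind this abstract describes is sketched below. The abstract itself gives no formula, so every symbol is an assumption made for illustration: $y$ for the observations, $H$ for the observation operator, $x_b$ for the background state, $R$ and $B$ for the observation and background error covariances, $\Phi$ for the sparsifying transform (for example a wavelet basis), and $\lambda$ for the regularization weight.

```latex
% Requires amsmath. Classic quadratic data-assimilation terms plus an
% l1 penalty on the transformed state; all notation is assumed, not sourced.
\hat{x} = \arg\min_{x}\;
  \tfrac{1}{2}\,\lVert y - Hx \rVert_{R^{-1}}^{2}
  + \tfrac{1}{2}\,\lVert x - x_b \rVert_{B^{-1}}^{2}
  + \lambda\,\lVert \Phi x \rVert_{1}
```

Setting $\lambda = 0$ recovers the purely quadratic cost of classic variational assimilation, so the last term is the only place where the sparsity assumption enters.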

arXiv.org e-Print Archive

Directory of Open Access Journals

DigitalCommons@USU

Narrative Health Communication and Behavior Change: The Influence of Exemplars in the News on Intention to Quit Smoking.

Author: Bigman Cabral A.
Cappella Joseph N.
Kim Hyun Suk
Leader Amy E.
Lerman Caryn
Publication venue: Jefferson Digital Commons
Publication date: 01/06/2012

This study investigated psychological mechanisms underlying the effect of narrative health communication on behavioral intention. Specifically, the study examined how exemplification in news about successful smoking cessation affects recipients' narrative engagement, thereby changing their intention to quit smoking. Nationally representative samples of U.S. adult smokers participated in 2 experiments. The results from the 2 experiments consistently showed that smokers reading a news article with an exemplar experienced greater narrative engagement compared to those reading an article without an exemplar. Those who reported more engagement were in turn more likely to report greater smoking cessation intentions.

PubMed Central

Jefferson Digital Commons

Mining social semantics on the social web

Author: Hotho A.
Jäschke R.
Lerman K.
Publication venue: 'IOS Press'
Publication date: 06/04/2017

Crossref

White Rose Research Online

Why Do Cascade Sizes Follow a Power-Law?

Author: Bayley N.
Brach P.
Erdös P.
Gaba A.
Lerman K.
Rogers E. M.
Steeg G. V.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/02/2017

We introduce a random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now, only empirical studies of this model have been done. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow a power-law distribution, which is consistent with multiple empirical analyses of large social networks. We compared the assumptions of our model with the Twitter social network and tested the goodness of approximation. Comment: 8 pages, 7 figures, accepted to WWW 201

arXiv.org e-Print Archive

Crossref

Mining social semantics on the social web

Author: Hotho A.
Jäschke R.
Lerman K.
Publication venue: IOS Press
Publication date: 24/02/2017

Biblioteca Digital de la Comunidad de Madrid

White Rose Research Online