Query-Driven Sampling for Collective Entity Resolution
Probabilistic databases play a preeminent role in the processing and
management of uncertain data. Recently, many database research efforts have
integrated probabilistic models into databases to support tasks such as
information extraction and labeling. Many of these efforts are based on
batch-oriented inference, which inhibits a real-time workflow. One important task is
entity resolution (ER). ER is the process of determining records (mentions) in
a database that correspond to the same real-world entity. Traditional pairwise
ER methods can lead to inconsistencies and low accuracy due to localized
decisions. Leading ER systems solve this problem by collectively resolving all
records using a probabilistic graphical model and Markov chain Monte Carlo
(MCMC) inference. However, for large datasets this is an extremely expensive
process. One key observation is that such an exhaustive ER process incurs a huge
up-front cost, which is wasteful in practice because most users are interested
in only a small subset of entities. In this paper, we advocate pay-as-you-go
entity resolution by developing a number of query-driven collective ER
techniques. We introduce two classes of SQL queries that involve ER operators:
selection-driven ER and join-driven ER. We implement novel variations of
the MCMC Metropolis-Hastings algorithm to generate biased samples and
selectivity-based scheduling algorithms to support the two classes of ER
queries. Finally, we show that query-driven ER algorithms can converge and
return results within minutes over a database populated with the extraction
from a newswire dataset containing 71 million mentions.
Strong Solutions of Stochastic Generalized Porous Media Equations: Existence, Uniqueness and Ergodicity
Explicit conditions are presented for the existence, uniqueness and
ergodicity of the strong solution to a class of generalized stochastic porous
media equations. Our estimate of the convergence rate is sharp according to the
known optimal decay for the solution of the classical (deterministic) porous
medium equation. Comment: 15 pages; BiBoS-Preprint No. 04-09-157; to appear in Commun. PD
Limits of the equivalence of time and ensemble averages in shear flows
In equilibrium systems, time and ensemble averages of physical quantities are
equivalent due to ergodic exploration of phase space. In driven systems, it is
unknown if a similar equivalence of time and ensemble averages exists. We
explore effective limits of such convergence in a sheared bubble raft using
averages of the bubble velocities. In independent experiments, averaging over
time leads to well converged velocity profiles. However, the time-averages from
independent experiments result in distinct velocity averages. Ensemble averages
are approximated by randomly selecting bubble velocities from independent
experiments. Increasingly better approximations of ensemble averages converge
toward a unique velocity profile. Therefore, the experiments establish that in
practical realizations of non-equilibrium systems, temporal averaging and
ensemble averaging can yield convergent (stationary) but distinct
distributions. Comment: accepted to PRL - final figure revision
Rex1p Deficiency Leads to Accumulation of Precursor Initiator tRNAMet and Polyadenylation of Substrate RNAs in Saccharomyces cerevisiae
A synthetic genetic array was used to identify lethal and slow-growth phenotypes produced when a mutation in TRM6, which encodes a tRNA modification enzyme subunit, was combined with the deletion of any non-essential gene in Saccharomyces cerevisiae. We found that deletion of the REX1 gene resulted in a slow-growth phenotype in the trm6-504 strain. Previously, REX1 was shown to be involved in processing the 3′ ends of 5S rRNA and the dimeric tRNAArg-tRNAAsp. In this study, we have discovered a requirement for Rex1p in processing the 3′ end of tRNAiMet precursors and show that precursor tRNAiMet accumulates in a trm6-504 rex1Δ strain. Loss of Rex1p results in polyadenylation of its substrates, including tRNAiMet, suggesting that defects in 3′ end processing can activate the nuclear surveillance pathway. Finally, purified Rex1p displays Mg2+-dependent ribonuclease activity in vitro, and the enzyme is inactivated by mutation of two highly conserved amino acids.