Self-Supervised and Controlled Multi-Document Opinion Summarization
We address the problem of unsupervised abstractive summarization of
collections of user-generated reviews with self-supervision and control. We
propose a self-supervised setup that treats an individual document as the
target summary for a set of similar documents. This setting makes training
simpler than in previous approaches, relying only on the standard log-likelihood
loss. We address the problem of hallucinations through the use of control
codes that steer generation toward more coherent and relevant
summaries. Finally, we extend the Transformer architecture to allow multiple
reviews as input. Our benchmarks on two datasets, against graph-based and recent
neural abstractive unsupervised models, show that our proposed method generates
summaries of superior quality and relevance. This is confirmed by our human
evaluation, which focuses explicitly on the faithfulness of the generated summaries.
We also provide an ablation study showing that the control setup is important
for controlling hallucinations and for achieving high sentiment and topic
alignment of the summaries with the input reviews.
Comment: 18 pages including 5 pages appendix
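The self-supervised setup above, in which one review serves as the target summary for a set of similar reviews, can be sketched as a leave-one-out pair builder. This is a minimal sketch under stated assumptions: the word-set Jaccard similarity, the fixed input size `k`, and the helper names `tokens`, `jaccard`, and `build_pseudo_pairs` are hypothetical stand-ins, not the authors' selection procedure. The multi-input Transformer trained on these pairs with log-likelihood loss is not shown.

```python
# Illustrative sketch only: the similarity measure, k, and all names are hypothetical.
import re
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased set of word tokens (a deliberately simple similarity basis)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_pseudo_pairs(reviews: List[str], k: int = 2) -> List[Tuple[List[str], str]]:
    """Leave-one-out pseudo-pairs: each review becomes the target 'summary',
    and the k most similar *other* reviews become its input set."""
    toks = [tokens(r) for r in reviews]
    pairs = []
    for i, target in enumerate(reviews):
        # Score every other review against the target; ties broken by index.
        scored = sorted(
            ((jaccard(toks[i], toks[j]), j) for j in range(len(reviews)) if j != i),
            key=lambda s: (-s[0], s[1]),
        )
        inputs = [reviews[j] for _, j in scored[:k]]
        pairs.append((inputs, target))
    return pairs


if __name__ == "__main__":
    reviews = [
        "great pasta and friendly staff",
        "the pasta was great, staff friendly",
        "slow service but great pasta",
        "parking was impossible",
    ]
    for inputs, target in build_pseudo_pairs(reviews, k=2):
        print(target, "<-", inputs)
```

Breaking ties by index keeps the pairing deterministic, which makes the resulting training set reproducible across runs.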
Toxic Cyanobacteria Aerosols: Tests of Filters for Cells
Aerosolization of toxic cyanobacteria released from the surface of lakes is a new area of study that could uncover a previously unknown route of exposure to toxic cyanobacteria. Because toxic cyanobacteria may be responsible for adverse human health effects, methods and equipment for monitoring these airborne bacteria need to be tested and established. The primary focus of this study was to design controlled laboratory experiments that simulate natural lake aerosol production. I set out to determine the best type of filter for collecting and analyzing aerosolized cells in the 0.2-2.0 µm size range, known as picoplankton. To collect these aerosols, air was drawn by vacuum from just above a sample of lake water through either glass fiber filters (GFF) or 0.22 µm MF-Millipore™ membrane filters (0.22 Millipore™). Filter collections were analyzed by epifluorescence microscopy to determine cell counts. Data analysis revealed that 0.22 Millipore™ filters were the best option for cell enumeration, providing better epifluorescence optical quality and higher cell counts.
Lazier Than Lazy Greedy
Is it possible to maximize a monotone submodular function faster than the
widely used lazy greedy algorithm (also known as accelerated greedy), both in
theory and practice? In this paper, we develop the first linear-time algorithm
for maximizing a general monotone submodular function subject to a cardinality
constraint. We show that our randomized algorithm, STOCHASTIC-GREEDY, can
achieve a (1 - 1/e - ε) approximation guarantee, in expectation, to the
optimum solution in time linear in the size of the data and independent of the
cardinality constraint. We empirically demonstrate the effectiveness of our
algorithm on submodular functions arising in data summarization, including
training large-scale kernel methods, exemplar-based clustering, and sensor
placement. We observe that STOCHASTIC-GREEDY practically achieves the same
utility value as lazy greedy but runs much faster. More surprisingly, we
observe that in many practical scenarios STOCHASTIC-GREEDY does not even
evaluate every data point once and still achieves results
indistinguishable from those of lazy greedy.
Comment: In Proc. Conference on Artificial Intelligence (AAAI), 201
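The core idea of STOCHASTIC-GREEDY, replacing the full scan over the remaining elements with a random subsample at each step, can be illustrated next to the lazy greedy baseline it is compared against. This is a minimal sketch under stated assumptions: set coverage stands in as the monotone submodular objective (it is not one of the paper's experimental objectives), the per-step sample size is (n/k)·ln(1/ε) as in the paper's subsampling rule, and the names `coverage`, `lazy_greedy`, and `stochastic_greedy` are hypothetical. It is not the authors' code.

```python
# Illustrative sketch only: function names and the coverage objective are hypothetical.
import heapq
import math
import random
from typing import Callable, Hashable, List, Sequence, Set

Objective = Callable[[List[int]], float]


def coverage(sets: Sequence[Set[Hashable]], chosen: List[int]) -> int:
    """Set coverage: number of distinct items covered by the chosen sets.
    Monotone and submodular, so it is a valid test objective."""
    covered: Set[Hashable] = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def lazy_greedy(n: int, k: int, f: Objective) -> List[int]:
    """Lazy (accelerated) greedy: by submodularity a stale marginal gain is an
    upper bound, so an element is re-evaluated only when it reaches the heap top."""
    chosen: List[int] = []
    empty = f(chosen)
    heap = [(-(f([e]) - empty), e) for e in range(n)]
    heapq.heapify(heap)
    while len(chosen) < k and heap:
        _, e = heapq.heappop(heap)
        gain = f(chosen + [e]) - f(chosen)  # refresh the possibly stale gain
        if not heap or gain >= -heap[0][0]:
            chosen.append(e)  # still beats every other (upper-bounded) gain
        else:
            heapq.heappush(heap, (-gain, e))
    return chosen


def stochastic_greedy(n: int, k: int, f: Objective, eps: float = 0.1, seed: int = 0) -> List[int]:
    """Each step scores only a random subsample of size (n/k)*ln(1/eps) of the
    remaining elements, so the total number of gain evaluations is roughly
    n*ln(1/eps), independent of k."""
    if not 0 < eps < 1:
        raise ValueError("eps must be in (0, 1)")
    if k <= 0 or n == 0:
        return []
    rng = random.Random(seed)
    s = min(n, math.ceil((n / k) * math.log(1 / eps)))
    chosen: List[int] = []
    remaining = list(range(n))
    for _ in range(min(k, n)):
        sample = rng.sample(remaining, min(s, len(remaining)))
        base = f(chosen)
        best = max(sample, key=lambda e: f(chosen + [e]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}, {8}, {2, 8, 9}]
    f = lambda chosen: coverage(sets, chosen)
    for algo in (lazy_greedy, stochastic_greedy):
        picked = algo(len(sets), 2, f)
        print(algo.__name__, picked, f(picked))
```

The per-step sample is smaller than the remaining pool only when k exceeds ln(1/ε) (about 2.3 for ε = 0.1), so on this six-set toy instance with k = 2 both routines examine every remaining element; the savings appear for larger k.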