4,446 research outputs found

    Self-Supervised and Controlled Multi-Document Opinion Summarization

    We address the problem of unsupervised abstractive summarization of collections of user-generated reviews with self-supervision and control. We propose a self-supervised setup that considers an individual document as a target summary for a set of similar documents. This setting makes training simpler than previous approaches by relying only on standard log-likelihood loss. We address the problem of hallucinations through the use of control codes, which steer the generation towards more coherent and relevant summaries. Finally, we extend the Transformer architecture to allow for multiple reviews as input. Our benchmarks on two datasets against graph-based and recent neural abstractive unsupervised models show that our proposed method generates summaries with superior quality and relevance. This is confirmed in our human evaluation, which focuses explicitly on the faithfulness of generated summaries. We also provide an ablation study, which shows the importance of the control setup in controlling hallucinations and achieving high sentiment and topic alignment of the summaries with the input reviews. Comment: 18 pages including 5 pages appendi
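
The self-supervised setup described in this abstract (one review serves as the target summary for a set of similar reviews) can be illustrated with a minimal sketch. The abstract does not say how similarity is measured or how many input reviews are used, so the Jaccard token overlap, the parameter `k`, and the function names `jaccard` and `build_training_pairs` below are all assumptions made for illustration, not the authors' method.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_training_pairs(reviews: list[str], k: int = 2):
    """For each review, pair it (as pseudo-summary target) with its k most similar reviews."""
    pairs = []
    for i, target in enumerate(reviews):
        # Score every other review against the target.
        scored = [(jaccard(target, r), j) for j, r in enumerate(reviews) if j != i]
        # Highest similarity first; break ties by original position.
        scored.sort(key=lambda t: (-t[0], t[1]))
        inputs = [reviews[j] for _, j in scored[:k]]
        pairs.append((inputs, target))
    return pairs


reviews = [
    "great pizza and friendly staff",
    "friendly staff and great pizza crust",
    "slow service and cold food",
    "cold food and slow rude service",
]
pairs = build_training_pairs(reviews, k=1)
for inputs, target in pairs:
    print(inputs, "->", target)
```

Each `(inputs, target)` pair could then be fed to a sequence-to-sequence model trained with ordinary log-likelihood loss, which is the simplification the abstract emphasizes.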

    Toxic Cyanobacteria Aerosols: Tests of Filters for Cells

    Aerosolization of toxic cyanobacteria released from the surface of lakes is a new area of study that could uncover a previously unknown route of exposure to toxic cyanobacteria. Since toxic cyanobacteria may be responsible for adverse human health effects, methods and equipment need to be tested and established for monitoring these airborne bacteria. The primary focus of this study was to create controlled laboratory experiments that simulate natural lake aerosol production. I set out to find the best type of filter for collecting and analyzing aerosolized cells as small as 0.2-2.0 µm, known as picoplankton. To collect these aerosols, air was vacuumed from just above a sample of lake water and passed through either glass fiber filters (GFF) or 0.22 µm MF-Millipore™ membrane filters (0.22 Millipore™). Filter collections were analyzed by epifluorescence microscopy to determine cell counts. Data analysis revealed that 0.22 Millipore™ filters were the best option for cell enumeration, providing better epifluorescence optical quality and higher cell counts

    Lazier Than Lazy Greedy

    Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and practice? In this paper, we develop the first linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm, STOCHASTIC-GREEDY, can achieve a $(1-1/e-\varepsilon)$ approximation guarantee, in expectation, to the optimum solution in time linear in the size of the data and independent of the cardinality constraint. We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster. More surprisingly, we observe that in many practical scenarios STOCHASTIC-GREEDY does not evaluate the whole fraction of data points even once and still achieves indistinguishable results compared to lazy greedy. Comment: In Proc. Conference on Artificial Intelligence (AAAI), 201
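
A rough sketch of the idea behind a randomized greedy selection of this kind follows. The abstract does not spell out the sampling rule, so the per-step sample size of about (n/k)·ln(1/ε), the toy coverage objective, and the names `stochastic_greedy` and `coverage` are assumptions for illustration rather than the paper's exact procedure.

```python
import math
import random


def stochastic_greedy(ground_set, f, k, eps=0.1, seed=0):
    """Pick up to k elements; each step scores only a random sample of the remaining ones.

    f is a set function (assumed monotone submodular) that accepts a Python set.
    """
    remaining = list(ground_set)
    n = len(remaining)
    if k <= 0 or n == 0:
        return []
    rng = random.Random(seed)
    # Assumed sample size per step: ceil((n / k) * ln(1 / eps)), capped at n.
    sample_size = min(n, max(1, math.ceil((n / k) * math.log(1.0 / eps))))
    selected = []
    current_value = f(set())
    for _ in range(min(k, n)):
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        best, best_gain = None, float("-inf")
        for e in candidates:
            # Marginal gain of adding e to the current selection.
            gain = f(set(selected) | {e}) - current_value
            if gain > best_gain:
                best, best_gain = e, gain
        selected.append(best)
        remaining.remove(best)
        current_value += best_gain
    return selected


# Toy monotone submodular objective: number of items covered by the chosen sets.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def coverage(chosen):
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return len(covered)


picked = stochastic_greedy(SETS.keys(), coverage, k=2, eps=0.1)
print(picked, coverage(picked))
```

With this tiny ground set the sample covers every remaining element, so the run coincides with plain greedy; the savings only appear when n is much larger than the per-step sample.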