Improving Sequential Determinantal Point Processes for Supervised Video Summarization
It is now easier than ever to produce videos. While the
ubiquitous video data is a great source for information discovery and
extraction, the computational challenges are unparalleled. Automatically
summarizing the videos has become a substantial need for browsing, searching,
and indexing visual content. This paper is in the vein of supervised video
summarization using sequential determinantal point process (SeqDPP), which
models diversity by a probabilistic distribution. We improve this model in two
respects. In terms of learning, we propose a large-margin algorithm to address the
exposure bias problem in SeqDPP. In terms of modeling, we design a new
probabilistic distribution such that, when it is integrated into SeqDPP, the
resulting model accepts user input about the expected length of the summary.
Moreover, we also significantly extend a popular video summarization dataset by
1) more egocentric videos, 2) dense user annotations, and 3) a refined
evaluation scheme. We conduct extensive experiments on this dataset (about 60
hours of videos in total) and compare our approach to several competitive
baselines.
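The diversity modeling this abstract describes rests on determinantal point processes, which score a subset of items by the determinant of its similarity kernel. A minimal sketch of the standard greedy MAP selection for a DPP (a generic illustration, not the paper's SeqDPP implementation; `greedy_dpp_map` and the toy kernel are hypothetical):

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Greedily pick k items approximately maximizing det(L[S, S]).

    L is a PSD similarity kernel; larger determinants correspond to
    more diverse subsets, which is the core idea behind DPP-based
    summarization.
    """
    selected = []
    remaining = list(range(L.shape[0]))
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in remaining:
            cand = selected + [i]
            score = np.linalg.det(L[np.ix_(cand, cand)])
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy kernel: items 0 and 1 are near-duplicates; item 2 is distinct.
L = np.array([[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
print(greedy_dpp_map(L, 2))  # → [0, 2], a diverse pair
```

Because near-duplicate rows make the kernel nearly singular, the determinant penalizes redundant picks, which is why DPPs suit summarization.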
Heterogeneous Batch Distillation Processes: Real System Optimisation
In this paper, optimisation of batch distillation processes is considered. It deals with real systems, with rigorous simulation of the processes through the resolution of the full MESH differential algebraic equations. A specific software architecture is developed, based on the BatchColumn® simulator and on both SQP and GA numerical algorithms, and is able to optimise sequential batch columns as long as the column transitions are set.
The efficiency of the proposed optimisation tool is illustrated by two case studies. The first concerns heterogeneous batch solvent recovery in a single distillation column and shows that significant economic gains are obtained along with improved process conditions. The second concerns the optimisation of two sequential homogeneous batch distillation columns and demonstrates the capacity to optimise several different sequential dynamic processes. For such complex multiobjective problems, GA is preferred to SQP, although SQP is able to refine specific GA solutions.
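The GA-plus-SQP pairing the abstract describes (global evolutionary search, then local gradient refinement) can be sketched on a toy objective. The `cost` function below is a hypothetical stand-in for a column cost, not the paper's model; SciPy's `SLSQP` method plays the role of the SQP step:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for a batch-column cost with a mild
# non-convexity, so local search alone can get stuck.
def cost(x):
    return (x[0] - 1.2)**2 + (x[1] + 0.7)**2 + 0.1 * np.sin(5 * x[0])

rng = np.random.default_rng(0)

# Crude GA: random population, keep the best individual, mutate around it.
pop = rng.uniform(-3, 3, size=(40, 2))
for _ in range(30):
    fitness = np.array([cost(ind) for ind in pop])
    elite = pop[fitness.argmin()]
    pop = elite + rng.normal(scale=0.3, size=(40, 2))
    pop[0] = elite  # elitism: never lose the best solution so far

# SQP (here SLSQP) then refines the GA's best candidate locally.
result = minimize(cost, elite, method="SLSQP")
print(result.x)  # a point near the cost minimum
```

The division of labour mirrors the abstract's conclusion: the GA explores the landscape globally, while SQP is able to improve a specific GA solution.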
Sequence to Sequence Mixture Model for Diverse Machine Translation
Sequence to sequence (SEQ2SEQ) models often lack diversity in their generated
translations. This can be attributed to the limitation of SEQ2SEQ models in
capturing lexical and syntactic variations in a parallel corpus resulting from
different styles, genres, topics, or ambiguity of the translation process. In
this paper, we develop a novel sequence to sequence mixture (S2SMIX) model that
improves both translation diversity and quality by adopting a committee of
specialized translation models rather than a single translation model. Each
mixture component selects its own training dataset via optimization of the
marginal loglikelihood, which leads to a soft clustering of the parallel
corpus. Experiments on four language pairs demonstrate the superiority of our
mixture model compared to a SEQ2SEQ baseline with standard or diversity-boosted
beam search. Our mixture model uses negligible additional parameters and incurs
no extra computation cost during decoding. Comment: 11 pages, 5 figures, accepted to CoNLL201
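The soft clustering the abstract attributes to optimizing the marginal log-likelihood can be illustrated with the standard responsibility computation for a mixture model (a generic sketch, not the S2SMIX training code; `responsibilities` and the toy numbers are hypothetical):

```python
import numpy as np

def responsibilities(log_probs):
    """Soft assignment of each sentence pair to mixture components.

    log_probs[n, k] = log p(y_n | x_n, component k).  Summing over
    components gives the marginal log-likelihood; normalising each
    row yields a soft clustering of the parallel corpus.
    """
    m = log_probs.max(axis=1, keepdims=True)  # shift for numerical stability
    log_marginal = m + np.log(np.exp(log_probs - m).sum(axis=1, keepdims=True))
    return np.exp(log_probs - log_marginal)

# Hypothetical per-component log-likelihoods: 2 pairs, 3 components.
lp = np.array([[-1.0, -5.0, -5.0],
               [-3.0, -3.0, -3.0]])
r = responsibilities(lp)
print(r.round(3))  # row 0 concentrates on component 0; row 1 is uniform
```

A pair that one specialized component explains much better is assigned almost entirely to it, which is how the committee of translation models partitions the data.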
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins that mix sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
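The "covarying patterns of contig coverage across multiple datasets" idea can be sketched as grouping contigs whose per-sample coverage profiles are highly correlated. This is a deliberately simplified single-linkage sketch of differential-coverage binning, not any specific binner's algorithm; `coverage_bins` and the toy data are hypothetical:

```python
import numpy as np

def coverage_bins(cov, threshold=0.95):
    """Greedily bin contigs whose coverage profiles across samples
    are highly correlated.  cov[i, s] is the coverage of contig i
    in metagenome sample s; contigs from the same genome should
    rise and fall together across samples.
    """
    n = cov.shape[0]
    corr = np.corrcoef(cov)
    bins, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        members = [i] + [j for j in range(i + 1, n)
                         if j not in assigned and corr[i, j] >= threshold]
        assigned.update(members)
        bins.append(members)
    return bins

# Coverage of 4 contigs across 3 samples: contigs 0 and 1 covary
# (same genome), as do contigs 2 and 3.
cov = np.array([[10.0, 2.0, 8.0],
                [20.0, 4.0, 16.0],
                [1.0, 9.0, 3.0],
                [2.0, 18.0, 6.0]])
print(coverage_bins(cov))  # → [[0, 1], [2, 3]]
```

Real binners also use sequence composition (e.g., tetranucleotide frequencies) alongside coverage, but the covariation signal above is the part the review highlights.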
Efficient depletion of host DNA contamination in malaria clinical sequencing.
The cost of whole-genome sequencing (WGS) is decreasing rapidly as next-generation sequencing technology continues to advance, and the prospect of making WGS available for public health applications is becoming a reality. So far, a number of studies have demonstrated the use of WGS as an epidemiological tool for typing and controlling outbreaks of microbial pathogens. Success of these applications is hugely dependent on efficient generation of clean genetic material that is free from host DNA contamination for rapid preparation of sequencing libraries. The presence of large amounts of host DNA severely affects the efficiency of characterizing pathogens using WGS and is therefore a serious impediment to clinical and epidemiological sequencing for health care and public health applications. We have developed a simple enzymatic treatment method that takes advantage of the methylation of human DNA to selectively deplete host contamination from clinical samples prior to sequencing. Using malaria clinical samples with over 80% human host DNA contamination, we show that the enzymatic treatment enriches Plasmodium falciparum DNA up to ∼9-fold and generates high-quality, nonbiased sequence reads covering >98% of 86,158 catalogued typeable single-nucleotide polymorphism loci.
Generating Synthetic Data for Neural Keyword-to-Question Models
Search typically relies on keyword queries, but these are often semantically
ambiguous. We propose to overcome this by offering users natural language
questions, based on their keyword queries, to disambiguate their intent. This
keyword-to-question task may be addressed using neural machine translation
techniques. Neural translation models, however, require massive amounts of
training data (keyword-question pairs), which is unavailable for this task. The
main idea of this paper is to generate large amounts of synthetic training data
from a small seed set of hand-labeled keyword-question pairs. Since natural
language questions are available in large quantities, we develop models to
automatically generate the corresponding keyword queries. Further, we introduce
various filtering mechanisms to ensure that synthetic training data is of high
quality. We demonstrate the feasibility of our approach using both automatic
and manual evaluation. This is an extended version of the article published
with the same title in the Proceedings of ICTIR'18. Comment: Extended version of ICTIR'18 full paper, 11 pages
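One of the filtering mechanisms for synthetic training data that the abstract mentions can be sketched as a simple keyword-coverage check on generated (keywords, question) pairs. This rule and the `keyword_filter` helper are hypothetical illustrations, not the authors' actual filters:

```python
def keyword_filter(pairs, min_overlap=1.0):
    """Keep synthetic (keywords, question) pairs whose question
    covers at least min_overlap of the keywords -- one plausible
    quality filter for synthetic keyword-to-question data.
    """
    kept = []
    for keywords, question in pairs:
        q_tokens = set(question.lower().split())
        overlap = sum(k.lower() in q_tokens for k in keywords) / len(keywords)
        if overlap >= min_overlap:
            kept.append((keywords, question))
    return kept

pairs = [(["python", "sort"], "how do I sort a list in Python ?"),
         (["python", "sort"], "what is a list comprehension ?")]
print(keyword_filter(pairs))  # keeps only the first pair
```

Filters like this discard generated questions that drift away from the seed keywords, keeping the synthetic corpus aligned with the intended intent.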