112 research outputs found
QuesNet: A Unified Representation for Heterogeneous Test Questions
Understanding learning materials (e.g., test questions) is a crucial issue in
online learning systems and can support many applications in the education
domain. Unfortunately, many supervised approaches suffer from scarce
human-labeled data, while abundant unlabeled resources remain highly
underutilized. An effective way to alleviate this problem is to use
pre-trained representations for question understanding. However, existing
pre-training methods in NLP are ill-suited to learning test question
representations because of several domain-specific characteristics of
education. First, questions usually comprise heterogeneous data, including
content text, images, and side information. Second, they carry both basic
linguistic information and domain logic and knowledge. To this end, in this
paper, we propose a novel pre-training method, QuesNet, for comprehensively
learning question representations. Specifically, we first design a unified
framework that aggregates a question's heterogeneous inputs into a
comprehensive vector. We then propose a two-level hierarchical pre-training
algorithm that learns a better understanding of test questions in an
unsupervised way: a novel holed language model objective extracts low-level
linguistic features, and a domain-oriented objective learns high-level logic
and knowledge. Moreover, we show that QuesNet can be effectively fine-tuned
for many question-based tasks. We conduct extensive experiments on large-scale
real-world question data, and the results clearly demonstrate the
effectiveness of QuesNet for question understanding as well as its superior
applicability.
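The abstract does not spell out the holed language model objective; one plausible reading is that each token must be predicted from its full two-sided context (a "hole"), unlike a left-to-right LM. A minimal sketch under that assumption, with hypothetical names:

```python
def holed_lm_examples(tokens, hole="<hole>"):
    """Build (context, target) training pairs for a holed-LM-style
    objective: each position is replaced by a hole token and must be
    predicted from the full left and right context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[:i] + [hole] + tokens[i + 1:]
        examples.append((context, target))
    return examples

# every position of a 5-token question yields one training pair
pairs = holed_lm_examples(["what", "is", "2", "+", "3"])
```

A real implementation would feed each holed context through the question encoder and score the target token, but the pair construction above captures the objective's shape.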
Crystallographic and Fluorescence Studies of the Interaction of Haloalkane Dehalogenase with Halide Ions. Studies with Halide Compounds Reveal a Halide Binding Site in the Active Site
Haloalkane dehalogenase from Xanthobacter autotrophicus GJ10 catalyzes the conversion of 1,2-dichloroethane to 2-chloroethanol and chloride without use of oxygen or cofactors. The active site is situated in an internal cavity, which is accessible from the solvent, even in the crystal. Crystal structures of the dehalogenase enzyme complexed with iodoacetamide, chloroacetamide, iodide, and chloride at pH 6.2 and 8.2 revealed a halide binding site between the ring N-H groups of two tryptophan residues, Trp-125 and Trp-175, located in the active site. The halide ion lies on the intersection of the planes of the rings of the tryptophans. The binding of iodide and chloride to haloalkane dehalogenase caused a strong decrease in protein fluorescence. The decrease could be fitted to a modified form of the Stern-Volmer equation, indicating the presence of fluorophores of different accessibilities. Halide binding was much stronger at pH 6.0 than at pH 8.2. Assuming ligand binding to Trp-125 and Trp-175 as the sole cause of fluorescence quenching, dissociation constants at pH 6.0 with chloride and iodide were calculated to be 0.49 +/- 0.04 and 0.074 +/- 0.007 mM, respectively. Detailed structural investigation showed that the halide binding site probably stabilizes the halide product as well as the negatively charged transition state occurring during the formation of the covalent intermediate.
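The "modified form of the Stern-Volmer equation" for fluorophores of different accessibilities is commonly the Lehrer form, given here for reference (the paper's exact fitting expression may differ):

```latex
% Classic Stern-Volmer quenching ([Q] = quencher concentration):
\frac{F_0}{F} = 1 + K_{SV}[Q]
% Modified (Lehrer) form when only a fraction f_a of the
% fluorophores is accessible to the quencher:
\frac{F_0}{\Delta F} = \frac{1}{f_a K_{SV}[Q]} + \frac{1}{f_a}
```

Plotting $F_0/\Delta F$ against $1/[Q]$ yields $f_a$ from the intercept and $K_{SV}$ from the slope, which is how accessibility differences between fluorophores are resolved.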
Towards Knowledge-Based Personalized Product Description Generation in E-commerce
Quality product descriptions are critical for providing a competitive
customer experience on an e-commerce platform. An accurate and attractive
description not only helps customers make an informed decision but also
improves the likelihood of purchase. However, crafting a successful product
description is tedious and highly time-consuming. Due to its importance,
automating product description generation has attracted considerable interest
from both the research and industrial communities. Existing methods mainly
use templates or statistical methods, and their performance can be rather
limited. In this paper, we explore a new way to generate personalized product
descriptions by combining the power of neural networks and a knowledge base.
Specifically, we propose a KnOwledge Based pErsonalized (KOBE) product
description generation model in the context of e-commerce. In KOBE, we extend
the encoder-decoder framework, the Transformer, to a sequence modeling
formulation using self-attention. To make the description both informative
and personalized, KOBE considers a variety of important factors during text
generation, including product aspects, user categories, and the knowledge
base. Experiments on real-world datasets demonstrate that the proposed method
outperforms the baselines on various metrics; KOBE achieves an improvement of
9.7% over the state of the art in terms of BLEU. We also present several case
studies as anecdotal evidence of the effectiveness of the proposed approach.
The framework has been deployed in Taobao, the largest online e-commerce
platform in China. Comment: KDD 2019 Camera-ready. Website:
https://sites.google.com/view/kobe201
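The abstract does not specify how the attributes enter the model; one common way to condition a self-attention encoder-decoder on such factors is to prepend attribute tokens to the input sequence. A sketch under that assumption (all names illustrative, not KOBE's actual format):

```python
def build_conditioned_input(title_tokens, aspects, user_category,
                            knowledge_facts, sep="<sep>"):
    """Assemble one input sequence for an attribute-conditioned
    encoder-decoder: special attribute tokens are prepended so that
    self-attention can condition every generated word on them."""
    attrs = [f"<aspect:{a}>" for a in aspects] + [f"<user:{user_category}>"]
    return attrs + title_tokens + [sep] + knowledge_facts

seq = build_conditioned_input(["wireless", "earbuds"], ["battery"],
                              "athlete", ["bluetooth", "5.0"])
```

The decoder then attends over this sequence, so the same product title can yield different descriptions for different aspect/user combinations.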
What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation
Heavily pre-trained transformer models such as BERT have recently been shown
to be remarkably powerful at language modelling, achieving impressive results
on numerous downstream tasks. It has also been shown that they can implicitly
store factual knowledge in their parameters after pre-training. Understanding
what the pre-training procedure of LMs actually learns is a crucial step for
using and improving them for Conversational Recommender Systems (CRS). We
first study how much an off-the-shelf pre-trained BERT "knows" about
recommendation items such as books, movies, and music. To analyze the
knowledge stored in BERT's parameters, we use probes that require different
types of knowledge to solve, namely content-based and collaborative-based.
Content-based knowledge requires the model to match the titles of items with
their content information, such as textual descriptions and genres. In
contrast, collaborative-based knowledge requires the model to match items
with similar ones, according to community interactions such as ratings. We
use BERT's Masked Language Modelling head to probe its knowledge about the
genre of items with cloze-style prompts. In addition, we employ BERT's Next
Sentence Prediction head and representation similarity to compare relevant
and non-relevant search and recommendation query-document inputs, exploring
whether BERT can, without any fine-tuning, rank relevant items first.
Finally, we study how BERT performs in a conversational recommendation
downstream task. Overall, our analyses and experiments show that: (i) BERT
has knowledge stored in its parameters about the content of books, movies,
and music; (ii) it has more content-based than collaborative-based knowledge;
and (iii) it fails on conversational recommendation when faced with
adversarial data. Comment: Accepted for publication at RecSys'2
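A cloze-style MLM probe of the kind described above amounts to templating a sentence with the model's mask token and reading off the head's prediction. A minimal sketch of the prompt construction (the paper's exact template may differ):

```python
def genre_cloze_prompt(item_title, item_kind="movie", mask_token="[MASK]"):
    """Build a cloze prompt for probing an MLM head about an item's
    genre; the model's top prediction for the mask slot is taken as
    its 'belief' about the genre."""
    return f"{item_title} is a {mask_token} {item_kind}."

prompt = genre_cloze_prompt("The Godfather")
```

Feeding such prompts to a fill-mask model and comparing the top predictions against ground-truth genres gives a fine-tuning-free measure of content knowledge.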
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Modeling visual search not only offers an opportunity to predict the
usability of an interface before actually testing it on real users, but also
advances scientific understanding of human behavior. In this work, we first
conduct a set of analyses on a large-scale dataset of visual search tasks on
realistic webpages. We then present a deep neural network that learns to
predict the scannability of webpage content, i.e., how easy it is for a user
to find a specific target. Our model leverages both heuristic-based features,
such as target size, and unstructured features, such as raw image pixels.
This approach allows us to model complex interactions involved in a realistic
visual search task that cannot be easily captured by traditional analytical
models. We analyze the model's behavior to offer insights into how the
salience map learned by the model aligns with human intuition and how the
learned semantic representation of each target type relates to its visual
search performance. Comment: the 2020 CHI Conference on Human Factors in
Computing System
Multistability and dynamic transitions of intracellular Min protein patterns
Cells owe their internal organization to self-organized protein patterns, which originate and adapt to growth and external stimuli via a process that is as complex as it is little understood. Here, we study the emergence, stability, and state transitions of multistable Min protein oscillation patterns in live Escherichia coli bacteria during growth up to defined large dimensions. De novo formation of patterns from homogeneous starting conditions is observed and studied both experimentally and in simulations. A new theoretical approach is developed for probing pattern stability under perturbations. Quantitative experiments and simulations show that, once established, Min oscillations tolerate a large degree of intracellular heterogeneity, allowing distinctly different patterns to persist in different cells with the same geometry. Min patterns maintain their axes for hours in experiments, despite imperfections, expansion, and changes in cell shape during continuous cell growth. Transitions between multistable Min patterns are found to be rare events induced by strong intracellular perturbations. The instances of multistability studied here are the combined outcome of boundary growth and strongly nonlinear kinetics, which are characteristic of the reaction-diffusion patterns that pervade biology at many scales.
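The "strongly nonlinear kinetics" invoked above belong to the general class of two-component reaction-diffusion systems; their generic form, for reference (the specific Min attachment/detachment kinetics are richer than this template):

```latex
% Generic two-component reaction-diffusion system; u, v are protein
% densities, D_u, D_v diffusion constants, and f, g the nonlinear
% reaction kinetics whose form sets the pattern-forming behavior:
\partial_t u = D_u \nabla^2 u + f(u, v), \qquad
\partial_t v = D_v \nabla^2 v + g(u, v)
```

Multistability arises when such a system, on a growing bounded domain, admits several stable oscillation modes (e.g., pole-to-pole vs. transverse) for the same geometry.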
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
We study the open-domain named entity recognition (NER) problem under distant
supervision. Distant supervision, though it does not require large amounts of
manual annotation, yields highly incomplete and noisy distant labels via
external knowledge bases. To address this challenge, we propose a new
computational framework, BOND, which leverages the power of pre-trained
language models (e.g., BERT and RoBERTa) to improve the prediction
performance of NER models. Specifically, we propose a two-stage training
algorithm: in the first stage, we adapt the pre-trained language model to the
NER task using the distant labels, which can significantly improve recall and
precision; in the second stage, we drop the distant labels and propose a
self-training approach to further improve model performance. Thorough
experiments on 5 benchmark datasets demonstrate the superiority of BOND over
existing distantly supervised NER methods. The code and distantly labeled
data have been released at https://github.com/cliang1453/BOND. Comment:
Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data
Mining (KDD '20
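The second-stage self-training described above is commonly implemented by letting the current model pseudo-label the data and keeping only confident predictions. A simplified sketch of that selection step (BOND's actual criterion may differ):

```python
def select_pseudo_labels(probs, threshold=0.9):
    """Confidence-based pseudo-label selection for self-training:
    for each token's predicted label distribution, keep the argmax
    label only if its probability clears the threshold; low-confidence
    tokens are dropped from the next training round."""
    selected = []
    for i, dist in enumerate(probs):
        label = max(range(len(dist)), key=lambda k: dist[k])
        if dist[label] >= threshold:
            selected.append((i, label))
    return selected

# token 0 is confidently class 1; token 1 is uncertain and is dropped
pseudo = select_pseudo_labels([[0.05, 0.95], [0.6, 0.4]])
```

Iterating this (retrain on the selected pseudo-labels, re-predict, re-select) is what lets the model move beyond the noisy distant labels of stage one.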
Specificity and kinetics of haloalkane dehalogenase
Haloalkane dehalogenase converts halogenated alkanes to their corresponding alcohols. The active site is buried inside the protein and lined with hydrophobic residues. The reaction proceeds via a covalent substrate-enzyme complex. This paper describes a steady-state and pre-steady-state kinetic analysis of the conversion of a number of substrates of the dehalogenase. The kinetic mechanisms for the "natural" substrate 1,2-dichloroethane and for the brominated analog and nematocide 1,2-dibromoethane are given. In general, brominated substrates had a lower K_m but a similar k_cat compared with the chlorinated analogs. The rate of C-Br bond cleavage was higher than the rate of C-Cl bond cleavage, which is in agreement with the leaving-group abilities of these halogens. The lower K_m for brominated compounds therefore originates both from the higher rate of C-Br bond cleavage and from tighter binding of the bromo-compounds. However, the rate-determining step in the conversion (k_cat) of 1,2-dibromoethane and 1,2-dichloroethane was found to be release of the charged halide ion from the active site cavity, explaining the different K_m but similar k_cat values for these compounds. The study provides a basis for the analysis of rate-determining steps in the hydrolysis of various environmentally important substrates.
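The K_m/k_cat comparisons above follow standard steady-state Michaelis-Menten kinetics, given here for reference:

```latex
% Steady-state rate law; [E]_0 = total enzyme, [S] = substrate:
v = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]}
```

When a late step such as halide release limits turnover, it sets k_cat for both substrates, while the earlier binding and bond-cleavage steps can still differ between them and so produce different K_m values, which is the pattern reported above.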
Asking Questions the Human Way: Scalable Question-Answer Generation from Text Corpus
The ability to ask questions is important in both human and machine
intelligence. Learning to ask questions helps knowledge acquisition, improves
question-answering and machine reading comprehension tasks, and helps a
chatbot keep a conversation flowing with a human. Existing question
generation models are ineffective at generating large numbers of high-quality
question-answer pairs from unstructured text, since, given an answer and an
input passage, question generation is inherently a one-to-many mapping. In
this paper, we propose Answer-Clue-Style-aware Question Generation (ACS-QG),
which aims at automatically generating high-quality and diverse
question-answer pairs from an unlabeled text corpus at scale by imitating the
way a human asks questions. Our system consists of: i) an information
extractor, which samples multiple types of assistive information from the
text to guide question generation; ii) neural question generators, which
generate diverse and controllable questions by leveraging the extracted
assistive information; and iii) a neural quality controller, which removes
low-quality generated data based on text entailment. We compare our question
generation models with existing approaches and resort to voluntary human
evaluation to assess the quality of the generated question-answer pairs. The
evaluation results suggest that our system dramatically outperforms
state-of-the-art neural question generation models in generation quality
while remaining scalable. With models trained on a relatively small amount of
data, we can generate 2.8 million quality-assured question-answer pairs from
a million sentences found in Wikipedia. Comment: Accepted by The Web
Conference 2020 (WWW 2020) as full paper (oral presentation
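The one-to-many mapping noted above becomes tractable once the extra conditioning inputs (answer, clue, style) are made explicit: each distinct tuple steers the generator toward a different question. An illustrative sketch (names hypothetical, not the paper's API):

```python
from itertools import product

def acs_inputs(answer, clues, styles):
    """Enumerate (answer, clue, style) conditioning tuples for one
    passage: fixing the answer, every clue/style combination is a
    separate, fully specified generation input, so the generator
    faces a one-to-one rather than one-to-many problem."""
    return [(answer, c, s) for c, s in product(clues, styles)]

tuples = acs_inputs("1989", ["the Berlin Wall fell"], ["when", "which year"])
```

Sampling such tuples is the extractor's job; the quality controller then prunes pairs whose question is not entailed by the passage.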
Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning
Learning-based adaptive bitrate (ABR) methods, which aim to learn strong
strategies without any presumptions, have become one of the research hotspots
in adaptive streaming. However, they typically suffer from several issues,
i.e., low sample efficiency and a lack of awareness of video quality
information. In this paper, we propose Comyco, a video quality-aware ABR
approach that greatly improves on learning-based methods by tackling the
above issues. Comyco trains its policy by imitating expert trajectories given
by an instant solver, which not only avoids redundant exploration but also
makes better use of the collected samples. Meanwhile, Comyco attempts to pick
the chunk with higher perceptual video quality rather than higher video
bitrate. To achieve this, we construct Comyco's neural network architecture,
video datasets, and QoE metrics with video quality features. Using
trace-driven and real-world experiments, we demonstrate significant
improvements in Comyco's sample efficiency compared with prior work: a 1700x
improvement in the number of samples required and a 16x improvement in
training time. Moreover, the results show that Comyco outperforms previously
proposed methods, with improvements in average QoE of 7.5%-16.79%. In
particular, Comyco also surpasses the state-of-the-art approach Pensieve by
7.37% in average video quality under the same rebuffering time. Comment: ACM
Multimedia 201
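The imitation-learning setup described above reduces to supervised learning on (state, expert_action) pairs: instead of exploring, the learner regresses onto the solver's choices. A toy sketch of the data-collection step (the real state, solver, and action space are much richer):

```python
def collect_imitation_data(states, expert_policy):
    """Build a supervised dataset of (state, expert_action) pairs;
    the policy network is then trained to reproduce the expert's
    action in each state (e.g., with a cross-entropy loss)."""
    return [(s, expert_policy(s)) for s in states]

def toy_expert(state):
    """Toy stand-in for the instant solver: pick the highest-quality
    chunk that fits the available bandwidth (chunk sizes are assumed
    sorted by increasing quality)."""
    bandwidth, chunk_sizes = state
    feasible = [i for i, size in enumerate(chunk_sizes) if size <= bandwidth]
    return max(feasible) if feasible else 0

data = collect_imitation_data([(3.0, [1.0, 2.5, 4.0])], toy_expert)
```

Because every sample comes with a known-good label, each one contributes directly to training, which is where the sample-efficiency gain over exploration-based RL comes from.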