334 research outputs found
MemoNet: Memorizing Representations of All Cross Features Efficiently via Multi-Hash Codebook Network for CTR Prediction
New findings in natural language processing (NLP) demonstrate that strong
memorization capability contributes substantially to the success of large
language models. This inspires us to explicitly bring an independent memory
mechanism into CTR ranking models to learn and memorize representations of all
cross features. In this paper, we propose the multi-Hash Codebook NETwork
(HCNet) as a memory mechanism for efficiently learning and memorizing
representations of all cross features in CTR tasks. HCNet uses a multi-hash
codebook as its main memory, and the whole memory procedure consists of three
phases: multi-hash addressing, memory restoring, and feature shrinking. HCNet
can be regarded as a general module and can be incorporated into any current
deep CTR model. We also propose a new CTR model named MemoNet, which combines
HCNet with a DNN backbone. Extensive experimental results on three public
datasets show that MemoNet reaches superior performance over state-of-the-art
approaches and validate the effectiveness of HCNet as a strong memory module.
Besides, MemoNet exhibits the scaling behavior of big models in NLP: enlarging
the codebook in HCNet sustainably yields performance gains. Our work
demonstrates the importance and feasibility of learning and memorizing
representations of all cross features, which sheds light on a promising new
research direction.
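The three memory phases described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the hash family, sizes, and the linear "shrinking" projection below are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- not the paper's actual hyperparameters.
CODEBOOK_SIZE = 1000   # number of codewords in the shared codebook
CODE_DIM = 8           # dimension of each codeword
NUM_HASHES = 3         # number of independent hash functions
OUT_DIM = 4            # dimension after feature shrinking

codebook = rng.normal(size=(CODEBOOK_SIZE, CODE_DIM))
shrink_W = rng.normal(size=(NUM_HASHES * CODE_DIM, OUT_DIM))

def multi_hash_addressing(cross_feature_id: int) -> list[int]:
    """Phase 1: map a cross-feature ID to NUM_HASHES codebook slots.

    Simple salted modular hashes stand in for the paper's hash family.
    """
    return [hash((k, cross_feature_id)) % CODEBOOK_SIZE for k in range(NUM_HASHES)]

def memory_restoring(slots: list[int]) -> np.ndarray:
    """Phase 2: read and concatenate the addressed codewords."""
    return np.concatenate([codebook[s] for s in slots])

def feature_shrinking(restored: np.ndarray) -> np.ndarray:
    """Phase 3: project the concatenated memory down to a compact representation."""
    return restored @ shrink_W

# Example: a cross feature (e.g. user_id=42 crossed with item_id=7) encoded as one ID.
cross_id = hash((42, 7))
rep = feature_shrinking(memory_restoring(multi_hash_addressing(cross_id)))
```

Because all cross features share one codebook, memory cost is decoupled from the (combinatorially large) number of cross features; multiple hashes reduce collisions between unrelated features.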
Distributed Lagrange Multiplier/Fictitious Domain Finite Element Method for a Transient Stokes Interface Problem with Jump Coefficients
The distributed Lagrange multiplier/fictitious domain (DLM/FD)-mixed finite element method is developed and analyzed in this paper for a transient Stokes interface problem with jump coefficients. Semi- and fully discrete DLM/FD-mixed finite element schemes are developed for the first time for this problem with a moving interface, where the arbitrary Lagrangian-Eulerian (ALE) technique is employed to deal with the moving and immersed subdomain. Stability and optimal convergence properties are obtained for both schemes. Numerical experiments are carried out for different scenarios of jump coefficients, and all theoretical results are validated.
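For orientation, DLM/FD formulations of this kind typically lead to a saddle-point weak form; the sketch below is generic and illustrative (the symbols and constraint are assumptions, not the paper's exact scheme):

```latex
% Generic DLM/FD saddle-point sketch: find (u, p, \lambda) such that,
% for all test functions (v, q, \mu),
\begin{aligned}
(\partial_t u, v)_{\Omega} + (\nu \nabla u, \nabla v)_{\Omega}
  - (p, \nabla\cdot v)_{\Omega} + \langle \lambda, v \rangle_{\omega(t)} &= (f, v)_{\Omega}, \\
(q, \nabla\cdot u)_{\Omega} &= 0, \\
\langle \mu, u - w \rangle_{\omega(t)} &= 0,
\end{aligned}
```

where $\omega(t)$ is the moving immersed subdomain, $w$ its prescribed velocity, and the distributed multiplier $\lambda$ enforces the constraint on $\omega(t)$; the jump coefficients enter through $\nu$ taking different values across the interface.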
FiBiNet++: Reducing Model Size by Low Rank Feature Interaction Layer for CTR Prediction
Click-Through Rate (CTR) estimation has become one of the most fundamental
tasks in many real-world applications and various deep models have been
proposed. Some research has shown that FiBiNet is one of the best-performing
models and outperforms all other models on the Avazu dataset. However, the
large model size of FiBiNet hinders its wider application. In this paper, we
propose a novel FiBiNet++ model that redesigns FiBiNet's model structure,
greatly reducing model size while further improving performance. One of the
primary techniques is our proposed "Low Rank Layer" for feature interaction,
which serves as the crucial driver of the superior compression ratio. Extensive
experiments on three public datasets show that FiBiNet++ effectively reduces
the non-embedding model parameters of FiBiNet by 12x to 16x. At the same time,
FiBiNet++ delivers significant performance improvements over state-of-the-art
CTR methods, including FiBiNet.
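The parameter saving from a low-rank interaction layer can be illustrated directly: replacing a dense d×d interaction matrix with a rank-r factorization U Vᵀ cuts parameters from d² to 2dr. This is a generic low-rank sketch, not FiBiNet++'s actual layer; the sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 64      # hidden width of the interaction layer (hypothetical)
r = 8       # rank of the factorization, r << d

x = rng.normal(size=(d,))

# Full-rank interaction: d*d parameters.
W_full = rng.normal(size=(d, d))
y_full = x @ W_full

# Low-rank replacement: W ~ U @ V.T, only 2*d*r parameters.
U = rng.normal(size=(d, r))
V = rng.normal(size=(d, r))
y_low = (x @ U) @ V.T   # same output shape, applied as two thin matmuls

full_params = d * d       # 4096
low_params = 2 * d * r    # 1024
print(full_params / low_params)  # 4.0x compression at d=64, r=8
```

Applying the factors as two thin matrix products also reduces the per-example compute from O(d²) to O(dr).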
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
A ChatGPT-like system for drug compounds could be a game-changer in
pharmaceutical research, accelerating drug discovery, enhancing our
understanding of structure-activity relationships, guiding lead optimization,
aiding drug repurposing, reducing the failure rate, and streamlining clinical
trials. In this work, we make an initial attempt towards enabling ChatGPT-like
capabilities on drug molecule graphs, by developing a prototype system
DrugChat. DrugChat works in a similar way as ChatGPT. Users upload a compound
molecule graph and ask various questions about this compound. DrugChat will
answer these questions in a multi-turn, interactive manner. The DrugChat system
consists of a graph neural network (GNN), a large language model (LLM), and an
adaptor. The GNN takes a compound molecule graph as input and learns a
representation for this graph. The adaptor transforms the graph representation
produced by the GNN into another representation that is acceptable to the LLM.
The LLM takes the compound representation transformed by the adaptor and users'
questions about this compound as inputs and generates answers. All these
components are trained end-to-end. To train DrugChat, we collected instruction
tuning datasets which contain 10,834 drug compounds and 143,517 question-answer
pairs. The code and data are available at
\url{https://github.com/UCSD-AI4H/drugchat}.
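The GNN → adaptor → LLM pipeline described above can be sketched with numpy stand-ins. This is a toy illustration of the data flow only: the one-round mean-aggregation "GNN", the linear adaptor, and all dimensions are assumptions, not DrugChat's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

GNN_DIM = 16   # graph-representation size (hypothetical)
LLM_DIM = 32   # LLM embedding size (hypothetical)

adaptor_W = rng.normal(size=(GNN_DIM, LLM_DIM))

def gnn_encode(adjacency: np.ndarray, node_feats: np.ndarray) -> np.ndarray:
    """Stand-in GNN: one round of mean-neighbor message passing, then mean-pool
    node states into a single graph-level representation."""
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    h = (adjacency @ node_feats) / deg   # aggregate neighbor features
    return h.mean(axis=0)                # graph-level representation

def adapt(graph_rep: np.ndarray) -> np.ndarray:
    """Adaptor: linear map from the GNN's space into the LLM's embedding space."""
    return graph_rep @ adaptor_W

# Toy 3-atom molecule graph (path graph) with random node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = rng.normal(size=(3, GNN_DIM))

graph_token = adapt(gnn_encode(A, X))
# In the full system, `graph_token` would be fed to the LLM alongside the
# user's question tokens to generate multi-turn answers.
```

Training end-to-end, as the abstract describes, would backpropagate the LLM's answer loss through the adaptor into the GNN.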