334 research outputs found
MemoNet: Memorizing Representations of All Cross Features Efficiently via Multi-Hash Codebook Network for CTR Prediction
New findings in natural language processing (NLP) demonstrate that strong
memorization capability contributes substantially to the success of large
language models. This inspires us to explicitly bring an independent memory
mechanism into CTR ranking models to learn and memorize representations of all
cross features. In this paper, we propose the multi-Hash Codebook NETwork
(HCNet) as a memory mechanism for efficiently learning and memorizing
representations of all cross features in CTR tasks. HCNet uses a multi-hash
codebook as its main memory, and the whole memory procedure consists of three
phases: multi-hash addressing, memory restoring, and feature shrinking. HCNet
can be regarded as a general module and can be incorporated into any current
deep CTR model. We also propose a new CTR model named MemoNet, which combines
HCNet with a DNN backbone. Extensive experimental results on three public
datasets show that MemoNet reaches superior performance over state-of-the-art
approaches and validate the effectiveness of HCNet as a strong memory module.
Besides, MemoNet exhibits the scaling behavior of big models in NLP: enlarging
the codebook in HCNet sustainably yields performance gains. Our work
demonstrates the importance and feasibility of learning and memorizing
representations of all cross features, which sheds light on a promising new
research direction.
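The three memory phases described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the hash family, sizes, and the linear "shrinking" projection below are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- not the paper's actual hyperparameters.
CODEBOOK_SIZE = 1000   # number of codewords in the shared codebook
CODE_DIM = 8           # dimension of each codeword
NUM_HASHES = 3         # number of independent hash functions
OUT_DIM = 4            # dimension after feature shrinking

codebook = rng.normal(size=(CODEBOOK_SIZE, CODE_DIM))
shrink_W = rng.normal(size=(NUM_HASHES * CODE_DIM, OUT_DIM))

def multi_hash_addressing(cross_feature_id: int) -> list[int]:
    """Phase 1: map a cross-feature ID to NUM_HASHES codebook slots.

    Simple salted modular hashes stand in for the paper's hash family.
    """
    return [hash((k, cross_feature_id)) % CODEBOOK_SIZE for k in range(NUM_HASHES)]

def memory_restoring(slots: list[int]) -> np.ndarray:
    """Phase 2: read and concatenate the addressed codewords."""
    return np.concatenate([codebook[s] for s in slots])

def feature_shrinking(restored: np.ndarray) -> np.ndarray:
    """Phase 3: project the concatenated memory down to a compact representation."""
    return restored @ shrink_W

# Example: a cross feature (e.g. user_id=42 crossed with item_id=7) encoded as one ID.
cross_id = hash((42, 7))
rep = feature_shrinking(memory_restoring(multi_hash_addressing(cross_id)))
```

Because all cross features share one codebook, memory cost is decoupled from the (combinatorially large) number of cross features; multiple hashes reduce collisions between unrelated features.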
Distributed Lagrange Multiplier/Fictitious Domain Finite Element Method for a Transient Stokes Interface Problem with Jump Coefficients
The distributed Lagrange multiplier/fictitious domain (DLM/FD)-mixed finite element method is developed and analyzed in this paper for a transient Stokes interface problem with jump coefficients. Semi- and fully discrete DLM/FD-mixed finite element schemes are developed for the first time for this problem with a moving interface, where the arbitrary Lagrangian-Eulerian (ALE) technique is employed to deal with the moving and immersed subdomain. Stability and optimal convergence properties are obtained for both schemes. Numerical experiments are carried out for different scenarios of jump coefficients, and all theoretical results are validated.
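For orientation, DLM/FD formulations of this kind typically lead to a saddle-point weak form; the sketch below is generic and illustrative (the symbols and constraint are assumptions, not the paper's exact scheme):

```latex
% Generic DLM/FD saddle-point sketch: find (u, p, \lambda) such that,
% for all test functions (v, q, \mu),
\begin{aligned}
(\partial_t u, v)_{\Omega} + (\nu \nabla u, \nabla v)_{\Omega}
  - (p, \nabla\cdot v)_{\Omega} + \langle \lambda, v \rangle_{\omega(t)} &= (f, v)_{\Omega}, \\
(q, \nabla\cdot u)_{\Omega} &= 0, \\
\langle \mu, u - w \rangle_{\omega(t)} &= 0,
\end{aligned}
```

where $\omega(t)$ is the moving immersed subdomain, $w$ its prescribed velocity, and the distributed multiplier $\lambda$ enforces the constraint on $\omega(t)$; the jump coefficients enter through $\nu$ taking different values across the interface.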
FiBiNet++: Reducing Model Size by Low Rank Feature Interaction Layer for CTR Prediction
Click-Through Rate (CTR) estimation has become one of the most fundamental
tasks in many real-world applications and various deep models have been
proposed. Some research has shown that FiBiNet is one of the best-performing
models and outperforms all other models on the Avazu dataset. However, the
large model size of FiBiNet hinders its wider application. In this paper, we
propose a novel FiBiNet++ model that redesigns FiBiNet's model structure,
greatly reducing model size while further improving performance. One of the
primary techniques is our proposed "Low Rank Layer" for feature interaction,
which serves as the crucial driver of the superior compression ratio. Extensive
experiments on three public datasets show that FiBiNet++ effectively reduces
the non-embedding model parameters of FiBiNet by 12x to 16x. At the same time,
FiBiNet++ delivers significant performance improvements over state-of-the-art
CTR methods, including FiBiNet.
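The parameter saving from a low-rank interaction layer can be illustrated directly: replacing a dense d×d interaction matrix with a rank-r factorization U Vᵀ cuts parameters from d² to 2dr. This is a generic low-rank sketch, not FiBiNet++'s actual layer; the sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 64      # hidden width of the interaction layer (hypothetical)
r = 8       # rank of the factorization, r << d

x = rng.normal(size=(d,))

# Full-rank interaction: d*d parameters.
W_full = rng.normal(size=(d, d))
y_full = x @ W_full

# Low-rank replacement: W ~ U @ V.T, only 2*d*r parameters.
U = rng.normal(size=(d, r))
V = rng.normal(size=(d, r))
y_low = (x @ U) @ V.T   # same output shape, applied as two thin matmuls

full_params = d * d       # 4096
low_params = 2 * d * r    # 1024
print(full_params / low_params)  # 4.0x compression at d=64, r=8
```

Applying the factors as two thin matrix products also reduces the per-example compute from O(d²) to O(dr).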
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
A ChatGPT-like system for drug compounds could be a game-changer in
pharmaceutical research, accelerating drug discovery, enhancing our
understanding of structure-activity relationships, guiding lead optimization,
aiding drug repurposing, reducing the failure rate, and streamlining clinical
trials. In this work, we make an initial attempt towards enabling ChatGPT-like
capabilities on drug molecule graphs, by developing a prototype system
DrugChat. DrugChat works in a similar way as ChatGPT. Users upload a compound
molecule graph and ask various questions about this compound. DrugChat will
answer these questions in a multi-turn, interactive manner. The DrugChat system
consists of a graph neural network (GNN), a large language model (LLM), and an
adaptor. The GNN takes a compound molecule graph as input and learns a
representation for this graph. The adaptor transforms the graph representation
produced by the GNN into another representation that is acceptable to the LLM.
The LLM takes the compound representation transformed by the adaptor and users'
questions about this compound as inputs and generates answers. All these
components are trained end-to-end. To train DrugChat, we collected instruction
tuning datasets which contain 10,834 drug compounds and 143,517 question-answer
pairs. The code and data are available at
\url{https://github.com/UCSD-AI4H/drugchat}.
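The GNN → adaptor → LLM pipeline described above can be sketched with numpy stand-ins. This is a toy illustration of the data flow only: the one-round mean-aggregation "GNN", the linear adaptor, and all dimensions are assumptions, not DrugChat's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

GNN_DIM = 16   # graph-representation size (hypothetical)
LLM_DIM = 32   # LLM embedding size (hypothetical)

adaptor_W = rng.normal(size=(GNN_DIM, LLM_DIM))

def gnn_encode(adjacency: np.ndarray, node_feats: np.ndarray) -> np.ndarray:
    """Stand-in GNN: one round of mean-neighbor message passing, then mean-pool
    node states into a single graph-level representation."""
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    h = (adjacency @ node_feats) / deg   # aggregate neighbor features
    return h.mean(axis=0)                # graph-level representation

def adapt(graph_rep: np.ndarray) -> np.ndarray:
    """Adaptor: linear map from the GNN's space into the LLM's embedding space."""
    return graph_rep @ adaptor_W

# Toy 3-atom molecule graph (path graph) with random node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = rng.normal(size=(3, GNN_DIM))

graph_token = adapt(gnn_encode(A, X))
# In the full system, `graph_token` would be fed to the LLM alongside the
# user's question tokens to generate multi-turn answers.
```

Training end-to-end, as the abstract describes, would backpropagate the LLM's answer loss through the adaptor into the GNN.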