59 research outputs found
An Analysis of Metadiscourse in the Abstracts of English Academic Papers
As an important element of academic writing, metadiscourse has received considerable attention in recent years. The abstract plays an important role in academic writing, as it reflects the main content of the whole paper. Based on metadiscourse theory and Hyland's classification, this study compared the frequency and usage of metadiscourse in mathematical and linguistic academic papers. Two small corpora of abstracts were compiled, comprising 30 mathematical and 30 linguistic abstracts from Social Science Citation Index (SSCI) and Science Citation Index (SCI) journals. The results showed that more metadiscourse appeared in the abstracts of linguistic papers than in those of mathematical papers. Interactive metadiscourse was used more than interactional metadiscourse in the abstracts of both disciplines. In the use of interactive metadiscourse, both disciplines showed the same trend in the frequencies of the five sub-categories. Regarding interactional metadiscourse, hedges were the most frequently used markers in linguistic papers, while self-mentions were the most frequent in mathematics. It is suggested that more interactive metadiscourse should be used in the abstracts of both arts and science academic papers.
Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems
Learning high-quality feature embeddings efficiently and effectively is
critical for the performance of web-scale machine learning systems. A typical
model ingests hundreds of features with vocabularies on the order of millions
to billions of tokens. The standard approach is to represent each feature value
as a d-dimensional embedding, introducing hundreds of billions of parameters
for extremely high-cardinality features. This bottleneck has led to substantial
progress in alternative embedding algorithms. Many of these methods, however,
make the assumption that each feature uses an independent embedding table. This
work introduces a simple yet highly effective framework, Feature Multiplexing,
where one single representation space is used across many different categorical
features. Our theoretical and empirical analysis reveals that multiplexed
embeddings can be decomposed into components from each constituent feature,
allowing models to distinguish between features. We show that multiplexed
representations lead to Pareto-optimal parameter-accuracy tradeoffs for three
public benchmark datasets. Further, we propose a highly practical approach
called Unified Embedding with three major benefits: simplified feature
configuration, strong adaptation to dynamic data distributions, and
compatibility with modern hardware. Unified embedding gives significant
improvements in offline and online metrics compared to highly competitive
baselines across five web-scale search, ads, and recommender systems, where it
serves billions of users across the world in industry-leading products. Comment: NeurIPS'23 Spotlight.
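A minimal sketch of the feature-multiplexing idea described above, assuming a single shared embedding table addressed by hashing each (feature, value) pair; the table size, hash function, and feature names are illustrative assumptions, not the paper's configuration.

```python
import zlib
import numpy as np

# Sketch of feature multiplexing: many categorical features share one embedding
# table instead of owning independent tables. Table size, hashing scheme, and
# feature names are illustrative assumptions.

NUM_BUCKETS = 1_000_003   # single shared vocabulary (prime to spread hashes)
EMBED_DIM = 32

rng = np.random.default_rng(0)
shared_table = rng.normal(scale=0.01, size=(NUM_BUCKETS, EMBED_DIM))

def multiplexed_lookup(feature_name: str, value: str) -> np.ndarray:
    """Embed a (feature, value) pair through one table shared by all features."""
    # Salting the hash with the feature name keeps collisions between different
    # features' values from being systematic.
    bucket = zlib.crc32(f"{feature_name}={value}".encode()) % NUM_BUCKETS
    return shared_table[bucket]

# Example: three different categorical features, one shared parameter space.
example = {"country": "JP", "device": "mobile", "query_token": "shoes"}
concat = np.concatenate([multiplexed_lookup(k, v) for k, v in example.items()])
print(concat.shape)  # (96,) -> fed into the downstream ranking model
```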
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
Learning feature interaction is the critical backbone to building recommender
systems. In web-scale applications, learning feature interaction is extremely
challenging due to the sparse and large input feature space; meanwhile,
manually crafting effective feature interactions is infeasible because of the
exponential solution space. We propose to leverage a Transformer-based
architecture with attention layers to automatically capture feature
interactions. Transformer architectures have witnessed great success in many
domains, such as natural language processing and computer vision. However,
there has not been much adoption of Transformer architecture for feature
interaction modeling in industry. We aim at closing the gap. We identify two
key challenges for applying the vanilla Transformer architecture to web-scale
recommender systems: (1) Transformer architecture fails to capture the
heterogeneous feature interactions in the self-attention layer; (2) The serving
latency of Transformer architecture might be too high to be deployed in
web-scale recommender systems. We first propose a heterogeneous self-attention
layer, which is a simple yet effective modification to the self-attention layer
in Transformer, to take into account the heterogeneity of feature interactions.
We then introduce Hiformer (Heterogeneous Interaction Transformer) to further improve the model expressiveness. With low-rank approximation and model pruning, Hiformer enjoys fast inference for online deployment. Extensive offline experiment results corroborate the effectiveness and efficiency of the Hiformer model. We have successfully deployed the Hiformer model to a real-world, large-scale app ranking model at Google Play, with significant improvement in key engagement metrics (up to +2.66%).
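A rough sketch of the heterogeneity idea in the self-attention layer, assuming one query/key/value projection per feature slot rather than a single shared projection; dimensions and the plain softmax aggregation are illustrative and omit Hiformer's low-rank approximation and pruning.

```python
import numpy as np

# Sketch: heterogeneous self-attention over F feature embeddings. Vanilla
# attention shares one (W_q, W_k, W_v) across all features; here each feature
# slot gets its own projections, so interactions between different feature
# types are parameterized separately. Shapes are illustrative.

F, D = 4, 16                      # number of feature slots, embedding dim
rng = np.random.default_rng(0)
X = rng.normal(size=(F, D))       # one embedding per feature

W_q = rng.normal(scale=0.1, size=(F, D, D))   # per-feature query projection
W_k = rng.normal(scale=0.1, size=(F, D, D))   # per-feature key projection
W_v = rng.normal(scale=0.1, size=(F, D, D))   # per-feature value projection

Q = np.einsum("fd,fde->fe", X, W_q)           # (F, D)
K = np.einsum("fd,fde->fe", X, W_k)
V = np.einsum("fd,fde->fe", X, W_v)

scores = Q @ K.T / np.sqrt(D)                  # (F, F) feature-pair scores
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)       # row-wise softmax

out = attn @ V                                 # (F, D) interaction-aware features
print(out.shape)
```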
Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)
Industry recommender systems usually suffer from highly-skewed long-tail item
distributions where a small fraction of the items receives most of the user
feedback. This skew hurts recommender quality especially for the item slices
without much user feedback. While there have been many research advances made
in academia, deploying these methods in production is very difficult and very
few improvements have been made in industry. One challenge is that these
methods often hurt overall performance; additionally, they could be complex and
expensive to train and serve. In this work, we aim to improve tail item
recommendations while maintaining the overall performance with less training
and serving cost. We first find that the predictions of user preferences are
biased under long-tail distributions. The bias comes from the differences
between training and serving data in two perspectives: 1) the item
distributions, and 2) user's preference given an item. Most existing methods
mainly attempt to reduce the bias from the item distribution perspective,
ignoring the discrepancy from user preference given an item. This leads to a
severe forgetting issue and results in sub-optimal performance.
To address the problem, we design a novel Cross Decoupling Network (CDN) that (i) decouples the learning process of memorization and generalization on the item side through a mixture-of-experts architecture, and (ii) decouples the user samples from different distributions through a regularized bilateral branch network.
Finally, a new adapter is introduced to aggregate the decoupled vectors, and
softly shift the training attention to tail items. Extensive experimental
results show that CDN significantly outperforms state-of-the-art approaches on
benchmark datasets. We also demonstrate its effectiveness by a case study of
CDN in a large-scale recommendation system at Google. Comment: Accepted by KDD 2023 Applied Data Science (ADS) track.
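A toy sketch of the item-side decoupling idea, assuming two experts (a memorization expert over the item-ID embedding and a generalization expert over content features) mixed by a frequency-aware gate; the gating form, feature choices, and the omitted bilateral branch and adapter details are assumptions, not the paper's exact design.

```python
import numpy as np

# Toy sketch of item-side decoupling: a memorization expert (item-ID embedding)
# and a generalization expert (content features) are combined by a gate driven
# by item popularity, so head items lean on memorization and tail items lean on
# generalization. All shapes and weights below are illustrative.

rng = np.random.default_rng(0)
D = 16
id_table = rng.normal(scale=0.05, size=(10_000, D))   # memorization expert input
W_content = rng.normal(scale=0.1, size=(8, D))        # generalization expert (linear)

def item_tower(item_id: int, content_feats: np.ndarray, item_count: int) -> np.ndarray:
    mem = id_table[item_id]                  # memorization expert output
    gen = content_feats @ W_content          # generalization expert output
    # Frequency-aware gate: popular items weight toward memorization.
    g = 1.0 / (1.0 + np.exp(-(np.log1p(item_count) - 5.0)))
    return g * mem + (1.0 - g) * gen

head_vec = item_tower(42, rng.normal(size=8), item_count=50_000)   # head item
tail_vec = item_tower(43, rng.normal(size=8), item_count=3)        # tail item
print(head_vec.shape, tail_vec.shape)
```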
Review Article Autophagy in Hepatic Fibrosis
Hepatic fibrosis is a leading cause of morbidity and mortality worldwide. It is usually associated with chronic liver diseases caused by infection, drugs, metabolic disorders, or autoimmune imbalances, and effective clinical therapies are still lacking. Autophagy is a cellular process that degrades damaged organelles or protein aggregates and participates in many pathological processes, including liver diseases. Autophagy contributes to hepatic fibrosis by activating hepatic stellate cells and may also contribute by influencing other fibrogenic cells. In addition, autophagy can promote the development of some liver diseases, whereas it may play a protective role in liver diseases related to abnormal hepatocellular aggregates and reduce fibrosis. With a better understanding of the potential effects of autophagy on hepatic fibrosis, targeting autophagy might become a novel therapeutic strategy for hepatic fibrosis in the near future.
HyperFormer: Learning Expressive Sparse Feature Representations via Hypergraph Transformer
Learning expressive representations for high-dimensional yet sparse features
has been a longstanding problem in information retrieval. Though recent deep
learning methods can partially solve the problem, they often fail to handle the
numerous sparse features, particularly those tail feature values with
infrequent occurrences in the training data. Worse still, existing methods
cannot explicitly leverage the correlations among different instances to help
further improve the representation learning on sparse features since such
relational prior knowledge is not provided. To address these challenges, in
this paper, we tackle the problem of representation learning on feature-sparse
data from a graph learning perspective. Specifically, we propose to model the
sparse features of different instances using hypergraphs where each node
represents a data instance and each hyperedge denotes a distinct feature value.
By passing messages on the constructed hypergraphs based on our Hypergraph
Transformer (HyperFormer), the learned feature representations capture not only
the correlations among different instances but also the correlations among
features. Our experiments demonstrate that the proposed approach can
effectively improve feature representation learning on sparse features. Comment: Accepted by SIGIR 2023.
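A bare-bones sketch of the hypergraph construction and one round of node-to-hyperedge-to-node message passing, using simple mean aggregation in place of HyperFormer's Transformer-style attention; the toy data and aggregation rule are assumptions.

```python
import numpy as np

# Sketch: instances are nodes, distinct sparse feature values are hyperedges.
# One message-passing round: hyperedges average their member nodes, then nodes
# average their incident hyperedges. Mean aggregation stands in for the
# attention used by HyperFormer; the data below is a toy example.

instances = [
    {"city=NY", "device=ios"},
    {"city=NY", "device=android"},
    {"city=SF", "device=ios"},
]
edges = sorted(set().union(*instances))       # one hyperedge per feature value
N, E, D = len(instances), len(edges), 8

# Incidence matrix H[i, e] = 1 if instance i has feature value e.
H = np.zeros((N, E))
for i, feats in enumerate(instances):
    for f in feats:
        H[i, edges.index(f)] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(N, D))                   # initial instance representations

edge_msg = (H.T @ X) / H.sum(axis=0, keepdims=True).T     # (E, D) hyperedge states
node_out = (H @ edge_msg) / H.sum(axis=1, keepdims=True)  # (N, D) updated nodes
print(node_out.shape)
```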
How to Train Data-Efficient LLMs
The training of large language models (LLMs) is expensive. In this paper, we
study data-efficient approaches for pre-training LLMs, i.e., techniques that
aim to optimize the Pareto frontier of model quality and training resource/data
consumption. We seek to understand the tradeoffs associated with data selection
routines based on (i) expensive-to-compute data-quality estimates, and (ii)
maximization of coverage and diversity-based measures in the feature space. Our
first technique, Ask-LLM, leverages the zero-shot reasoning capabilities of
instruction-tuned LLMs to directly assess the quality of a training example. To
target coverage, we propose Density sampling, which models the data
distribution to select a diverse sample. In our comparison of 19 samplers,
involving hundreds of evaluation tasks and pre-training runs, we find that
Ask-LLM and Density are the best methods in their respective categories.
Coverage sampling can recover the performance of the full data, while models
trained on Ask-LLM data consistently outperform full-data training -- even when
we reject 90% of the original dataset, while converging up to 70% faster. Comment: Under review. 44 pages, 30 figures.
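A hedged sketch of the two ideas as the abstract describes them: a quality prompt that asks an instruction-tuned LLM to judge a training example (the scoring function here is a stub you would replace with a real model call), and a coverage-oriented sampler that downweights examples from dense regions of the embedding space. The prompt wording, the Gaussian kernel-density estimate, and the inverse-density weighting are illustrative assumptions, not the paper's exact samplers.

```python
import numpy as np

# --- Ask-LLM-style quality scoring (sketch) --------------------------------
# The abstract says an instruction-tuned LLM judges each training example.
# `llm_yes_probability` is a placeholder for a real model call; the prompt
# wording is an assumption.
PROMPT = "Is the following text useful, well-formed training data? Answer yes or no.\n\n{doc}"

def llm_yes_probability(prompt: str) -> float:
    # Stub: replace with P("yes") from an instruction-tuned LLM.
    return min(1.0, len(prompt) / 500.0)

def ask_llm_score(doc: str) -> float:
    return llm_yes_probability(PROMPT.format(doc=doc))

# --- Density-style coverage sampling (sketch) -------------------------------
# Downweight examples in dense regions of embedding space so a fixed budget
# covers more of the distribution.
def coverage_sample(embeddings: np.ndarray, budget: int, bandwidth: float = 1.0,
                    seed: int = 0) -> np.ndarray:
    d2 = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
    weights = 1.0 / density
    weights /= weights.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(len(embeddings), size=budget, replace=False, p=weights)

docs = ["short text", "a longer, cleaner paragraph of training data " * 3]
print([round(ask_llm_score(d), 2) for d in docs])
print(coverage_sample(np.random.default_rng(0).normal(size=(100, 8)), budget=10))
```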