
    An Analysis of Metadiscourse in the Abstracts of English Academic Papers

    As an important part of academic writing, metadiscourse has received considerable attention in recent years. The abstract plays an important role in academic writing, as it reflects the main content of the whole paper. Based on the theory of metadiscourse and Hyland's classification, this study compared the frequency and usage of metadiscourse in mathematical and linguistic academic papers. Two small corpora of abstracts were compiled for this study, comprising 30 mathematical and 30 linguistic abstracts of academic papers from Social Science Citation Index (SSCI) and Science Citation Index (SCI) journals. The results showed that more metadiscourse appeared in the abstracts of linguistic academic papers than in those of mathematical academic papers. Interactive metadiscourse was used more than interactional metadiscourse in the abstracts of both disciplines. In the use of interactive metadiscourse, both disciplines showed the same trends in the frequencies of the five sub-categories. Regarding interactional metadiscourse, hedges were the most frequently used markers in linguistic academic papers, while self-mentions were the most frequently used in mathematical papers. It is suggested that more interactive metadiscourse should be used in the abstracts of both arts and science academic papers.
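
    A minimal sketch of the kind of frequency comparison the study describes: count occurrences of hedge markers in each corpus of abstracts and normalize per 1,000 words. The marker list below is a heavily abridged, hypothetical stand-in, not Hyland's full taxonomy, and the example sentences are invented for illustration.

```python
# Toy hedge-frequency counter; HEDGES is an illustrative subset, not Hyland's list.
HEDGES = {"may", "might", "could", "possibly", "suggest", "likely"}

def hedge_rate(abstracts: list[str]) -> float:
    # Normalize counts per 1,000 words so corpora of different sizes are comparable.
    words = [w.strip(".,;:").lower() for a in abstracts for w in a.split()]
    hits = sum(1 for w in words if w in HEDGES)
    return 1000 * hits / max(1, len(words))

linguistics_rate = hedge_rate(["The results suggest that learners may benefit from feedback."])
maths_rate = hedge_rate(["We prove that the operator is bounded on the given space."])
print(linguistics_rate, maths_rate)
```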

    Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

    Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, introducing hundreds of billions of parameters for extremely high-cardinality features. This bottleneck has led to substantial progress in alternative embedding algorithms. Many of these methods, however, make the assumption that each feature uses an independent embedding table. This work introduces a simple yet highly effective framework, Feature Multiplexing, where one single representation space is used across many different categorical features. Our theoretical and empirical analysis reveals that multiplexed embeddings can be decomposed into components from each constituent feature, allowing models to distinguish between features. We show that multiplexed representations lead to Pareto-optimal parameter-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified Embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products. Comment: NeurIPS'23 Spotlight
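
    A minimal sketch of the shared-table idea behind Feature Multiplexing / Unified Embedding: a single embedding table is shared by every categorical feature, and each feature's raw values are hashed with a feature-specific seed into that shared vocabulary. The class name, hashing scheme, and bucket sizes here are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class UnifiedEmbedding(nn.Module):
    def __init__(self, num_buckets: int, dim: int, feature_names: list[str]):
        super().__init__()
        # One embedding table shared across all categorical features.
        self.table = nn.Embedding(num_buckets, dim)
        self.num_buckets = num_buckets
        # A fixed per-feature seed so identical raw ids from different features
        # generally land in different buckets.
        self.seeds = {name: i + 1 for i, name in enumerate(feature_names)}

    def forward(self, feature_name: str, raw_ids: torch.Tensor) -> torch.Tensor:
        seed = self.seeds[feature_name]
        # Cheap stand-in for a proper hash function.
        buckets = (raw_ids * 2654435761 + seed * 97) % self.num_buckets
        return self.table(buckets)

emb = UnifiedEmbedding(num_buckets=100_000, dim=64,
                       feature_names=["user_id", "item_id", "query_token"])
user_vecs = emb("user_id", torch.tensor([12, 98765]))
item_vecs = emb("item_id", torch.tensor([12, 4242]))  # same raw id, different bucket
```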

    Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems

    Learning feature interactions is the critical backbone of building recommender systems. In web-scale applications, learning feature interactions is extremely challenging due to the sparse and large input feature space; meanwhile, manually crafting effective feature interactions is infeasible because of the exponential solution space. We propose to leverage a Transformer-based architecture with attention layers to automatically capture feature interactions. Transformer architectures have seen great success in many domains, such as natural language processing and computer vision. However, there has not been much adoption of the Transformer architecture for feature interaction modeling in industry. We aim to close this gap. We identify two key challenges in applying the vanilla Transformer architecture to web-scale recommender systems: (1) the Transformer architecture fails to capture the heterogeneous feature interactions in the self-attention layer; (2) the serving latency of the Transformer architecture might be too high for deployment in web-scale recommender systems. We first propose a heterogeneous self-attention layer, a simple yet effective modification to the self-attention layer in the Transformer, to take into account the heterogeneity of feature interactions. We then introduce Hiformer (Heterogeneous Interaction Transformer) to further improve model expressiveness. With low-rank approximation and model pruning, Hiformer enjoys fast inference for online deployment. Extensive offline experimental results corroborate the effectiveness and efficiency of the Hiformer model. We have successfully deployed the Hiformer model to a real-world large-scale app ranking model at Google Play, with significant improvements in key engagement metrics (up to +2.66%).
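
    A rough sketch of what "heterogeneous" self-attention over feature embeddings could look like: each feature slot gets its own query/key/value projections instead of the single shared projection of a vanilla Transformer, so interactions between different feature types are modeled with feature-specific transformations. This is an illustrative simplification under that assumption, not the Hiformer implementation (which additionally uses low-rank approximation and pruning).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeterogeneousSelfAttention(nn.Module):
    def __init__(self, num_features: int, dim: int):
        super().__init__()
        # Separate projections per feature slot capture feature-specific
        # interaction semantics (the "heterogeneity").
        self.q = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_features)])
        self.k = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_features)])
        self.v = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_features)])
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, num_features, dim], one embedding per feature.
        qs = torch.stack([proj(x[:, i]) for i, proj in enumerate(self.q)], dim=1)
        ks = torch.stack([proj(x[:, i]) for i, proj in enumerate(self.k)], dim=1)
        vs = torch.stack([proj(x[:, i]) for i, proj in enumerate(self.v)], dim=1)
        attn = F.softmax(qs @ ks.transpose(1, 2) * self.scale, dim=-1)
        return attn @ vs  # feature-interaction-aware representations

layer = HeterogeneousSelfAttention(num_features=8, dim=32)
out = layer(torch.randn(4, 8, 32))  # [batch=4, features=8, dim=32]
```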

    Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)

    Industry recommender systems usually suffer from highly skewed long-tail item distributions, where a small fraction of the items receives most of the user feedback. This skew hurts recommender quality, especially for item slices without much user feedback. While many research advances have been made in academia, deploying these methods in production is very difficult, and few improvements have been made in industry. One challenge is that these methods often hurt overall performance; additionally, they can be complex and expensive to train and serve. In this work, we aim to improve tail item recommendations while maintaining overall performance with lower training and serving cost. We first find that predictions of user preferences are biased under long-tail distributions. The bias comes from differences between training and serving data in two respects: 1) the item distributions, and 2) the user's preference given an item. Most existing methods mainly attempt to reduce the bias from the item distribution perspective, ignoring the discrepancy in user preference given an item. This leads to a severe forgetting issue and results in sub-optimal performance. To address the problem, we design a novel Cross Decoupling Network (CDN) that (i) decouples the learning process of memorization and generalization on the item side through a mixture-of-experts architecture, and (ii) decouples user samples from different distributions through a regularized bilateral branch network. Finally, a new adapter is introduced to aggregate the decoupled vectors and softly shift the training attention to tail items. Extensive experimental results show that CDN significantly outperforms state-of-the-art approaches on benchmark datasets. We also demonstrate its effectiveness in a case study of CDN in a large-scale recommendation system at Google. Comment: Accepted by KDD 2023 Applied Data Science (ADS) track
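
    An illustrative sketch of the item-side decoupling idea: one "memorization" expert built on the item ID embedding and one "generalization" expert built on content features, combined by a learned gate so that tail items with weak IDs can lean on the content-based expert. The bilateral branch sampling and the exact adapter are omitted, and all names here are assumptions rather than the paper's code.

```python
import torch
import torch.nn as nn

class ItemTower(nn.Module):
    def __init__(self, num_items: int, content_dim: int, dim: int):
        super().__init__()
        self.mem_expert = nn.Embedding(num_items, dim)            # memorization (item ID)
        self.gen_expert = nn.Sequential(                          # generalization (content)
            nn.Linear(content_dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gate = nn.Sequential(nn.Linear(content_dim, 2), nn.Softmax(dim=-1))

    def forward(self, item_ids: torch.Tensor, content: torch.Tensor) -> torch.Tensor:
        experts = torch.stack(
            [self.mem_expert(item_ids), self.gen_expert(content)], dim=1)  # [batch, 2, dim]
        weights = self.gate(content).unsqueeze(-1)                         # [batch, 2, 1]
        # Gate decides how much to memorize vs. generalize per item.
        return (weights * experts).sum(dim=1)

tower = ItemTower(num_items=10_000, content_dim=16, dim=32)
vec = tower(torch.tensor([3, 9871]), torch.randn(2, 16))
```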

    Review Article Autophagy in Hepatic Fibrosis

    Hepatic fibrosis is a leading cause of morbidity and mortality worldwide. It is usually associated with chronic liver diseases caused by infection, drugs, metabolic disorders, or autoimmune imbalances, and effective clinical therapies are still lacking. Autophagy is a cellular process that degrades damaged organelles or protein aggregates and participates in many pathological processes, including liver diseases. Autophagy contributes to hepatic fibrosis by activating hepatic stellate cells and may also contribute by influencing other fibrogenic cells. In addition, autophagy can promote the development of some liver diseases, while it may play a protective role in liver diseases related to abnormal hepatocellular aggregates and reduce fibrosis. With a better understanding of the potential effects of autophagy on hepatic fibrosis, targeting autophagy might be a novel therapeutic strategy for hepatic fibrosis in the near future.

    HyperFormer: Learning Expressive Sparse Feature Representations via Hypergraph Transformer

    Learning expressive representations for high-dimensional yet sparse features has been a longstanding problem in information retrieval. Though recent deep learning methods can partially solve the problem, they often fail to handle the numerous sparse features, particularly those tail feature values with infrequent occurrences in the training data. Worse still, existing methods cannot explicitly leverage the correlations among different instances to help further improve the representation learning on sparse features, since such relational prior knowledge is not provided. To address these challenges, in this paper we tackle the problem of representation learning on feature-sparse data from a graph learning perspective. Specifically, we propose to model the sparse features of different instances using hypergraphs, where each node represents a data instance and each hyperedge denotes a distinct feature value. By passing messages on the constructed hypergraphs with our Hypergraph Transformer (HyperFormer), the learned feature representations capture not only the correlations among different instances but also the correlations among features. Our experiments demonstrate that the proposed approach can effectively improve feature representation learning on sparse features. Comment: Accepted by SIGIR 2023
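
    A toy sketch of the hypergraph view: each data instance is a node, each distinct feature value is a hyperedge connecting all instances that carry it, and one round of node-to-hyperedge-to-node mean aggregation stands in for the Hypergraph Transformer's attention-based message passing (an assumption made to keep the example short).

```python
import torch
from collections import defaultdict

instances = [                      # instance id -> its sparse feature values
    {"city=NY", "device=ios"},
    {"city=NY", "device=android"},
    {"city=SF", "device=ios"},
]

# Build hyperedges: one per distinct feature value, containing its instances.
hyperedges = defaultdict(list)
for node_id, feats in enumerate(instances):
    for f in feats:
        hyperedges[f].append(node_id)

dim = 8
node_h = torch.randn(len(instances), dim)   # initial instance embeddings

# Hyperedge embeddings = mean of member nodes; nodes then aggregate their edges.
edge_h = {f: node_h[ids].mean(dim=0) for f, ids in hyperedges.items()}
new_node_h = torch.stack([
    torch.stack([edge_h[f] for f in feats]).mean(dim=0)
    for feats in instances
])  # instances that share feature values now share information
```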

    How to Train Data-Efficient LLMs

    The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. We seek to understand the tradeoffs associated with data selection routines based on (i) expensive-to-compute data-quality estimates, and (ii) maximization of coverage- and diversity-based measures in the feature space. Our first technique, Ask-LLM, leverages the zero-shot reasoning capabilities of instruction-tuned LLMs to directly assess the quality of a training example. To target coverage, we propose Density sampling, which models the data distribution to select a diverse sample. In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density are the best methods in their respective categories. Coverage sampling can recover the performance of the full data, while models trained on Ask-LLM data consistently outperform full-data training -- even when we reject 90% of the original dataset, while converging up to 70% faster. Comment: Under review. 44 pages, 30 figures
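
    A hedged sketch of the Ask-LLM idea: ask an instruction-tuned LLM whether each example is useful training data and keep only the examples it is most confident about. Here `yes_probability` is a hypothetical hook for whatever LLM client is used (e.g. the probability mass the model places on a "yes" answer), and the prompt wording is illustrative, not the paper's exact template.

```python
from typing import Callable

PROMPT = (
    "Does the following text contain informative, well-written content that "
    "would be useful for training a language model? Answer yes or no.\n\n{doc}"
)

def ask_llm_filter(docs: list[str],
                   yes_probability: Callable[[str], float],
                   keep_fraction: float = 0.1) -> list[str]:
    # Score every document by the model's confidence that it is worth keeping,
    # then keep only the top fraction (aggressive rejection, e.g. keep 10%).
    scored = [(yes_probability(PROMPT.format(doc=d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(docs) * keep_fraction))
    return [d for _, d in scored[:k]]

# Example with a trivial stand-in scorer (longer prompts score higher).
kept = ask_llm_filter(["short", "a much longer and more informative document"],
                      yes_probability=lambda p: min(1.0, len(p) / 200))
```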