
    Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

    We present Step-Back Prompting, a simple prompting technique that enables LLMs to perform abstraction, deriving high-level concepts and first principles from instances containing specific details. Using these concepts and principles to guide the reasoning steps, LLMs significantly improve their ability to follow a correct reasoning path towards the solution. We conduct experiments with Step-Back Prompting on PaLM-2L models and observe substantial performance gains on a wide range of challenging reasoning-intensive tasks, including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU Physics and Chemistry by 7% and 11% respectively, on TimeQA by 27%, and on MuSiQue by 7%.
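    The two-stage flow described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact prompts; `call_llm` is a hypothetical stand-in for any LLM API, and only the structure (abstract first, then reason from the abstraction) follows the technique.

```python
def step_back_prompts(question: str) -> tuple[str, str]:
    """Return the abstraction prompt and a template for the reasoning prompt."""
    abstraction = (
        "What underlying principle or concept is needed to answer "
        f"the following question?\n\nQuestion: {question}"
    )
    # {principle} is later filled with the LLM's answer to the abstraction prompt.
    reasoning = (
        "Principle: {principle}\n\n"
        f"Using this principle, reason step by step to answer: {question}"
    )
    return abstraction, reasoning

def answer_with_step_back(question: str, call_llm) -> str:
    abstraction, reasoning = step_back_prompts(question)
    principle = call_llm(abstraction)  # step 1: step back to first principles
    # step 2: reason about the original question, grounded in the principle
    return call_llm(reasoning.format(principle=principle))
```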

    TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks

    We present a framework for specifying, training, evaluating, and deploying machine learning models. Our focus is on simplifying cutting-edge machine learning for practitioners in order to bring such technologies into production. Recognizing the fast evolution of the field of deep learning, we make no attempt to capture the design space of all possible model architectures in a domain-specific language (DSL) or similar configuration language. We allow users to write code to define their models, but provide abstractions that guide developers to write models in ways conducive to productionization. We also provide a unifying Estimator interface, making it possible to write downstream infrastructure (e.g. distributed training, hyperparameter tuning) independent of the model implementation. We balance the competing demands for flexibility and simplicity by offering APIs at different levels of abstraction, making common model architectures available out of the box, while providing a library of utilities designed to speed up experimentation with model architectures. To make out-of-the-box models flexible and usable across a wide range of problems, these canned Estimators are parameterized not only over traditional hyperparameters, but also using feature columns, a declarative specification describing how to interpret input data. We discuss our experience in using this framework in research and production environments, and show the impact on code health, maintainability, and development speed. Comment: 8 pages. Appeared at KDD 2017, August 13--17, 2017, Halifax, NS, Canada.
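    The core design idea, a uniform interface wrapping a user-supplied model function so that downstream infrastructure never touches model code, can be sketched in plain Python. This toy is an illustration of the pattern, not the real TensorFlow API; the `model_fn` contract and the trivial mean-predictor model are invented for the example.

```python
class Estimator:
    """Toy Estimator: downstream code sees only train/predict, never the model."""

    def __init__(self, model_fn, params=None):
        self.model_fn = model_fn      # user-written model code lives here
        self.params = params or {}
        self.weights = None           # whatever state model_fn returns in train mode

    def train(self, input_fn, steps):
        features, labels = input_fn()
        for _ in range(steps):
            self.weights = self.model_fn("train", features, labels,
                                         self.params, self.weights)
        return self

    def predict(self, input_fn):
        features, _ = input_fn()
        return self.model_fn("predict", features, None, self.params, self.weights)

def mean_model_fn(mode, features, labels, params, weights):
    """A trivial model: always predict the mean of the training labels."""
    if mode == "train":
        return sum(labels) / len(labels)
    return [weights for _ in features]
```

Hyperparameter tuners or distributed-training drivers would call only `train`/`predict`, so swapping `mean_model_fn` for a deep model leaves that infrastructure unchanged.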

    GraphSAIL: Graph Structure Aware Incremental Learning for Recommender Systems

    Given the convenience of collecting information through online services, recommender systems now consume large-scale data and play an increasingly important role in improving user experience. With the recent emergence of Graph Neural Networks (GNNs), GNN-based recommender models have shown the advantage of modeling the recommender system as a user-item bipartite graph to learn representations of users and items. However, such models are expensive to train, making it difficult to perform the frequent updates needed to provide the most up-to-date recommendations. In this work, we propose to update GNN-based recommender models incrementally so that the computation time can be greatly reduced and models can be updated more frequently. We develop a Graph Structure Aware Incremental Learning framework, GraphSAIL, to address the commonly experienced catastrophic forgetting problem that occurs when training a model in an incremental fashion. Our approach preserves a user's long-term preference (or an item's long-term property) during incremental model updating. GraphSAIL implements a graph structure preservation strategy which explicitly preserves each node's local structure, global structure, and self-information, respectively. We argue that our incremental training framework is the first attempt tailored for GNN-based recommender systems, and demonstrate its improvement compared to other incremental learning techniques on two public datasets. We further verify the effectiveness of our framework on a large-scale industrial dataset. Comment: Accepted by CIKM 2020 Applied Research Track.
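    The preservation idea can be sketched as a regularizer that penalizes the new model for drifting from the old one. This is a hedged illustration of two of the three terms (local structure and self-information), with invented weighting and names, not GraphSAIL's exact losses.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def preservation_loss(old_emb, new_emb, neighbors, alpha=1.0, beta=1.0):
    """old_emb/new_emb: {node: vector}; neighbors: {node: [neighbor nodes]}."""
    local = self_info = 0.0
    for u, nbrs in neighbors.items():
        for v in nbrs:
            # local structure: keep node-neighbor similarities close to the old model
            local += (dot(old_emb[u], old_emb[v]) - dot(new_emb[u], new_emb[v])) ** 2
        # self-information: keep each node near its old embedding
        self_info += sum((a - b) ** 2 for a, b in zip(old_emb[u], new_emb[u]))
    return alpha * local + beta * self_info
```

Adding this term to the usual recommendation loss during each incremental update is what counters catastrophic forgetting: new interactions move the embeddings, but not so far that long-term structure is lost.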

    MatRec: Matrix Factorization for Highly Skewed Dataset

    Recommender systems are one of the most successful AI technologies applied by internet corporations. Popular internet products such as TikTok, Amazon, and YouTube have all integrated recommender systems as a core product feature. Although recommender systems have achieved great success, it is well known that for highly skewed datasets, engineers and researchers need to adjust their methods to the specific problem to yield good results. Inability to deal with highly skewed datasets usually creates hard computational problems for big-data clusters and unsatisfactory results for customers. In this paper, we propose a new algorithm that solves the problem within the framework of matrix factorization. We model the data skewness factors in the theoretical formulation of the approach, with easy-to-interpret and easy-to-implement formulas. Our experiments show that the method generates results comparable to popular recommender system algorithms such as Learning to Rank, Alternating Least Squares, and Deep Matrix Factorization.
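    As background for the matrix-factorization framework the abstract builds on, here is a minimal SGD factorizer with per-user and per-item bias terms, one common way to absorb popularity skew. This is a generic sketch, not MatRec's formulas; the skew-handling via bias terms is a standard substitute, and all names and hyperparameters are illustrative.

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """ratings: list of (user, item, value). Returns a predict(u, i) function."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)   # global mean
    bu = [0.0] * n_users                                # user biases
    bi = [0.0] * n_items                                # item biases absorb popularity skew
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(P[u][f] * Q[i][f] for f in range(k))
            e = r - pred
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (e * qi - reg * pu)
                Q[i][f] += lr * (e * pu - reg * qi)
    return lambda u, i: mu + bu[u] + bi[i] + sum(P[u][f] * Q[i][f] for f in range(k))
```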

    Controlled Decoding from Language Models

    We propose controlled decoding (CD), a novel off-policy reinforcement learning method to control autoregressive generation from language models towards high-reward outcomes. CD solves an off-policy reinforcement learning problem through a value function for the reward, which we call a prefix scorer. The prefix scorer is used at inference time to steer the generation towards higher-reward outcomes. We show that the prefix scorer may be trained on (possibly) off-policy data to predict the expected reward when decoding is continued from a partially decoded response. We empirically demonstrate that CD is effective as a control mechanism on the Reddit conversations corpus. We also show that the modularity of the design of CD makes it possible to control for multiple rewards, effectively solving a multi-objective reinforcement learning problem with no additional complexity. Finally, we show that CD can be applied in a novel blockwise fashion at inference time, again without the need for any training-time changes, essentially bridging the gap between the popular best-of-K strategy and token-level reinforcement learning. This makes CD a promising approach for alignment of language models.
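    The blockwise variant the abstract mentions can be sketched as follows: at each step, sample K candidate continuation blocks from the base model and keep the one the prefix scorer (the learned value function) rates highest. `sample_block` and `prefix_scorer` are hypothetical stand-ins for the model and the trained scorer; the sketch shows only the inference-time control loop.

```python
def blockwise_cd(prompt, sample_block, prefix_scorer, k=4, n_blocks=3):
    """Greedy blockwise best-of-K decoding steered by a prefix scorer."""
    text = prompt
    for _ in range(n_blocks):
        # sample K candidate continuations of the current prefix
        candidates = [text + sample_block(text) for _ in range(k)]
        # keep the prefix the value function expects to lead to highest reward
        text = max(candidates, key=prefix_scorer)
    return text
```

With `n_blocks=1` and one long block this reduces to plain best-of-K; with single-token blocks it approaches token-level control, which is the bridge the abstract describes.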

    The burden of cardiovascular disease in Asia from 2025 to 2050: a forecast analysis for East Asia, South Asia, South-East Asia, Central Asia, and high-income Asia Pacific regions

    Background: Given the rapidly growing burden of cardiovascular disease (CVD) in Asia, this study forecasts the CVD burden and associated risk factors in Asia from 2025 to 2050. Methods: Data from the Global Burden of Disease 2019 study were used to construct regression models predicting prevalence, mortality, and disability-adjusted life years (DALYs) attributed to CVD and risk factors in Asia in the coming decades. Findings: Between 2025 and 2050, crude cardiovascular mortality is expected to rise 91.2% despite a 23.0% decrease in the age-standardised cardiovascular mortality rate (ASMR). Ischaemic heart disease (115 deaths per 100,000 population) and stroke (63 deaths per 100,000 population) will remain leading drivers of ASMR in 2050. Central Asia will have the highest ASMR (676 deaths per 100,000 population), more than three-fold that of Asia overall (186 deaths per 100,000 population), while high-income Asia sub-regions will incur an ASMR of 22 deaths per 100,000 in 2050. High systolic blood pressure will contribute the highest ASMR throughout Asia (105 deaths per 100,000 population), except in Central Asia, where high fasting plasma glucose will dominate (546 deaths per 100,000 population). Interpretation: This forecast forewarns of an almost doubling in crude cardiovascular mortality by 2050 in Asia, with marked heterogeneity across sub-regions. Atherosclerotic diseases will continue to dominate, while high systolic blood pressure will be the leading risk factor.

    Learning and Recognizing The Hierarchical and Sequential Structure of Human Activities

    The mission of the research presented in this thesis is to give computers the power to sense and react to human activities. Without the ability to sense their surroundings and understand what humans are doing, computers cannot provide active, timely, appropriate, and considerate services to humans. To accomplish this mission, the work stands on the shoulders of two giants: machine learning and ubiquitous computing. Because of the ubiquity of sensor-enabled mobile and wearable devices, there is an emerging opportunity to sense, learn, and infer human activities from sensor data by leveraging state-of-the-art machine learning algorithms.

    While showing promising results in human activity recognition, most existing approaches using supervised or semi-supervised learning have two fundamental problems. First, most existing approaches require a large set of labeled sensor data for every target class, which demands a costly effort from human annotators. Second, an unseen new activity cannot be recognized if no training samples of that activity are available in the dataset. In light of these problems, a new approach in this area is proposed in our research.

    This thesis presents our novel approach to human activity recognition when few or no training samples of the target activities are available. The main hypothesis is that the problem can be solved by the proposed NuActiv activity recognition framework, which consists of modeling the hierarchical and sequential structure of human activities, as well as bringing humans into the loop of model training. By injecting human knowledge about the hierarchical nature of human activities, a semantic attribute representation and a two-layer attribute-based learning approach are designed. To model the sequential structure, a probabilistic graphical model is further proposed to take into account the temporal dependency of activities and attributes. Finally, an active learning algorithm is developed to reinforce recognition accuracy using minimal user feedback.

    The hypothesis and approaches presented in this thesis are validated by two case studies and real-world experiments on exercise activities and daily-life activities. Experimental results show that the NuActiv framework can effectively recognize unseen new activities even without any training data, with up to 70-80% precision and recall. It also outperforms supervised learning with limited labeled data for the new classes. The results significantly advance the state of the art in human activity recognition and represent a promising step towards bridging the gap between computers and humans.
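    The two-layer attribute idea behind recognizing unseen activities can be sketched as follows: lower-layer classifiers predict semantic attributes from sensor features, and an unseen activity is recognized by matching the predicted attributes against a human-provided attribute signature for each class. The attribute names and signatures below are invented for illustration, not taken from the thesis.

```python
def recognize(attr_preds, class_signatures):
    """attr_preds: {attribute: predicted probability in [0, 1]}.
    class_signatures: {class: {attribute: 0 or 1}} supplied by humans,
    so a class needs no training samples, only a signature."""
    def score(sig):
        # reward agreement: p if the signature expects the attribute, 1-p if not
        return sum(p if sig.get(a) else 1 - p for a, p in attr_preds.items())
    return max(class_signatures, key=lambda c: score(class_signatures[c]))
```

An active-learning loop would then ask the user to label only the instances whose best and second-best scores are closest, which matches the thesis goal of minimal user feedback.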