25 research outputs found

    Feedback-Based Self-Learning in Large-Scale Conversational AI Agents

    Today, most large-scale conversational AI agents (e.g. Alexa, Siri, or Google Assistant) are built using manually annotated data to train the different components of the system. Typically, the accuracy of the ML models in these components is improved by manually transcribing and annotating data. As the scope of these systems increases to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively costly and time consuming. In this paper, we propose a system that leverages user-system interaction feedback signals to automate learning without any manual annotation. Users here tend to modify a previous query in hopes of fixing an error in the previous turn to get the right results. These reformulations are often preceded by defective experiences caused by errors in ASR, NLU, ER or the application. In some cases, users may not properly formulate their requests (e.g. providing a partial title of a song), but gleaning across a wider pool of users and sessions reveals the underlying recurrent patterns. Our proposed self-learning system automatically detects the errors, generates reformulations and deploys fixes to the runtime system to correct different types of errors occurring in different components of the system. In particular, we propose leveraging an absorbing Markov Chain model as a collaborative filtering mechanism in a novel attempt to mine these patterns. We show that our approach is highly scalable, and able to learn reformulations that reduce Alexa-user errors by pooling anonymized data across millions of customers. The proposed self-learning system achieves a win/loss ratio of 11.8 and effectively reduces the defect rate by more than 30% on utterance level reformulations in our production A/B tests. To the best of our knowledge, this is the first self-learning large-scale conversational AI system in production.
    Comment: 8 pages, 2 figures
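The absorbing Markov chain idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the utterances, the transition probabilities, and the two absorbing states (`success`, `abandon`) are all invented for illustration, and `absorption_probabilities` is a hypothetical helper name. The sketch treats each utterance as a transient state and computes the probability of eventually reaching each absorbing state by solving (I - Q) x = r.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def absorption_probabilities(transitions, absorbing):
    """Return {transient_state: {absorbing_state: probability}}.

    transitions maps each transient state to {next_state: probability}.
    """
    transient = list(transitions)
    n = len(transient)
    # Build I - Q, where Q holds transient-to-transient probabilities.
    i_minus_q = [
        [(1.0 if i == j else 0.0) - transitions[transient[i]].get(transient[j], 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    result = {s: {} for s in transient}
    for target in sorted(absorbing):
        # r holds the one-step probabilities of jumping straight to target.
        r = [transitions[s].get(target, 0.0) for s in transient]
        for state, p in zip(transient, solve_linear(i_minus_q, r)):
            result[state][target] = p
    return result


# Hypothetical session statistics: a misrecognized request and its rephrasings.
TRANSITIONS = {
    "play maj and dragons": {"play imagine dragons": 0.6, "abandon": 0.4},
    "play imagine dragons": {"success": 0.9, "abandon": 0.1},
    "play imaginary dragons": {
        "play imagine dragons": 0.5,
        "play maj and dragons": 0.2,
        "abandon": 0.3,
    },
}

probs = absorption_probabilities(TRANSITIONS, {"success", "abandon"})
for utterance, outcome in probs.items():
    print(utterance, round(outcome["success"], 3))
```

Under these made-up numbers, a defective utterance with a high chance of being absorbed into `success` through a particular rephrasing is a candidate for an automatic rewrite; how the production system selects and deploys rewrites is not described in the abstract.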

    Knowledge Distillation from Internal Representations

    Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that the internal knowledge of the teacher will be transferred into the student; even if the student closely matches the soft-labels, its internal representations may be considerably different. This internal mismatch can undermine the generalization capabilities originally intended to be transferred from the teacher to the student. In this paper, we propose to distill the internal representations of a large model such as BERT into a simplified version of it. We formulate two ways to distill such representations and various algorithms to conduct the distillation. We experiment with datasets from the GLUE benchmark and consistently show that adding knowledge distillation from internal representations is a more powerful method than only using soft-label distillation.
    Comment: To appear in AAAI-202
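The contrast this abstract draws between soft-label distillation and distillation from internal representations can be sketched as a combined loss. This is an illustrative sketch only, not the authors' formulation: the cosine distance used for the internal term, the `temperature` and `alpha` parameters, and all function names are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer labels."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine_distance(u, v):
    """1 - cosine similarity between two hidden vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def distillation_loss(teacher_logits, student_logits,
                      teacher_hidden, student_hidden,
                      temperature=2.0, alpha=0.5):
    """Soft-label term plus an (assumed) internal-representation term."""
    soft_term = kl_divergence(softmax(teacher_logits, temperature),
                              softmax(student_logits, temperature))
    internal_term = cosine_distance(teacher_hidden, student_hidden)
    return soft_term + alpha * internal_term


# The student matches the teacher's soft-labels exactly, yet its hidden
# vector points in a different direction, so only the internal term is non-zero.
logits = [2.0, 0.5, -1.0]
loss_same_outputs = distillation_loss(logits, logits, [1.0, 0.0], [0.0, 1.0])
print(loss_same_outputs)
```

The example shows the situation the abstract warns about: a loss built from soft-labels alone would be zero here, while the added internal term still penalizes the mismatch.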

    Impact of geographic diversity on citation of collaborative research

    Diversity in human capital is widely seen as critical to creating holistic and high-quality research, especially in areas that engage with diverse cultures, environments, and challenges. Quantification of diverse academic collaborations and their effect on research quality is lacking, especially at an international scale and across different domains. Here, we present the first effort to measure the impact of geographic diversity in coauthorships on the citation of their papers across different academic domains. Our results unequivocally show that geographic coauthor diversity improves paper citation, but very long distance collaborations have variable impact. We also discover “well-trodden” collaboration circles that yield much less impact than other collaborations over similar travel distances. These relationships are observed to exist across different subject areas, but with varying strengths. These findings can help academics identify new opportunities from a diversity perspective, as well as inform funders on areas that require additional mobility support.

    The Function of MoGlk1 in Integration of Glucose and Ammonium Utilization in Magnaporthe oryzae

    Hexokinases are conserved proteins functioning in glucose sensing and signaling. The rice blast fungus Magnaporthe oryzae contains several hexokinases, including MoHxk1 (hexokinase) and MoGlk1 (glucokinase), encoded respectively by the MoHXK1 and MoGLK1 genes. The heterologous expression of MoGlk1 and MoHxk1 in Saccharomyces cerevisiae confirmed their conserved functions. Disruption of MoHXK1 resulted in growth reduction in medium containing fructose as the sole carbon source, whereas disruption of MoGLK1 did not cause a similar defect. However, the ΔMoglk1 mutant displayed decreased proton extrusion and a lower biomass in the presence of ammonium, suggesting a decline in the utilization of ammonium. Additionally, the MoGLK1 allele lacking catalytic activity restored growth to the ΔMoglk1 mutant. Moreover, the expression of MoPMA1, encoding a plasma membrane H+-ATPase, decreased in the ΔMoglk1 mutant, and this decrease can be suppressed by glucose and G-6-P. Thus, MoGlk1, but not MoHxk1, regulates ammonium utilization through a mechanism that is independent of its catalytic activity.

    Clinicopathological Features and Prognosis of Papillary Thyroid Microcarcinoma for Surgery and Relationships with the BRAF V600E Mutational Status and Expression of Angiogenic Factors

    Objective: To investigate the clinicopathological characteristics of papillary thyroid microcarcinoma (PTMC) for surgery by comparing PTMC with larger papillary thyroid carcinoma (LPTC).
    Methods: We analyzed the differences in the clinicopathological characteristics, prognosis, B-type RAF kinase (BRAF) V600E mutational status and expression of angiogenic factors, including pigment epithelium-derived factor (PEDF), vascular endothelial growth factor (VEGF), and hypoxia-inducible factor alpha subunit (HIF-1α), between PTMC and LPTC by retrospectively reviewing the records of 251 patients with papillary thyroid carcinoma: 169 with PTMC and 82 with LPTC (diameter >1 cm).
    Results: There were no significant differences in gender, age, multifocality, Hashimoto's thyroiditis, TNM stage, PEDF protein expression, rate of recurrence, or mean follow-up duration between patients with PTMC and those with LPTC. The prevalence of extrathyroidal invasion (EI), lymph node metastasis (LNM), and BRAF mutation in patients with PTMC was significantly lower than in patients with LPTC. In addition, in PTMC patients with EI and/or LNM and/or positive BRAF (high-risk PTMC patients), the prevalence of extrathyroidal invasion, Hashimoto's disease, lymph node metastasis, tumor TNM stage, PEDF positive protein expression, the rate of recurrent disease, and the mRNA expression of anti-angiogenic factors was almost as high as in patients with LPTC, with no significant difference between the two groups.
    Conclusions: Extrathyroidal invasion, lymph node metastasis, and BRAF V600E mutation were the high-risk factors of PTMC. PTMC should be considered for the same treatment strategy as LPTC when any of these factors is found. In particular, PTMC with BRAF V600E gene mutations needs earlier surgical treatment. In addition, total thyroidectomy in primary surgery is recommended for the high cell subtype of PTMC with BRAF V600E gene mutation to reduce the risk of recurrence.