1,147 research outputs found

    Explaining Math Word Problem Solvers

    Automated math word problem solvers based on neural networks have successfully managed to obtain 70-80% accuracy in solving arithmetic word problems. However, it has been shown that these solvers may rely on superficial patterns to obtain their equations. In order to determine what information math word problem solvers use to generate solutions, we remove parts of the input and measure the model's performance on the perturbed dataset. Our results show that the model is not sensitive to the removal of many words from the input and can still manage to find a correct answer when given a nonsense question. This indicates that automatic solvers do not follow the semantic logic of math word problems, and may be overfitting to the presence of specific words.
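    The perturbation test described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: `toy_solver`, `perturb`, `accuracy`, and `DATASET` are hypothetical names, and the toy solver deliberately relies on a superficial pattern (it adds every number it sees) so that deleting ordinary words leaves its accuracy unchanged.

```python
import random
import re
from typing import Callable, List, Tuple

Example = Tuple[str, float]

# Hypothetical two-item dataset: (question, gold answer). Not from the paper.
DATASET: List[Example] = [
    ("Tom has 3 apples and buys 4 more . How many apples does he have ?", 7.0),
    ("A shelf holds 12 books and 5 magazines . How many items are on the shelf ?", 17.0),
]


def is_number(token: str) -> bool:
    """True for plain non-negative integers or decimals such as '3' or '4.5'."""
    return re.fullmatch(r"\d+(\.\d+)?", token) is not None


def toy_solver(question: str) -> float:
    """Stand-in solver (an assumption): sums all numbers and ignores every other word."""
    return sum(float(t) for t in question.split() if is_number(t))


def perturb(question: str, fraction: float, rng: random.Random) -> str:
    """Delete about `fraction` of the non-numeric tokens, keeping the rest in order."""
    tokens = question.split()
    candidates = [i for i, t in enumerate(tokens) if not is_number(t)]
    n_drop = round(len(candidates) * fraction)
    dropped = set(rng.sample(candidates, n_drop))
    return " ".join(t for i, t in enumerate(tokens) if i not in dropped)


def accuracy(solver: Callable[[str], float], data: List[Example]) -> float:
    """Fraction of questions for which the solver returns the gold answer."""
    return sum(1 for q, a in data if solver(q) == a) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    perturbed = [(perturb(q, 0.5, rng), a) for q, a in DATASET]
    print("original accuracy :", accuracy(toy_solver, DATASET))
    print("perturbed accuracy:", accuracy(toy_solver, perturbed))
```

    If accuracy on the perturbed set stays close to accuracy on the original set, the solver is probably not using the deleted words, which is the kind of insensitivity the abstract reports.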

    A Survey of Deep Learning for Mathematical Reasoning

    Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain. Comment: Accepted to ACL 2023. The repository is available at https://github.com/lupantech/dl4mat

    Mathematics, word problems, common sense, and artificial intelligence

    The paper discusses the capacities and limitations of current artificial intelligence (AI) technology to solve word problems that combine elementary knowledge with commonsense reasoning. No existing AI systems can solve these reliably. We review three approaches that have been developed, using AI natural language technology: outputting the answer directly, outputting a computer program that solves the problem, and outputting a formalized representation that can be input to an automated theorem verifier. We review some benchmarks that have been developed to evaluate these systems and some experimental studies. We discuss the limitations of the existing technology at solving these kinds of problems. We argue that it is not clear whether these kinds of limitations will be important in developing AI technology for pure mathematical research, but that they will be important in applications of mathematics, and may well be important in developing programs capable of reading and understanding mathematical content written by humans.

    Exploring Culturally Responsive Equitable Problem-Solving Pedagogy: Theorizing, Developing & Teaching

    Achievement gaps in mathematics between middle and high school Black students when compared to their white peers exist in part because of access, but also because Black learners’ brilliance is not recognized. Finding ways to help students, especially Black students, become successful mathematical problem solvers was a driving force behind this research. The purpose of this research is to explore ideas of how to improve Black students’ opportunities to engage in effective mathematical problem solving to improve their mathematics understanding and achievement. This study introduces the Culturally Responsive Equitable Problem Solving (CREPS) pedagogy situated at the intersections of a conceptual framework comprised of three pedagogies - Gay’s (2002) Culturally Responsive Pedagogy, Aguirre, Mayfield-Ingram, and Martin’s (2013) Equity-Based Mathematics Practices, and Schroeder and Lester’s (1989) Teaching through Problem-Solving. This dissertation study is guided by several research questions and reported through three separate, but related essays: (a) Theorizing a Culturally Responsive Equitable Problem-Solving Pedagogy; (b) Developing Culturally Responsive Equitable Problem-Solving Pedagogical Knowledge; and (c) Cases of Exemplary Instances of Teaching Culturally Responsive Equitable Problem-Solving Pedagogy. These three essays introduce and explain the tenets of CREPS pedagogy, examine secondary mathematics teachers’ culturally responsive teaching readiness and their professional learning of CREPS through CREPS pedagogy, and identify instances of exemplary CREPS pedagogical teaching during lesson study (i.e., collaborative planning and iterative teaching of a CREPS lesson), respectively. There were several findings from this research. The most salient findings were the three CREPS pedagogical moves: (a) development of deep mathematics understanding; (b) acknowledgement of students’ backgrounds; and (c) employment of equitable pedagogical practices. Several ideas for future research are shared related to refining and testing the CREPS pedagogy, developing teachers’ CREPS pedagogical knowledge, and teaching experiments for enacting CREPS pedagogy.

    Lemur: Harmonizing Natural Language and Code for Language Agents

    We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to functional language agents demands that models not only master human interaction, reasoning, and planning but also ensure grounding in the relevant environments. This calls for a harmonious blend of language and coding capabilities in the models. Lemur and Lemur-Chat are proposed to address this necessity, demonstrating balanced proficiencies in both domains, unlike existing open-source models that tend to specialize in either. Through meticulous pre-training using a code-intensive corpus and instruction fine-tuning on text and code data, our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks among open-source models. Comprehensive experiments demonstrate Lemur's superiority over existing open-source models and its proficiency across various agent tasks involving human communication, tool usage, and interaction under fully- and partially-observable environments. The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities, providing key insights into developing advanced open-source agents adept at reasoning, planning, and operating seamlessly across environments. https://github.com/OpenLemur/Lemu

    A Survey of Large Language Models

    Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To distinguish models by parameter scale, the research community has coined the term large language models (LLMs) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and one remarkable advance is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions. Comment: ongoing work; 51 pages

    Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models

    Creating learning models that can exhibit sophisticated reasoning skills is one of the greatest challenges in deep learning research, and mathematics is rapidly becoming one of the target domains for assessing scientific progress in this direction. In the past few years there has been an explosion of neural network architectures, data sets, and benchmarks specifically designed to tackle mathematical problems, reporting notable success in disparate fields such as automated theorem proving, numerical integration, and discovery of new conjectures or matrix multiplication algorithms. However, despite these impressive achievements it is still unclear whether deep learning models possess an elementary understanding of quantities and symbolic numbers. In this survey we critically examine the recent literature, concluding that even state-of-the-art architectures often fall short when probed with relatively simple tasks designed to test basic numerical and arithmetic knowledge.