11 research outputs found

    FormalGeo: An Extensible Formalized Framework for Olympiad Geometric Problem Solving

    This is the first paper in a series of work we have accomplished over the past three years. In this paper, we have constructed a consistent formal plane geometry system. This will serve as a crucial bridge between IMO-level plane geometry challenges and readable AI automated reasoning. Within this formal framework, we have been able to seamlessly integrate modern AI models with our formal system. AI is now capable of providing deductive reasoning solutions to IMO-level plane geometry problems, just as it handles other natural languages, and these proofs are readable, traceable, and verifiable. We propose the geometry formalization theory (GFT) to guide the development of the geometry formal system. Based on the GFT, we have established FormalGeo, which consists of 88 geometric predicates and 196 theorems. It can represent, validate, and solve IMO-level geometry problems. We have also crafted the FGPS (formal geometry problem solver) in Python. It serves as both an interactive assistant for verifying problem-solving processes and an automated problem solver. We have annotated the formalgeo7k and formalgeo-imo datasets. The former contains 6,981 geometry problems (expanded to 133,818 through data augmentation), while the latter includes 18 IMO-level challenging geometry problems (expanded to 2,627 and continuously increasing). All annotated problems include detailed formal language descriptions and solutions. Implementation of the formal system and experiments validate the correctness and utility of the GFT. The backward depth-first search method yields only a 2.42% problem-solving failure rate, and we can incorporate deep learning techniques to achieve a lower one. The source code of FGPS and the datasets are available at https://github.com/BitSecret/FGPS. Comment: 44 pages

    Formalizing Chemical Physics using the Lean Theorem Prover

    Chemical theory can be made more rigorous using the Lean theorem prover, an interactive theorem prover for complex mathematics. We formalize the Langmuir and BET theories of adsorption, making each scientific premise clear and every step of the derivations explicit. Lean's math library, mathlib, provides formally verified theorems for infinite geometric series, which are central to BET theory. While writing these proofs, Lean prompts us to include mathematical constraints that were not originally reported. We also illustrate how Lean flexibly enables the reuse of proofs that build on more complex theories through the use of functions, definitions, and structures. Finally, we construct scientific frameworks for interoperable proofs, by creating structures for classical thermodynamics and kinematics, using them to formalize gas law relationships like Boyle's Law and equations of motion underlying Newtonian mechanics, respectively. This approach can be extended to other fields, enabling the formalization of rich and complex theories in science and engineering.
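For reference, the infinite geometric series mentioned in this abstract has the standard closed form below. The convergence condition |x| < 1 illustrates the kind of constraint that a proof assistant requires to be stated explicitly; whether it is one of the specific constraints the authors report is an assumption here, not something the abstract states.

```latex
\sum_{n=0}^{\infty} x^{n} = \frac{1}{1 - x}, \qquad |x| < 1
```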

    Automated Deduction – CADE 28

    This open access book constitutes the proceedings of the 28th International Conference on Automated Deduction, CADE 28, held virtually in July 2021. The 29 full papers and 7 system descriptions presented together with 2 invited papers were carefully reviewed and selected from 76 submissions. CADE is the major forum for the presentation of research in all aspects of automated deduction, including foundations, applications, implementations, and practical experience. The papers are organized in the following topics: Logical foundations; theory and principles; implementation and application; ATP and AI; and system descriptions.

    TME Volume 3, Number 1


    Advancing mathematical reasoning with deep learning : from numerical insights to geometrical understanding

    The field of automated mathematical reasoning has captured the interest of the AI community since the last century, acknowledged as a key step towards achieving true artificial intelligence. This research domain has progressed through rule-based approaches, semantic parsing, statistical machine learning, and, recently, deep learning techniques. Moreover, automated mathematical reasoning has found extensive commercial applications. Educational enterprises have begun leveraging AI models for intelligent tutoring systems to assist students with mathematical problems. In the financial sector, it aids in analysing complex financial reports, with firms like JP Morgan incorporating AI to enhance their analysis capabilities. This thesis concentrates on two distinct tasks within automated mathematical reasoning: text-based numerical reasoning and automated geometry maths problem solving. Current methods face challenges in addressing complex mathematical reasoning tasks, evident in the lengthy and diverse solutions required. Additionally, in solving geometry maths problems, there is a noticeable deficiency in models’ abilities to accurately interpret geometric relationships from diagrams, which compromises their effectiveness. Furthermore, the advent of large language models (LLMs) and multi-modal models (MMs) underscores the need for a standardised benchmark to evaluate these models’ abilities in geometry problem-solving. To address these issues, we introduce the ELASTIC model in this thesis, designed for text-based numerical reasoning tasks. ELASTIC uniquely separates the generation of operators and operands to minimise errors from complex reasoning chains and is versatile enough to accommodate a varying number of operands per operator. This makes it broadly applicable across different domains. Our experimental results show ELASTIC’s superior performance, significantly outperforming prior models. 
Furthermore, we extend the application of the ELASTIC model to tackle geometry maths problems, which are inherently more complex due to the inclusion of geometric diagrams and a broader variety of problem types. To navigate these complexities, we propose the Geometry-Aware Problem Solver (GAPS), a model specifically crafted to solve diverse types of geometric maths problems by generating tailored solution programs. Our experiments validate GAPS’s advancement over existing methods. However, we observed that direct vector representation of geometric diagrams fails to capture the complex geometric relationships, which are critical in solving geometry maths problems. To overcome this, we propose converting geometric relationships into natural language, integrating them with the textual problem descriptions. This method not only improves the interpretability and effectiveness of the models but also allows for the utilisation of LLMs in generating reasoning programs. Lastly, despite the impressive capabilities of recent LLMs and MMs, their proficiency in solving geometry problems, requiring an integrated understanding of textual and visual information, remains unexplored. To fill this gap, we introduce the GeoEval benchmark in this thesis. Through extensive evaluation with GeoEval, we provide a comprehensive quantitative evaluation of the latest LLMs and MMs in geometry problem-solving task. This research marks a significant step forward in assessing the capabilities of state-of-the-art AI models in the realm of geometry problem-solving task.

    Report 2011


    Mathematical conjecturing and proving

    Most university courses in mathematics programs are characterized by a strong focus on the axiomatic nature of mathematics, and thus also on proof as the central scientific method of mathematics (Selden, A. & Selden, 2008). Lecturers write proofs on the blackboard, and students attempt to demonstrate their understanding and skills by proving theorems on their own or in collaboration with others. However, there is often little systematic discussion in these courses on how new mathematical conjectures can be generated and on how proofs are constructed (Alcock, 2010). Students’ experiences with conjecturing and proving in schools or in university mathematics courses often lead them to “consider proof as a static product rather than a negotiated process that can help students justify and make sense of mathematical ideas” (Otten, Bleiler-Baxter, & Engledowl, 2017, p. 112). Yet, several authors (e.g., Epp, 2003; Savic, 2015a; Selden, A. & Selden, 2008) have hypothesized that often only little time can be devoted to showing students which strategies and processes may help them step through the proof construction process and recover from proving impasses. Furthermore, knowledge about what characterizes proof processes that lead to a successful outcome (i.e., an acceptable mathematical proof [according to local acceptance criteria]) is scarce. To approach this issue, an extensive systematic literature search was conducted to summarize common claims and empirical findings about promising conjecturing and proving processes. A total of 126 articles that focussed on conjecturing and proving were clustered using a topic modeling method. The algorithm identified 17 different topics. The most representative papers for each topic, in total 45 papers, were qualitatively analysed with regard to the research perspectives on which they were based and their claims and findings about the processes that are needed to successfully generate conjectures and construct proofs. 
This combination of statistical clustering and qualitative analyses allowed a systematic categorization of claims and empirical findings about successful conjecturing and proving processes in the literature. Based on this review, a set of characteristics of conjecturing and proving processes that are assumed or reported to be crucial for success is proposed. For the further analysis of such process characteristics, we started from a model differentiating the prerequisites students bring to bear on the proving situation, the conjecturing and proving processes they engage in, and the quality of the resulting product. The main question of the empirical work in this dissertation was which process characteristics influence the quality of the final product (the formulated conjecture and constructed proof), and in which way they mediate the impact of students’ prerequisites on this product. Specifically, we distinguished between individual-mathematical and social-discursive process characteristics of conjecturing and proving. These process characteristics were extracted from prior research in mathematics education, in educational psychology, or in the Learning Sciences. The central aim of this dissertation was to develop an instrument for assessing (prospective undergraduate) mathematics students’ conjecturing and proving processes in collaborative situations. A high-inference rating scheme with seven scales, based on theoretical considerations and on rating guidelines adapted from educational research, was designed. The rating scheme was evaluated in a study with N=98 prospective undergraduate students working in dyads on an open-ended conjecturing and proving task. 
The results of the empirical study with regard to the basic analyses showed that collaborative conjecturing and proving processes could be rated with sufficient reliability and that the structure of the data corresponded to the underlying theoretical assumption that two dimensions, one related to individual-mathematical and one related to social-discursive process characteristics, can be distinguished. The in-depth analyses showed that individual-mathematical process characteristics were predictive of the quality of the resulting product and mediated the relation between prerequisites (students’ prior knowledge on proof) and the quality of the product. In this way, the dissertation contributes to the scientific debate on how to assess (mathematical argumentation) skills (e.g., Blömeke, Gustafsson, & Shavelson, 2015; Koeppen, Hartig, Klieme, & Leutner, 2008) and provides theoretical and empirical insights on individual-mathematical and social-discursive process characteristics that describe the quality of collaborative conjecturing and proving processes.

    Q(sqrt(-3))-Integral Points on a Mordell Curve

    We use an extension of quadratic Chabauty to number fields, recently developed by the author with Balakrishnan, Besser and Müller, combined with a sieving technique, to determine the integral points over Q(√−3) on the Mordell curve y^2 = x^3 − 4.
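To make the object of study concrete, here is a minimal Python sketch that brute-forces small integral points on y^2 = x^3 − 4 with coordinates in Z[w], where w = (−1 + √−3)/2 generates the ring of integers of Q(√−3). It is not the quadratic Chabauty and sieving method of the abstract. The function names and the search bound are hypothetical, and a bounded search like this can only exhibit points; it cannot show that the list is complete.

```python
# Brute-force illustration only; NOT the quadratic Chabauty method of the paper.
# An element a + b*w of Z[w], with w = (-1 + sqrt(-3)) / 2, is stored as (a, b).

def mul(u, v):
    """Multiply two elements of Z[w], using the relation w^2 = -1 - w."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c - b * d)

def small_integral_points(bound):
    """Return all (x, y) with coefficients in [-bound, bound] and y^2 = x^3 - 4."""
    box = [(a, b)
           for a in range(-bound, bound + 1)
           for b in range(-bound, bound + 1)]
    # Index every y in the box by the value of y^2 for fast lookup.
    squares = {}
    for y in box:
        squares.setdefault(mul(y, y), []).append(y)
    points = []
    for x in box:
        x3 = mul(mul(x, x), x)
        rhs = (x3[0] - 4, x3[1])  # x^3 - 4
        for y in squares.get(rhs, []):
            points.append((x, y))
    return points

# Hypothetical search bound, chosen only to keep the run time small.
points = small_integral_points(12)
for x, y in points:
    print(f"x = {x[0]} + {x[1]}w, y = {y[0]} + {y[1]}w")
```

The search includes the rational integer points with x = 2 and x = 5, and also their twists by cube roots of unity, such as x = 2w, since (2w)^3 = 8.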

    Formalization of various geometry models and applications in verification of automated theorem provers

    This thesis presents an interactive formalization of various models of geometry and of algebraic methods for automated proving of geometry theorems...