48 research outputs found

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework

    This technical report presents AutoGen, a new framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools. AutoGen's design offers multiple advantages: a) it gracefully navigates the strong but imperfect generation and reasoning abilities of these LLMs; b) it leverages human understanding and intelligence while providing valuable automation through conversations between agents; and c) it simplifies and unifies the implementation of complex LLM workflows as automated agent chats. We provide many diverse examples of how developers can easily use AutoGen to effectively solve tasks or build applications in coding, mathematics, operations research, entertainment, online decision-making, question answering, and more. (Comment: 28 pages)
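
    As a quick orientation, the sketch below shows the two-agent pattern the report describes, using the open-source pyautogen package. The model name, API-key handling, and the task message are placeholder assumptions rather than details from the report.

        # Minimal two-agent sketch in the AutoGen style (pyautogen package).
        # Model name and API key below are placeholders, not from the report.
        import autogen

        llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

        # An LLM-backed assistant that reasons about the task and writes code.
        assistant = autogen.AssistantAgent("assistant", llm_config=llm_config)

        # A proxy for the human that can execute the assistant's code locally.
        user_proxy = autogen.UserProxyAgent(
            "user_proxy",
            human_input_mode="NEVER",  # fully automated; "ALWAYS" keeps a human in the loop
            code_execution_config={"work_dir": "coding", "use_docker": False},
        )

        # The two agents converse until the task is solved or the chat terminates.
        user_proxy.initiate_chat(assistant, message="Compute 2**20 and verify with Python.")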

    An Empirical Study on Challenging Math Problem Solving with GPT-4

    Employing Large Language Models (LLMs) to address mathematical problems is an intriguing research endeavor, considering the abundance of math problems expressed in natural language across numerous science and engineering fields. While several prior works have investigated solving elementary mathematics using LLMs, this work explores the frontier of using GPT-4 for solving more complex and challenging math problems. We evaluate various ways of using GPT-4. Some of them are adapted from existing work, and one is MathChat, a conversational problem-solving framework newly proposed in this work. We perform the evaluation on difficult high school competition problems from the MATH dataset, which shows the advantage of the proposed conversational approach.
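
    A MathChat-style loop can be sketched in a few lines. Here `query_llm` is a hypothetical stand-in for any chat-completion call, and the tag-based code protocol is an illustrative simplification of the framework's prompts, not the authors' implementation.

        # Illustrative MathChat-style conversational loop (not the authors' code).
        # `query_llm` is a hypothetical placeholder for a chat-completion API call.
        import re
        import subprocess

        SYSTEM = ("Solve the math problem step by step. When computation is needed, "
                  "write Python between <python> and </python> tags; I will run it "
                  "and return the output. State the final answer in \\boxed{}.")

        def query_llm(messages):
            raise NotImplementedError  # plug in your LLM client here

        def solve(problem, max_turns=5):
            messages = [{"role": "system", "content": SYSTEM},
                        {"role": "user", "content": problem}]
            for _ in range(max_turns):
                reply = query_llm(messages)
                messages.append({"role": "assistant", "content": reply})
                if "\\boxed{" in reply:  # the model has stated a final answer
                    return reply
                code = re.findall(r"<python>(.*?)</python>", reply, re.DOTALL)
                if code:  # run the proposed code and feed the result back
                    result = subprocess.run(["python", "-c", code[0]],
                                            capture_output=True, text=True, timeout=30)
                    messages.append({"role": "user",
                                     "content": result.stdout + result.stderr})
                else:
                    messages.append({"role": "user", "content": "Continue."})
            return None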

    Automatic Data Transformation Using Large Language Model: An Experimental Study on Building Energy Data

    Existing approaches to automatic data transformation are insufficient to meet the requirements in many real-world scenarios, such as the building sector. First, there is no convenient interface for domain experts to provide domain knowledge easily. Second, they require significant training data collection overheads. Third, the accuracy suffers from complicated schema changes. To bridge this gap, we present a novel approach that leverages the unique capabilities of large language models (LLMs) in coding, complex reasoning, and zero-shot learning to generate SQL code that transforms source datasets into target datasets. We demonstrate the viability of this approach by designing an LLM-based framework, termed SQLMorpher, which comprises a prompt generator that integrates the initial prompt with optional domain knowledge and historical patterns from external databases. It also implements an iterative prompt-optimization mechanism that automatically improves the prompt based on flaw detection. The key contributions of this work are (1) pioneering an end-to-end LLM-based solution for data transformation, (2) developing a benchmark dataset of 105 real-world building energy data transformation problems, and (3) conducting an extensive empirical evaluation in which our approach achieved 96% accuracy across all 105 problems. SQLMorpher demonstrates the effectiveness of LLMs on complex, domain-specific challenges, highlighting their potential to drive sustainable solutions. (Comment: 10 pages, 7 figures)
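
    The iterative prompt-optimization loop can be illustrated with a short sketch. As above, `query_llm` is a hypothetical placeholder, and the loop is a simplification of SQLMorpher's pipeline rather than the authors' code.

        # Sketch of an LLM-driven transformation loop in the spirit of SQLMorpher.
        # `query_llm` is a hypothetical placeholder for a chat-completion API call.
        import sqlite3

        def query_llm(prompt):
            raise NotImplementedError  # plug in your LLM client here

        def transform(conn, source_schema, target_schema, domain_knowledge=""):
            prompt = (f"Write one SQLite query that transforms the source table\n"
                      f"{source_schema}\ninto the target table\n{target_schema}\n"
                      f"Domain knowledge: {domain_knowledge}\nReturn only SQL.")
            for _ in range(3):  # bounded retries
                sql = query_llm(prompt)
                try:
                    return conn.execute(sql).fetchall()  # success: transformed rows
                except sqlite3.Error as err:
                    # Flaw detection: fold the DB error into the prompt and retry.
                    prompt += f"\nThe previous query failed with: {err}\nFix it."
            return None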

    Extracting N-ary Facts from Wikipedia Table Clusters

    Tables in Wikipedia articles contain a wealth of knowledge that would be useful for many applications if it were structured in a more coherent, queryable form. An important problem is that many such tables contain the same type of knowledge but have different layouts and/or schemata. Moreover, some tables refer to entities that we can link to Knowledge Bases (KBs), while others do not. Finally, some tables express entity-attribute relations, while others contain more complex n-ary relations. We propose a novel knowledge extraction technique that tackles these problems. Our method first transforms and clusters similar tables into fewer unified ones to overcome the problem of table diversity. Then, the unified tables are linked to the KB so that knowledge about popular entities propagates to the unpopular ones. Finally, our method applies a technique that relies on functional dependencies to judiciously interpret each table and extract n-ary relations. Our experiments over 1.5M Wikipedia tables show that our clustering can group many semantically similar tables, which leads to the extraction of many novel n-ary relations.
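
    The functional-dependency test at the heart of the interpretation step can be shown in a few lines. The table, column names, and data below are invented for illustration; this is not the authors' code.

        # Hedged sketch: testing a functional dependency lhs -> rhs with pandas.
        import pandas as pd

        def fd_holds(df: pd.DataFrame, lhs: str, rhs: str) -> bool:
            """True if every value of `lhs` maps to exactly one value of `rhs`."""
            return bool((df.groupby(lhs)[rhs].nunique(dropna=False) <= 1).all())

        table = pd.DataFrame({
            "Athlete": ["A", "A", "B"],
            "Country": ["US", "US", "DE"],
            "Year":    [2008, 2012, 2012],
            "Medal":   ["Gold", "Silver", "Gold"],
        })

        print(fd_holds(table, "Athlete", "Country"))  # True: Athlete determines Country
        print(fd_holds(table, "Athlete", "Medal"))    # False: Medal needs Year too (n-ary)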

    AI is a viable alternative to high throughput screening: a 318-target study

    High throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires physical compounds, which limits coverage of accessible chemical space. Computational approaches combined with vast on-demand chemical libraries can access far greater chemical space, provided that the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic area and protein class. We address historical limitations of computational screening by demonstrating success for target proteins without known binders or high-quality X-ray crystal structures, and without manual cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical results suggest that computational methods can substantially replace HTS as the first step of small-molecule drug discovery.

    Search and Join Algorithms for Tables in Data Lakes

    Data lakes are repositories of data sets stored in their raw formats. Data lakes can become dumping grounds if users cannot find and utilize the data in them. This thesis addresses two problems in managing data lakes: searching for tables that can be joined, and automatically generating syntactic transformations for joining tables whose join values have different formats.

    Given a query table and a join column, the first problem is to search for tables that can be joined with the query table on the join column. Our contributions toward solving this problem are twofold: 1) an approximate search index (based on locality-sensitive hashing) that supports threshold-based search queries -- find tables that join with more than a threshold percentage of the distinct join values in the query table; and 2) an exact search index that supports top-k search queries -- find the k tables that cover the largest number of distinct join values. Both approaches use new data-aware optimizations to provide interactive query performance over real data lakes with millions of tables, including many large tables (e.g., millions of rows). Ours is the first approach for searching for joinable tables, and we show that it greatly outperforms previous approaches for computing set intersection (used for keyword search and other applications). We also published open-source implementations of the joinable-table search algorithms and benchmarks created from real data lakes.

    For the second problem, we propose a technique that generates transformations, without human input, for joining tables with different formats in the join columns. The technique uses a novel approach to pinpoint highly promising joinable row pairs, then uses those pairs as input/output examples in a greedy search for a good transformation. The technique scales to tables as large as 10K rows while maintaining interactive speed.

    The solutions presented in this thesis make data lakes more searchable and usable, and allow data scientists to work efficiently. These experimentally validated solutions also open an avenue for new data science discoveries that matter in business and government decision-making. (Ph.D. thesis)
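
    The threshold-based joinable-table search can be approximated with off-the-shelf MinHash LSH. The sketch below uses the datasketch library's MinHashLSHEnsemble on toy data and omits the data-aware optimizations that the thesis contributes.

        # Illustrative threshold-based joinable-column search with MinHash LSH.
        # Toy data; the thesis's index adds further data-aware optimizations.
        from datasketch import MinHash, MinHashLSHEnsemble

        def minhash(values, num_perm=128):
            m = MinHash(num_perm=num_perm)
            for v in values:
                m.update(v.encode("utf8"))
            return m

        query_col = {"alice", "bob", "carol"}              # join column of the query table
        lake_cols = {"t1.name": {"alice", "bob", "dave"},  # candidate columns in the lake
                     "t2.user": {"erin", "frank"}}

        # Index each lake column as (key, MinHash, set size); containment needs sizes.
        index = MinHashLSHEnsemble(threshold=0.5, num_perm=128)
        index.index([(k, minhash(v), len(v)) for k, v in lake_cols.items()])

        # Columns expected to contain over 50% of the query's distinct join values.
        print(list(index.query(minhash(query_col), len(query_col))))  # likely ['t1.name']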