1,791 research outputs found

    Automatic Data Transformation Using Large Language Model: An Experimental Study on Building Energy Data

    Full text link
    Existing approaches to automatic data transformation are insufficient to meet the requirements of many real-world scenarios, such as the building sector. First, there is no convenient interface for domain experts to provide domain knowledge easily. Second, they require significant training data collection overheads. Third, their accuracy suffers under complicated schema changes. To bridge this gap, we present a novel approach that leverages the unique capabilities of large language models (LLMs) in coding, complex reasoning, and zero-shot learning to generate SQL code that transforms source datasets into target datasets. We demonstrate the viability of this approach by designing an LLM-based framework, termed SQLMorpher, which comprises a prompt generator that integrates the initial prompt with optional domain knowledge and historical patterns in external databases. It also implements an iterative prompt optimization mechanism that automatically improves the prompt based on flaw detection. The key contributions of this work include (1) pioneering an end-to-end LLM-based solution for data transformation, (2) developing a benchmark dataset of 105 real-world building energy data transformation problems, and (3) conducting an extensive empirical evaluation in which our approach achieved 96% accuracy across all 105 problems. SQLMorpher demonstrates the effectiveness of utilizing LLMs in complex, domain-specific challenges, highlighting their potential to drive sustainable solutions. Comment: 10 pages, 7 figures
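The abstract includes no code; the following is a minimal sketch, in Python, of the kind of prompt-generation and iterative-refinement loop it describes. It assumes a hypothetical text-completion callable `llm` (SQLMorpher's actual LLM interface is not specified here), uses a SQLite execution check as a stand-in for the paper's flaw-detection step, and the prompt wording is purely illustrative.

```python
import sqlite3
from typing import Callable

def build_prompt(source_schema: str, target_schema: str,
                 domain_knowledge: str = "", feedback: str = "") -> str:
    """Assemble the prompt: task, schemas, optional domain knowledge, prior flaws."""
    parts = [
        "Write a SQL query that transforms the source table into the target table.",
        f"Source schema: {source_schema}",
        f"Target schema: {target_schema}",
    ]
    if domain_knowledge:
        parts.append(f"Domain knowledge: {domain_knowledge}")
    if feedback:
        parts.append(f"The previous attempt failed with: {feedback}. Fix it.")
    return "\n".join(parts)

def transform(llm: Callable[[str], str], db: sqlite3.Connection,
              source_schema: str, target_schema: str,
              domain_knowledge: str = "", max_rounds: int = 3) -> str:
    """Generate SQL with the LLM, try it against the database, re-prompt on errors."""
    feedback = ""
    for _ in range(max_rounds):
        sql = llm(build_prompt(source_schema, target_schema, domain_knowledge, feedback))
        try:
            db.execute(sql)      # "flaw detection" here is simply whether the SQL executes;
            return sql           # a fuller check would compare results against the target schema
        except sqlite3.Error as exc:
            feedback = str(exc)  # fold the error back into the next prompt
    raise RuntimeError(f"No working SQL after {max_rounds} rounds: {feedback}")
```

The design point mirrored here is that the failure message from one attempt is folded into the next prompt, which is what makes the optimization iterative.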

    Mining sequences in distributed sensors data for energy production.

    Get PDF
    Brief Overview of the Problem: The Environmental Protection Agency (EPA), a government-funded agency, holds both legislative and judicial powers for emissions monitoring in the United States. The agency crafts rules based on its own regulations that compel companies to operate within the limits of the law, resulting in environmentally safe operation. Specifically, power companies operate electric generating facilities under guidelines drawn up and enforced by the EPA. Acid rain and other harmful factors require that electric generating facilities report hourly emissions recorded via a Supervisory Control and Data Acquisition (SCADA) system. SCADA is a control and reporting system, present in all power plants, consisting of sensors and control mechanisms that monitor all equipment within the plants. The data recorded by a SCADA system is collected by the EPA and allows it to enforce proper plant operation with respect to emissions. This data includes many generating-unit- and plant-specific details, including hourly generation. This hourly generation (termed grossunitload by the EPA) is the actual hourly average output of the generator on a per-unit basis. The questions to be answered are: do any of these units operate in tandem, and do any of the units start, stop, or change operation as a result of another's change in generation? These questions will be answered for the period April 2002 through April 2003 for facilities that operate pipeline natural-gas-fired generating units. Purpose of Research: The research conducted has two uses if fruitful. First, a local model relating generating units would be highly valuable to energy traders. Betting that a plant will operate a unit based on another unit's current characteristics would be extremely profitable. This profitability varies with fuel type; for instance, if the price of coal is extremely high due to shortages, knowing a shared operating characteristic of two generating units is highly valuable. Second, this known characteristic can also be used in regulation and operational modeling. The second use is of great importance to government agencies. If regulatory committees are aware of past (or current) similarities between power producers, they may be able to avoid a power struggle in a region caused by greedy traders or companies. Setting profit motives aside, the Department of Energy could use something similar to build a model of power-grid generation availability from previous data for reliability purposes. Type of Problem: The problem tackled within this Master's thesis is one of multiple-time-series pattern recognition. This field is expansive and well studied, so the research performed benefits from previously known techniques. The author has chosen to experiment with conventional techniques such as correlation, principal component analysis, and k-means clustering for feature and, eventually, pattern extraction. For the primary analysis performed, the author chose a conventional sequence discovery algorithm. The sequence discovery algorithm has no prior knowledge of space limitations, so it searches the entire space, an expensive but complete process. Prior to sequence discovery the author applies a uniform coding schema to the raw data, an adaptation of a coding schema presented by Keogh. This coding and discovery process is termed USD, or Uniform Sequence Discovery.
The data is high-dimensional as well as extremely dynamic and sporadic in magnitude. The energy market that demands power generation is profit driven and, to a lesser extent, reliability driven. The more obvious factors are reliability based: for instance, to keep system frequency at 60 Hz, units may operate in an idle state, resulting in a constant or very low value for a period of time (idle time). Also, to avoid large frequency swings on the power grid, companies are required…
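As a rough illustration of the uniform coding idea (the thesis adapts a schema from Keogh; the breakpoints, alphabet, and window width below are illustrative choices, not the actual USD parameters), hourly grossunitload series can be mapped onto a small symbolic alphabet and then scanned for subsequences shared between two units:

```python
import numpy as np

def uniform_code(series, bins=4, symbols="abcd"):
    """Map a numeric series onto a small symbolic alphabet (SAX-style coding)."""
    x = np.asarray(series, dtype=float)
    z = (x - x.mean()) / (x.std() or 1.0)
    # Equal-frequency breakpoints over the observed values (illustrative choice).
    edges = np.quantile(z, np.linspace(0, 1, bins + 1)[1:-1])
    return "".join(symbols[i] for i in np.searchsorted(edges, z))

def shared_sequences(code_a, code_b, width=6):
    """Return coded subsequences of a given width that occur in both series."""
    grams_a = {code_a[i:i + width] for i in range(len(code_a) - width + 1)}
    grams_b = {code_b[i:i + width] for i in range(len(code_b) - width + 1)}
    return grams_a & grams_b

# Hypothetical hourly grossunitload values for two generating units.
unit_1 = [0, 0, 50, 120, 180, 200, 200, 180, 120, 60, 0, 0] * 3
unit_2 = [0, 0, 40, 110, 170, 190, 195, 175, 115, 55, 0, 0] * 3
print(shared_sequences(uniform_code(unit_1), uniform_code(unit_2)))
```

A full sequence-discovery pass would enumerate lagged matches across all unit pairs rather than exact shared n-grams, but the coding step is the same in spirit.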

    Performance Evaluation - Annual Report Year 2

    Get PDF
    In this paper a performance measuring infrastructure developed for the prototype and simulator is presented, covering experiment configuration, data measurement, and data collection. A corresponding performance evaluation framework is defined to obtain the metrics from the measured data. Initial experiments were carried out to test the developed prototype, the simulator, and the performance measuring infrastructure. --Grid Computing

    User-centric Visualization of Data Provenance

    Get PDF
    There is strong demand for understanding and tracking files (and, inherently, data) in cloud computing systems. Over the past years, logs and graph-based data representations have been the main method for tracking information and relating it to cloud users. While these are still in use, tracking and relating information with 'data provenance' (i.e. the chronicle and derivation history of data, recorded as metadata) is the new trend for cloud users. However, there is still much room for improving how data activities in cloud systems are represented to end-users. In this thesis, we propose UVisP (User-centric Visualization of Data Provenance with Gestalt), a novel user-centric visualization technique for data provenance. This technique aims to provide the missing link between data movements in cloud computing environments and end-users' uncertain queries about the security and life cycle of their files within cloud systems. The proof of concept for the UVisP technique integrates D3 (an open-source visualization API) with the Gestalt theory of perception to provide a range of user-centric visualizations. UVisP allows users to transform and visualize provenance (logs) with implicit prior knowledge of the Gestalt theory of perception. We present the initial development of the UVisP technique, and our results show that the integration of Gestalt principles and the existence of 'perceptual key(s)' in provenance visualization allows end-users to enhance their visualization capabilities, extract useful knowledge, and understand the visualizations better. The technique also enables end-users to develop particular methods and preferences when viewing different visualizations. For example, prior knowledge of the Gestalt theory of perception, combined with the types of visualizations offered, provides a user-centric experience across different visualizations. We also present significant future work that will help profile new user-centric visualizations for cloud users.
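As a hedged sketch of the data-preparation step such a technique implies (the log fields, grouping, and JSON shape below are assumptions for illustration, not UVisP's actual format), provenance records can be flattened into the node-link structure that D3 force layouts commonly consume:

```python
import json

# Hypothetical provenance log entries: (actor, action, source, target).
log = [
    ("alice", "upload", "report.docx", "cloud:/docs/report.docx"),
    ("backup-svc", "copy", "cloud:/docs/report.docx", "cloud:/backup/report.docx"),
    ("bob", "read", "cloud:/docs/report.docx", "bob-laptop"),
]

nodes, links, index = [], [], {}

def add_node(name, kind):
    """Register each entity once; D3 links can then reference nodes by id."""
    if name not in index:
        index[name] = len(nodes)
        nodes.append({"id": name, "group": kind})

for actor, action, src, dst in log:
    add_node(actor, "actor")
    add_node(src, "data")
    add_node(dst, "data")
    links.append({"source": src, "target": dst, "label": action, "by": actor})

# JSON in the {nodes, links} shape commonly fed to a D3 force layout.
print(json.dumps({"nodes": nodes, "links": links}, indent=2))
```

Gestalt-style grouping (proximity, similarity) would then be applied on the rendering side, for example by mapping the group field to colour or cluster position.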

    A Mobile ECG Monitoring System with Context Collection

    Get PDF
    An objective of a healthcare process is one where patients can stay healthy with the support of expert medical advice when they need it, at any location and any time. An associated aim is the development of a system that places increased emphasis on preventative measures as a first point of contact with the patient. This research is a step along the road towards this type of preventative healthcare for cardiac patients. It seeks to develop a smart mobile ECG monitoring system that requests and records context information about what is happening around the subject when an arrhythmia event occurs. Context information about the subject's activities of daily living will, it is hoped, provide an enriched data set for clinicians and so improve clinical decision making. As a first step towards a mobile cardiac wellness guideline system, the focus of this work is to develop a system that can receive bio-signals wirelessly, analyse and store the bio-signal on a handheld device, and collect context information when there are significant changes in bio-signs. For this purpose the author uses a low-cost development environment to program a state-of-the-art wireless prototype on a handheld computer that detects and responds to changes in the heart rate, as calculated from the interval between successive heart beats. Although the general approach taken in this work could be applied to a wide range of bio-signals, the research focuses on ECG signals. The pieces of the system are: a wireless receiver with a data collection and storage module; an efficient real-time ECG beat detection algorithm; a rule-based (Event-Condition-Action) interactive system; and a simple user interface, which can request additional information from the user. A selection of real-time ECG detection algorithms has been investigated, and one algorithm was implemented in MATLAB [110] and then in Java [142] for this project. In order to collect ECG signals (and in principle any signals), a generalised data collection architecture has also been developed using Java [142] and Bluetooth [5] technology. This architecture uses an implementation of the abstract factory pattern [91] to ensure that the communication channel can be changed conveniently. Another core part of this project is a "wellness" guideline based on the Event-Condition-Action (E-C-A) [68] production rule approach that originated in active databases. The work also covers the design of a guideline-based expert system in which an E-C-A-based implementation is fully event-driven, using the Java programming language. Based on the author's experience and the literature review, some important issues in mobile healthcare, along with the corresponding reasons, consequences and possible solutions, will be presented.
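A minimal sketch of the Event-Condition-Action idea described above, written in Python rather than the thesis's Java; the heart-rate threshold and the console prompt standing in for the handheld user interface are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    """E-C-A rule: when an event arrives, fire the action if the condition holds."""
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]

def on_beat(event: dict, rules: List[Rule]) -> None:
    """Event: a new beat, carrying the RR interval (seconds) between successive beats."""
    for rule in rules:
        if rule.condition(event):
            rule.action(event)

def rate_jump(event: dict) -> bool:
    """Condition: instantaneous heart rate deviates sharply from the running baseline."""
    hr = 60.0 / event["rr_interval"]
    return abs(hr - event["baseline_hr"]) > 30  # illustrative threshold, in bpm

def collect_context(event: dict) -> None:
    """Action: ask the user what they are doing and store it with the event."""
    event["context"] = input("Significant heart-rate change detected. What are you doing? ")

rules = [Rule(rate_jump, collect_context)]
on_beat({"rr_interval": 0.45, "baseline_hr": 72}, rules)  # ~133 bpm -> rule fires
```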

    Web Search Engines and the Need for Complex Information

    Get PDF
    The electronic version of the dissertation does not contain the publications. Web search engines have become the primary means of obtaining information from the Internet. Along with the growing popularity of search engines, their use has expanded from simple queries to the need for rather complex information searches. At the same time, academic interest in search has started to move from analysing simple queries towards considerably more complex activities that also span longer time frames. Current search tools do not support such activities as well as they do simple queries. This is especially true of support for aggregating the results of several queries by synthesising the results of different simple searches into a single new document. Such an approach is still in its early phase and motivates researchers to develop tools to support these information-seeking tasks. This dissertation presents a series of research results aimed at improving support for complex searches with today's search engines. The sub-goals were: (a) to develop a model of complex search, (b) to create metrics for the complex-search model, (c) to distinguish complex search tasks from simple look-up and to determine whether they can be measured, also finding simple metrics to describe their complexity, (d) to analyse how differently users behave when performing complex search tasks with Web search engines, (e) to study the correlation between people's ordinary Web-usage habits and their search performance, (f) to study how well people pre-assess the difficulty of a search task and the effort required, and (g) to study the effect of gender and age on search performance. Complex Web search tasks can be decomposed into a three-step process. A model of this process is presented; the process can also be measured. The innate characteristics of complex search that distinguish it from simpler cases are then shown, together with an experimental method for carrying out complex-search user studies. The main steps in applying the Search-Logger framework (the technical realisation of the aforementioned methodology) in user studies are demonstrated, and the results of studies carried out in this way are presented. Finally, the realisation and application of the ATMS method to improve support for complex-search needs in modern search engines is presented.
Search engines have become the means for searching information on the Internet. Along with the increasing popularity of these search tools, the areas of their application have grown from simple look-up to rather complex information needs. Also, the academic interest in search has started to shift from analyzing simple query and response patterns to examining more sophisticated activities covering longer time spans. Current search tools do not support those activities as well as they do simple look-up tasks. In particular, support for aggregating search results from multiple search queries, taking into account discoveries made and synthesizing them into a newly compiled document, is only at the beginning and motivates researchers to develop new tools for supporting those information-seeking tasks. In this dissertation I present the results of empirical research focused on evaluating search engines and developing a theoretical model of the complex search process that can be used to better support this special kind of search with existing search tools. It is not the goal of the thesis to implement a new search technology. Therefore performance benchmarks against established systems such as question answering systems are not part of this thesis. I present a model that decomposes complex Web search tasks into a measurable, three-step process. I show the innate characteristics of complex search tasks that make them distinguishable from their less complex counterparts and showcase an experimentation method to carry out complex-search-related user studies. I demonstrate the main steps taken during the development and implementation of the Search-Logger study framework (the technical manifestation of the aforementioned method) to carry out search user studies. I present the results of user studies carried out with this approach. Finally, I present the development and application of the ATMS (awareness-task-monitor-share) model to improve the support for complex search needs in current Web search engines.

    Data Visualization for the Benchmarking Engine

    Get PDF
    In today's information age, data collection is not the ultimate goal; it is simply the first step in extracting knowledge-rich information to shape future decisions. In this thesis, we present ChartVisio - a simple web-based visual data-mining system that lets users quickly explore databases and transform raw data into processed visuals. It is highly interactive, easy to use and hides the underlying complexity of querying from its users. Data from tables is internally mapped into charts using aggregate functions across tables. The tool thus integrates querying and charting into a single general-purpose application. ChartVisio has been designed as a component of the Benchmark data engine, being developed at the Computer Science department, University of New Orleans. The data engine is an intelligent website generator and users who create websites using the Data Engine are the site owners. Using ChartVisio, owners may generate new charts and save them as XML templates for prospective website surfers. Everyday Internet users may view saved charts with the touch of a button and get real-time data, since charts are generated dynamically. Website surfers may also generate new charts, but may not save them as templates. As a result, even non-technical users can design and generate charts with minimal time and effort.
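A minimal sketch of the map-tables-to-charts idea, assuming an illustrative SQLite table and hypothetical XML template tags (not ChartVisio's actual schema or template format): an aggregate query produces the chart's data points, which are then serialized as a small XML chart definition.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Illustrative table; ChartVisio's real schemas live in the Benchmark data engine.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("North", 120.0), ("South", 80.0), ("North", 60.0)])

# Aggregate query that backs the chart: one bar per region.
rows = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region").fetchall()

# Serialize the chart definition as a small XML template (hypothetical tags).
chart = ET.Element("chart", type="bar", title="Sales by region")
for label, value in rows:
    ET.SubElement(chart, "point", label=label, value=str(value))
print(ET.tostring(chart, encoding="unicode"))
```

Because the template stores the query rather than its results, regenerating the chart from the template yields real-time data, matching the dynamic-chart behaviour described above.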
