
    Code generator for integrating warehouse XML data sources.

    XML, the eXtensible Markup Language, has been recognized as the standard for data representation and exchange on the World Wide Web, and vast amounts of XML data are available on the web. Because this information is stored on separate web pages, it is hard to combine pieces of information for decision support purposes. Data warehouse integration provides a solution by bringing the different XML source data into a single format with meaningful information for decision support systems. A data warehouse is a large integrated database organized around the major subjects of an enterprise for the purpose of decision support querying. Many enterprises build their own data warehouse systems from scratch in varying formats, which makes it important to build data warehouse systems that are more efficient, more reliable, cost-effective, and easy to use. Building a code generator that creates a program to automatically integrate XML data sources into a target data warehouse is one solution, yet there is little research showing the use of recent XML techniques in code generators for data warehouse XML data integration. This thesis proposes a Warehouse Integrator code generator for XML (WIG4X), which integrates XML data sources into a target data warehouse by first generating Java programs for extracting, cleaning, and loading XML data into the warehouse. The WIG4X system also generates the programs for creating XML views from the data warehouse. An XML schema mapping strategy is employed for structural integration of each XML data source into the data warehouse, using a first-order logic-like language similar to that used in INFOMASTER. Content integration is handled through XML data extraction, conversion constraints, data cleaning, and data loading. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2001 .L57. Source: Masters Abstracts International, Volume: 40-06, page: 1549. Adviser: Christie Ezeife. Thesis (M.Sc.)--University of Windsor (Canada), 2002.
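
    The thesis generates Java programs; purely as an illustration of the extract-clean-load step such a generator might emit, here is a minimal Python sketch. The element names, table layout, and cleaning rules are assumptions, not WIG4X's actual output.

```python
# Hypothetical sketch of an extract-clean-load step for one XML source.
# Element names ("order", "id", "customer", "amount") and the warehouse
# table layout are assumed for illustration only.
import sqlite3
import xml.etree.ElementTree as ET

def load_orders(xml_path: str, db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS fact_orders (
                        order_id TEXT PRIMARY KEY,
                        customer TEXT,
                        amount   REAL)""")
    root = ET.parse(xml_path).getroot()
    for order in root.iter("order"):
        oid = (order.findtext("id") or "").strip()
        cust = (order.findtext("customer") or "UNKNOWN").strip()
        try:
            amount = float(order.findtext("amount", "0"))
        except ValueError:                 # simple content-cleaning rule
            amount = 0.0
        if oid:                            # conversion constraint: id required
            conn.execute("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)",
                         (oid, cust, amount))
    conn.commit()
    conn.close()
```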

    DATA CURATION FOR MODELING TALL FESCUE BIOMASS DYNAMICS WITH DSSAT-CSM

    While models for predicting forage production are available to aid management decisions for some forage crops, there is limited research on a yield model designed specifically for tall fescue (Schedonorus arundinaceus). Therefore, our objective was to adapt an existing perennial forage model, the Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM), for predicting forage biomass of tall fescue in the southern Great Plains. Evaluating model performance first requires extensive data manipulation and cleaning. In this project, a cohesive dataset combining biomass, weather, soil, and management data was structured into the DSSAT standard file format to be used in future tall fescue crop modeling analysis.
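
    As a rough illustration of this kind of curation step, the sketch below writes daily weather records into a DSSAT-style .WTH layout. The column specification here is simplified and assumed; the exact format expected by DSSAT-CSM should be taken from the DSSAT documentation.

```python
# Illustrative only: writing curated daily weather into a simplified
# DSSAT-style .WTH layout (column spec assumed, not authoritative).
import pandas as pd

def write_wth(df: pd.DataFrame, path: str, station: str = "STAT") -> None:
    # df is expected to carry columns: date, srad, tmax, tmin, rain
    with open(path, "w") as f:
        f.write(f"*WEATHER DATA : {station}\n\n")
        f.write("@DATE  SRAD  TMAX  TMIN  RAIN\n")
        for _, r in df.iterrows():
            yyddd = pd.Timestamp(r["date"]).strftime("%y%j")   # YYDDD date code
            f.write(f"{yyddd} {r['srad']:5.1f} {r['tmax']:5.1f} "
                    f"{r['tmin']:5.1f} {r['rain']:5.1f}\n")
```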

    A Comparison of Decision Tree with Logistic Regression Model for Prediction of Worst Non-Financial Payment Status in Commercial Credit

    Credit risk prediction is an important problem in the financial services domain. While machine learning techniques such as Support Vector Machines and Neural Networks have been used for improved predictive modeling, the outcomes of such models are not readily explainable and are therefore difficult to apply within financial regulations. In contrast, Decision Trees are easy to explain and provide an easy-to-interpret visualization of model decisions. The aim of this paper is to predict worst non-financial payment status among businesses and to evaluate Decision Tree model performance against a traditional Logistic Regression model for this task. The dataset for analysis is provided by Equifax and includes over 300 potential predictors from more than 11 million unique businesses. After a data discovery phase, including imputation, cleaning, and transformation of potential predictors, Decision Tree and Logistic Regression models were built on the same finalized analysis dataset. Evaluated on the ROC index and the Kolmogorov-Smirnov statistic, the Decision Tree performed as well as the Logistic Regression model.
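
    A minimal sketch of this kind of comparison is shown below. The Equifax data is proprietary, so a synthetic dataset stands in, and the evaluation uses a generic ROC AUC and a KS statistic computed between the score distributions of the two classes; the paper's exact feature set and modeling choices are not reproduced here.

```python
# Hedged sketch: comparing a decision tree and logistic regression on a
# synthetic binary target, scored by ROC AUC and the KS statistic.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("DecisionTree", DecisionTreeClassifier(max_depth=6, random_state=0)),
                    ("LogisticRegression", LogisticRegression(max_iter=1000))]:
    p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    ks = ks_2samp(p[y_te == 1], p[y_te == 0]).statistic   # KS between class score distributions
    print(f"{name}: ROC AUC={roc_auc_score(y_te, p):.3f}  KS={ks:.3f}")
```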

    Comparison of Classification Algorithm for Crop Decision based on Environmental Factors using Machine Learning

    Crop decision is a complex process that plays a vital role in agriculture. Various biotic and abiotic factors affect this decision; crucial environmental factors include nitrogen, phosphorus, potassium, pH, temperature, humidity, and rainfall. Machine learning algorithms can predict the crop suited to these environmental conditions. The process involves steps such as feature selection, data cleaning, and training/testing splits, and uses algorithms such as Logistic Regression, Decision Tree, Support Vector Machine, K-Nearest Neighbour, Naïve Bayes, and Random Forest. This paper presents a comparison based on accuracy, across various training and testing splits, to choose the best algorithm. The comparison is carried out with two tools: Google Colab, using Python and its libraries to implement the machine learning algorithms, and WEKA, a workbench used here for pre-processing and for comparing the algorithms.
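
    A minimal Python sketch of such a comparison follows. The input file name, the column names (the seven factors plus a crop label), and the chosen splits are assumptions for illustration, not the paper's dataset.

```python
# Hedged sketch: accuracy comparison of the listed classifiers over several
# train/test splits. "crop_data.csv" and its columns are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("crop_data.csv")              # N, P, K, pH, temp, humidity, rainfall, crop
X, y = df.drop(columns=["crop"]), df["crop"]

models = {
    "LogReg": LogisticRegression(max_iter=2000),
    "DecisionTree": DecisionTreeClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "NaiveBayes": GaussianNB(),
    "RandomForest": RandomForestClassifier(),
}
for test_size in (0.2, 0.3, 0.4):              # alternative train/test splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=42)
    for name, m in models.items():
        acc = m.fit(X_tr, y_tr).score(X_te, y_te)
        print(f"split {1 - test_size:.0%}/{test_size:.0%}  {name}: accuracy={acc:.3f}")
```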

    Declarative Data Cleaning : Language, Model, and Algorithms

    Projet CARAVEL. The problem of data cleaning, which consists of removing inconsistencies and errors from original data sets, is well known in the area of decision support systems and data warehouses. However, for non-conventional applications, such as the migration of largely unstructured data into structured data, or the integration of heterogeneous scientific data sets in interdisciplinary fields (e.g., in environmental science), existing ETL (Extraction Transformation Loading) and data cleaning tools for writing data cleaning programs are insufficient. The main challenge is the design of a data flow graph that effectively generates clean data and performs efficiently on large sets of input data. The difficulty comes from (i) the lack of a clear separation between the logical specification of data transformations and their physical implementation, and (ii) the lack of explanation of cleaning results and of user interaction facilities to tune a data cleaning program. This paper addresses these two problems and presents a language, an execution model, and algorithms that enable users to express data cleaning specifications declaratively and perform the cleaning efficiently. As an example, we use a set of bibliographic references used to construct the Citeseer Web site; the underlying data integration problem is to derive structured and clean textual records so that meaningful queries can be performed. Experimental results report on the assessment of the proposed framework for data cleaning.
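
    The following toy Python sketch is not the paper's language; it only illustrates the idea the paper argues for, keeping the logical specification of cleaning rules separate from the engine that executes them and retaining an explanation trail, applied to bibliographic-style records.

```python
# Toy illustration of declarative-style cleaning: rules are declared as data,
# and a separate engine applies them and records which rules fired.
import re

# Logical specification: ordered (name, rule) pairs; a rule returns a cleaned
# record, or None to reject the record.
RULES = [
    ("normalize_ws", lambda rec: {**rec, "title": re.sub(r"\s+", " ", rec["title"]).strip()}),
    ("lowercase_title", lambda rec: {**rec, "title": rec["title"].lower()}),
    ("drop_missing_year", lambda rec: rec if rec.get("year") else None),
]

def clean(records):
    """Execution engine: apply each rule in turn, keeping an explanation trail."""
    for rec in records:
        trail = []
        for name, rule in RULES:
            rec = rule(rec)
            trail.append(name)
            if rec is None:          # record rejected by this rule
                break
        if rec is not None:
            yield {**rec, "_applied": trail}

refs = [{"title": "  Declarative   Data Cleaning ", "year": 2001},
        {"title": "Unknown report", "year": None}]
print(list(clean(refs)))
```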

    XXXVIII Jornadas de Automática

    Producción Científica. This work presents a decision-support tool that addresses a model-based optimization approach for online load allocation and scheduling of cleaning operations in an evaporation network. The aim is to improve resource efficiency by supplying the optimal solution for a given production goal. The approach includes a semi-automatic update of the evaporator models, based on historical data, for minimal modelling effort. The problem is formulated via mixed-integer programming and integrated into the plant supervision systems. Production constraints, concerns about practical implementation, and visualization preferences are also taken into account in the design of the prototypical tool. MINECO/FEDER Grant DPI2015-70975 (INOPTCON). EU H2020-SPIRE Grant Agreement nº 723575 (CoPro).
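
    To give a flavour of the mixed-integer formulation idea (not the authors' actual model), the toy PuLP sketch below allocates load across two evaporators and schedules one cleaning slot per unit; all plant data and cost coefficients are invented for illustration.

```python
# Toy MILP sketch: load allocation plus one cleaning slot per evaporator.
# Capacities, demands, and the objective weights are assumed values.
import pulp

periods = range(4)
units = ["EV1", "EV2"]
capacity = {"EV1": 60.0, "EV2": 50.0}        # t/h, assumed
demand = [45, 40, 48, 42]                    # total load per period, assumed

prob = pulp.LpProblem("evaporator_cleaning", pulp.LpMinimize)
load = pulp.LpVariable.dicts("load", (units, periods), lowBound=0)
clean = pulp.LpVariable.dicts("clean", (units, periods), cat="Binary")

# Each unit must be cleaned exactly once over the horizon.
for u in units:
    prob += pulp.lpSum(clean[u][t] for t in periods) == 1

for t in periods:
    # Demand must be met; a unit being cleaned contributes no load that period.
    prob += pulp.lpSum(load[u][t] for u in units) >= demand[t]
    for u in units:
        prob += load[u][t] <= capacity[u] * (1 - clean[u][t])

# Objective: penalize late cleaning (a crude proxy for fouling losses)
# plus a small cost on total processed load.
prob += (pulp.lpSum(t * clean[u][t] for u in units for t in periods)
         + 0.01 * pulp.lpSum(load[u][t] for u in units for t in periods))

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({u: [t for t in periods if clean[u][t].value() == 1] for u in units})
```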

    QUALITATIVE CLEANING METHODS ON DISTRIBUTED IOT DATASETS

    Data analysis encompasses a set of individual steps that allow a typically large data set to be remodeled so that actionable information can be extracted from it and used to support decision-making. Data generated from multiple distributed sources is usually dirty by default, and dirty data often leads to inaccurate or incomplete data analysis. As a result, without first performing data cleaning, wrong or fatally flawed business decisions are inevitable. IoT describes a network of physical and virtual objects containing software, electrical components, and sensors that exchange data with other connected devices over the internet. The data generated from these sensors is distributed by design, and my aim for this thesis is to explore qualitative data cleaning methods, such as integrity constraints and functional dependency violations, to perform error detection and in-place error repairing on the distributed data sets generated from these devices. This approach is relatively new, since most of the prior data cleaning research in this domain has focused on quantitative techniques such as outlier detection. The next goal for my thesis will be to perform exploratory data analysis on the data sets from these IoT sources using data wrangling tools on open source frameworks, such as Optimus under Apache Spark, to handle the unstructured and semi-structured formats of the data generated from these sources. The end goal is to produce clean data from these sources so that insights can be gained to support decision-making for the purpose of product improvement.
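
    As a small, hedged example of one qualitative check in this direction, the pandas sketch below detects violations of an assumed functional dependency device_id -> location in a sensor-reading table and applies a naive in-place repair. Column names are invented, and the sketch uses pandas rather than the Optimus/Spark stack named in the abstract.

```python
# Detecting and repairing violations of the (assumed) functional dependency
# device_id -> location in IoT sensor readings.
import pandas as pd

readings = pd.DataFrame({
    "device_id": ["d1", "d1", "d2", "d2", "d3"],
    "location":  ["lab", "lab", "roof", "yard", "lab"],   # d2 violates the FD
    "temp_c":    [21.5, 21.7, 5.0, 5.2, 22.0],
})

# Integrity constraint: a device should always report from a single location.
violations = (readings.groupby("device_id")["location"]
                      .nunique()
                      .loc[lambda s: s > 1])
print(violations)            # device ids whose rows violate the dependency

# Naive in-place repair: keep each device's most frequent reported location.
repair = readings.groupby("device_id")["location"].agg(lambda s: s.mode().iloc[0])
readings["location"] = readings["device_id"].map(repair)
```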

    The Role of Metadata for Effective Data Warehouse

    Metadata is an efficient means of managing a Data Warehouse (DW). It is also an effective tool for reducing the time needed to answer queries. In addition, it provides integration and standardization capabilities, which lead to faster, clearer, and more accurate decision-making at the right time. This paper defines the metadata concept and describes the use of metadata in data cleaning, where it identifies the sources and the types of fields and helps choose the appropriate algorithm. Metadata is also useful in Decision Support Systems (DSS), where it improves the efficiency of analysis and reduces query response time.
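
    An illustrative sketch of the cleaning idea described here, letting field-level metadata choose the cleaning routine, is shown below. The field names, types, and cleaning functions are invented; a real warehouse would hold this information in its metadata repository.

```python
# Metadata-driven cleaning sketch: field metadata selects the cleaning routine.
# All names and rules are hypothetical.
from datetime import datetime

METADATA = {                      # would normally live in the DW metadata repository
    "customer_name": {"type": "text",    "source": "crm"},
    "order_date":    {"type": "date",    "source": "erp"},
    "amount":        {"type": "numeric", "source": "erp"},
}

CLEANERS = {
    "text":    lambda v: " ".join(str(v).split()).title(),
    "date":    lambda v: datetime.strptime(v, "%Y-%m-%d").date(),
    "numeric": lambda v: float(str(v).replace(",", "")),
}

def clean_row(row: dict) -> dict:
    """Apply the cleaner chosen by each field's metadata type."""
    return {col: CLEANERS[METADATA[col]["type"]](val) for col, val in row.items()}

print(clean_row({"customer_name": "  ada   lovelace ",
                 "order_date": "2024-05-01",
                 "amount": "1,250.50"}))
```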

    THE ROLE OF THE SCHOOL ENVIRONMENT IN STRENGTHENING THE ENVIRONMENTAL CARE CHARACTER OF STUDENTS AT SMAN 111 JAKARTA

    This study aims to describe the role of the school environment in strengthening the environmental care character of students at SMA Negeri 111 Jakarta. The research takes a qualitative approach with descriptive methods. Data collection techniques included interviews, observation, and documentation, and the validity of the data was established through triangulation of sources and techniques. Data analysis used the interactive model of Miles and Huberman with three stages, namely data reduction, data display, and conclusion drawing and verification, presented in a qualitative descriptive manner. The results show that the role of the school environment in strengthening the character of caring for the environment is manifested in several ways, including (1) providing exemplary habituation, (2) habituation of maintaining cleanliness and environmental sustainability, (3) the availability of supporting facilities, including cleaning equipment, garbage disposal sites, toilets, and clean water, as well as slogans or posters about caring for the environment in various corners of the school, and (4) support for the Adiwiyata program through clean Fridays, a waste bank, and compost making.

    Will We Connect Again? Machine Learning for Link Prediction in Mobile Social Networks

    In this paper we examine link prediction for two types of data sets with mobility data, namely call data records (from the MIT Reality Mining project) and location-based social networking data (from the companies Gowalla and Brightkite). These data sets contain location information, which we incorporate into the features used for prediction. We also examine different strategies for data cleaning, in particular thresholding based on the amount of social interaction. We investigate the machine learning algorithms Decision Tree, Naïve Bayes, Support Vector Machine, and Logistic Regression. Generally, we find that our feature selection and filtering of the data sets have a major impact on the accuracy of link prediction, both for Reality Mining and Gowalla. Experimentally, the Decision Tree and Logistic Regression classifiers performed best.
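
    A minimal sketch in the spirit of this pipeline is given below: build pairwise features (including a simple location-overlap feature), apply an interaction-count threshold as the cleaning step, and compare two of the classifiers used in the paper. The feature definitions, threshold, and synthetic labels are assumptions, not the paper's actual setup.

```python
# Hedged link-prediction sketch on synthetic pairwise features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.poisson(3, n),            # calls/check-ins between the pair so far
    rng.integers(0, 2, n),        # shared home-location indicator
    rng.random(n),                # Jaccard similarity of visited places
])
y = ((X[:, 0] > 2) & (X[:, 2] > 0.4)).astype(int)   # synthetic "link recurs" label

# Cleaning step analogous to the paper's: threshold on interaction count.
mask = X[:, 0] >= 1
X, y = X[mask], y[mask]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for name, m in [("DecisionTree", DecisionTreeClassifier(max_depth=5)),
                ("LogisticRegression", LogisticRegression())]:
    acc = accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: accuracy={acc:.3f}")
```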