Improving Automatic Content Type Identification from a Data Set

Dai, Kathy T

Authors: Kathy T Dai
Publication date: 1 May 2017
Publisher: ScholarWorks@UARK

Abstract

Data file layout inference is the task of recovering the structure of a text file and determining its metadata. The text files dealt with in this research are personal information records that have a consistent structure. Traditionally, if the layout of a text file is unknown, a human user must identify the metadata manually, which is inefficient and prone to error. Content-based oracles are the current state-of-the-art automation technology for the layout inference problem; they attempt to solve it by using databases of known metadata. This paper builds upon the existing information and documentation on content-based oracles and improves the oracles' databases through experimentation.
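
To make the idea of a content-based oracle concrete, the following is a minimal sketch, not the method used in this work. It assumes an oracle can be approximated by checking how many values in a column appear in a reference set of known values. The names `KNOWN_VALUES`, `score_column`, and `infer_layout`, the tiny lookup sets, and the 0.6 threshold are all hypothetical and chosen only for illustration.

```python
# Hypothetical reference "databases": tiny sets standing in for the much
# larger lookup tables a real content-based oracle would consult.
KNOWN_VALUES = {
    "first_name": {"kathy", "john", "maria", "james"},
    "state": {"ar", "tx", "ca", "ny"},
}


def score_column(values, known):
    """Return the fraction of non-empty values that appear in the known set."""
    cleaned = [v.strip().lower() for v in values if v.strip()]
    if not cleaned:
        return 0.0
    hits = sum(1 for v in cleaned if v in known)
    return hits / len(cleaned)


def infer_layout(rows, threshold=0.6):
    """Label each column with the best-matching content type, or 'unknown'.

    The threshold is an assumed cutoff, not a value taken from the source.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    labels = []
    for col in columns:
        scores = {name: score_column(col, known)
                  for name, known in KNOWN_VALUES.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else "unknown")
    return labels


# Illustrative records with a consistent structure (made-up sample data).
rows = [
    ["Kathy", "AR", "72701"],
    ["John", "TX", "75001"],
    ["Maria", "CA", "90210"],
]
print(infer_layout(rows))
```

In this sketch the third column is labelled `unknown` because no reference set covers it. Under these assumptions, that is the kind of gap that enlarging or refining the oracle databases would aim to close.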

