On Extracting Data from Tables that are Encoded using HTML
Tables are a common means of displaying data in human-friendly formats. Many
authors have proposed ways to extract those data back, since doing so has many
interesting applications. In this article, we summarise and compare many of the
proposals to extract data from tables that are encoded using HTML and have been
published between and . We first present a vocabulary that
homogenises the terminology used in this field; next, we use it to summarise
the proposals; finally, we compare them side by side. Our analysis highlights
several challenges to which no proposal provides a conclusive solution and a
few more that have not been addressed sufficiently. Simply put, no proposal
provides a complete solution to the problem, which suggests that this
research field will remain active in the near future. We have also realised that
there is no consensus regarding the datasets and the methods used to evaluate
the proposals, which hampers comparison of their experimental results.