We propose a new technique for inferring the structure of, and extracting
the data tokens from, semi-structured web sources that are generated from a
consistent template or layout with some implicit regularities. The attributes
are extracted and labeled in reverse, starting from the region of interest of
the targeted content.
This contrasts with existing techniques, which always generate the trees
from the root. We argue and show that our technique is simpler, more accurate,
and more effective, especially at detecting changes in the templates of the
targeted web pages.

Comment: 5 pages, Proceedings of the 2009 International Conference on Signal
Processing Systems, pp. 551-55