Pattern discovery for semi-structured web pages using bar-tree representation
Many websites backed by an underlying database of structured data are the
richest and densest sources of information for topical data integration.
Practical data integration requires sustainable and reliable pattern discovery,
both to retrieve content accurately and to recognize pattern changes over time;
yet the accuracy of extracting structured data from web documents is still
limited. This paper proposes the bar-tree representation, which efficiently
describes the whole pattern of a web page and is based on the reverse
algorithm. While previous algorithms always trace the pattern and extract the
region of interest starting from the top root, the reverse algorithm recognizes
the pattern from the region of interest toward both the top and bottom roots
simultaneously. The attributes are then extracted and labeled in reverse,
starting from the region of interest of the targeted contents. Because
conventional representations would require more computational power for this
algorithm, the bar-tree method represents the generated patterns as bar graphs
characterized by their depths and widths from the document roots. We show that
this representation is suitable both for extracting data from semi-structured
web sources and for detecting template changes in targeted pages. The
experimental results show a perfect recognition rate for template changes on
several target websites.
Comment: 9 pages
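The abstract does not define the bar-tree precisely, so the following is only a
minimal sketch under one assumed reading: for each depth below the document
root, count the element nodes at that depth (the "width"), and treat any change
in this depth/width profile as a template change. It uses Python's standard
html.parser; the names bar_profile and template_changed are hypothetical, and
the sketch does not implement the authors' reverse algorithm.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: HTML void elements never receive a closing tag, so they must not
# push their following siblings one level deeper.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthProfiler(HTMLParser):
    """Counts how many element nodes occur at each depth below the root."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.widths = Counter()  # depth -> number of elements at that depth

    def handle_starttag(self, tag, attrs):
        self.widths[self.depth] += 1
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> occupies a slot but opens no level.
        self.widths[self.depth] += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def bar_profile(html: str) -> list[int]:
    """Hypothetical 'bar graph': element count (width) for each depth 0..max."""
    parser = DepthProfiler()
    parser.feed(html)
    parser.close()
    if not parser.widths:
        return []
    max_depth = max(parser.widths)
    return [parser.widths[d] for d in range(max_depth + 1)]


def template_changed(old_html: str, new_html: str) -> bool:
    """Flag a template change when the two depth/width profiles differ."""
    return bar_profile(old_html) != bar_profile(new_html)
```

Under this reading, pages that differ only in text content share a profile,
while moving an element into a new container changes the widths at some depth.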
Reverse method for labeling the information from semi-structured web pages
We propose a new technique to infer the structure of, and extract data tokens
from, semi-structured web sources that are generated from a consistent template
or layout with some implicit regularities. The attributes are extracted and
labeled in reverse, starting from the region of interest of the targeted
contents. This contrasts with existing techniques, which always generate the
trees from the root. We argue and show that our technique is simpler, more
accurate, and more effective, especially for detecting changes in the templates
of targeted web pages.
Comment: 5 pages, Proceedings of the 2009 International Conference on Signal
Processing Systems, pp. 551-55
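To make the contrast with root-first tree generation concrete, here is a
minimal sketch of labeling "in reverse" from a region of interest. It assumes
the region of interest has already been located in a parsed tree, uses the
tag path from that node up to the root as the attribute label, and collects the
text tokens found below it. Node, upward_label, downward_tokens, and
label_region are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    tag: str
    text: str = ""
    # Parent pointer lets us walk upward; excluded from repr/eq to avoid cycles.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def upward_label(node: Node) -> str:
    """Walk from the region of interest up to the top root, building a path label."""
    parts = []
    current = node
    while current is not None:
        parts.append(current.tag)
        current = current.parent
    return "/".join(reversed(parts))


def downward_tokens(node: Node) -> list[str]:
    """Walk from the region of interest down to the leaves, collecting text tokens."""
    tokens = [node.text] if node.text else []
    for child in node.children:
        tokens.extend(downward_tokens(child))
    return tokens


def label_region(node: Node) -> dict[str, list[str]]:
    """Attach the upward path label to the tokens found below the region."""
    return {upward_label(node): downward_tokens(node)}
```

For example, a div holding two spans with texts "Price" and "42" under
html/body is labeled {"html/body/div": ["Price", "42"]}. Only the paths
touching the region of interest are visited, never the full tree from the root.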
GRID Architecture through a Public Cluster
An architecture is introduced that enables blocks of several nodes within a
public cluster to connect to different grid collaborations. It is realized by
adding a web service alongside the standard Globus Toolkit. The new web service
performs two main tasks: it authenticates the digital certificate contained in
an incoming request, and it forwards the request to the designated block. The
appropriate block is determined from the username of the block's owner, which
is contained in the digital certificate. It is argued that this algorithm opens
an opportunity for any block in a public cluster to join various global grids.
Comment: 5 pages, Proceedings of the 2008 International Conference on Computer
and Communication Engineering
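The two tasks of the added web service can be sketched as follows. This is
purely illustrative: it assumes the certificate subject arrives as a
Globus-style slash-separated distinguished name whose CN field is the block
owner's username, and it replaces real X.509 chain verification with a lookup
in a hypothetical TRUSTED_SUBJECTS set, so it provides no actual security.
BLOCKS, authenticate, username_from_dn, and route_request are all hypothetical.

```python
# Hypothetical mapping from a block owner's username to that block's nodes.
BLOCKS = {
    "alice": ["node01", "node02"],
    "bob": ["node03", "node04", "node05"],
}

# Stand-in for verifying the certificate chain against a grid CA. NOT secure.
TRUSTED_SUBJECTS = {
    "/O=Grid/OU=Example/CN=alice",
    "/O=Grid/OU=Example/CN=bob",
}


def authenticate(subject_dn: str) -> bool:
    """Task 1 (simplified): accept only subjects we pretend were verified."""
    return subject_dn in TRUSTED_SUBJECTS


def username_from_dn(subject_dn: str) -> str:
    """Extract the CN component of a slash-separated DN such as /O=Grid/CN=bob."""
    for component in subject_dn.split("/"):
        key, sep, value = component.partition("=")
        if sep and key == "CN":
            return value
    raise ValueError(f"no CN in subject {subject_dn!r}")


def route_request(subject_dn: str, payload: str) -> tuple[str, list[str], str]:
    """Task 2: map the certificate's username to its block and 'forward' there.

    Forwarding is modeled as returning the destination; a real service would
    resubmit the request to the selected block's own job manager.
    """
    if not authenticate(subject_dn):
        raise PermissionError(f"untrusted certificate subject {subject_dn!r}")
    user = username_from_dn(subject_dn)
    if user not in BLOCKS:
        raise LookupError(f"no block owned by {user!r}")
    return user, BLOCKS[user], payload
```

Keying the dispatch on the certificate's username means a new block can join a
grid collaboration by registering its owner, without changing the service code.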
Field theory approach in the dynamics of biomatter
A new approach to modeling biomatter dynamics based on field theory is
presented. It is shown that some well-known tools of field theory can be used
to describe physical phenomena in living matter, in particular in elementary
biomatter such as DNA and proteins. In this approach, the dynamics of biomatter
are represented as the result of interactions among its elementary
constituents, expressed in the form of a Lagrangian. Starting from the
Lagrangian provides a stronger theoretical foundation for further extensions.
Moreover, it enables us to obtain rich physical observables using statistical
mechanics instead of relying on the space-time dynamics of particular equations
of motion, which are not solvable because of their nonlinearities. A few
examples from previous results are given and explained briefly.
Comment: 7 pages
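The step from a Lagrangian to observables without solving the equations of
motion can be summarized in the generic textbook form of classical statistical
field theory. This is an assumed illustration, not the paper's specific
biomatter Lagrangian: $\phi$ stands for a generic field, $\pi$ for its
conjugate momentum, and $\beta$ for the inverse temperature.

```latex
% Generic form (assumption): Lagrangian -> Hamiltonian -> partition function.
\pi = \frac{\partial \mathcal{L}}{\partial(\partial_t \phi)}, \qquad
H[\phi,\pi] = \int d^3x \,\bigl(\pi\,\partial_t \phi - \mathcal{L}\bigr),
\qquad
Z = \int \mathcal{D}\phi\,\mathcal{D}\pi \; e^{-\beta H[\phi,\pi]},
\qquad
\langle \mathcal{O} \rangle
  = \frac{1}{Z} \int \mathcal{D}\phi\,\mathcal{D}\pi \;
    \mathcal{O}[\phi,\pi]\, e^{-\beta H[\phi,\pi]} .
```

Observables such as $\langle \mathcal{O} \rangle$ are ensemble averages weighted
by $e^{-\beta H}$, so they can be evaluated (exactly or approximately) even when
the nonlinear equations of motion $\delta S / \delta \phi = 0$ have no
closed-form solution.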
