
    Pattern discovery for semi-structured web pages using bar-tree representation

    Many websites with an underlying database of structured data provide the richest and densest source of information for topical data integration. Practical data integration requires sustainable, reliable pattern discovery, both to retrieve content accurately and to recognize pattern changes over time; yet the extraction of structured data from web documents still lacks accuracy. This paper proposes the bar-tree representation, which describes the whole pattern of a web page efficiently and is based on the reverse algorithm. Whereas previous algorithms always trace the pattern and extract the region of interest from the top root, the reverse algorithm recognizes the pattern from the region of interest toward both the top and bottom roots simultaneously. The attributes are then extracted and labeled in reverse, starting from the region of interest of the targeted contents. Because conventional representations would require more computational power for this algorithm, the bar-tree method represents the generated patterns as bar graphs characterized by their depths and widths from the document roots. We show that this representation is suitable for extracting data from semi-structured web sources and for detecting template changes in targeted pages. The experimental results show a perfect recognition rate for template changes in several web targets. Comment: 9 page
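    The idea of summarizing a page by depths and widths can be illustrated with a minimal sketch. This is not the authors' bar-tree or reverse algorithm: it walks the markup top-down and simply counts how many elements open at each nesting depth. The names `DepthWidthProfile`, `profile`, and `template_changed` are hypothetical, and the assumption is that a change in this per-depth profile is a reasonable proxy for a template change.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never take a closing tag, so they must not deepen the nesting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthWidthProfile(HTMLParser):
    """Hypothetical helper: counts opened elements per nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.widths = Counter()  # depth -> number of elements at that depth

    def handle_starttag(self, tag, attrs):
        self.widths[self.depth] += 1
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: count it, do not change depth.
        self.widths[self.depth] += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def profile(html):
    """Return the 'bar heights': element count at depth 0, 1, 2, ..."""
    parser = DepthWidthProfile()
    parser.feed(html)
    parser.close()
    return [parser.widths[d] for d in range(len(parser.widths))]


def template_changed(old_html, new_html):
    """Assumption: differing profiles indicate a template change."""
    return profile(old_html) != profile(new_html)


page = "<html><body><ul><li>a</li><li>b</li></ul><p>c</p></body></html>"
print(profile(page))  # [1, 1, 2, 2]
```

    Comparing two such profiles is cheap, which is in the spirit of the efficiency argument in the abstract, but a real detector would also need to tolerate benign variation such as a differing number of result rows.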

    Reverse method for labeling the information from semi-structured web pages

    We propose a new technique to infer the structure of, and extract data tokens from, semi-structured web sources that are generated using a consistent template or layout with some implicit regularities. The attributes are extracted and labeled in reverse, starting from the region of interest of the targeted contents. This contrasts with existing techniques, which always generate the trees from the root. We argue and show that our technique is simpler, more accurate, and more effective, especially for detecting changes in the templates of targeted web pages. Comment: 5 pages, Proceeding of the 2009 International Conference on Signal Processing Systems pp. 551-55
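    The contrast between root-first and target-first traversal can be sketched as follows. This is only an illustration under stated assumptions, not the method of the paper: it assumes well-formed markup that `xml.etree.ElementTree` can parse, locates the region of interest by an exact text match, and then walks upward to the root. The function name `path_from_target` is hypothetical.

```python
import xml.etree.ElementTree as ET


def path_from_target(markup, target_text):
    """Return tag names from the node holding target_text up to the root.

    Returns None when no element has exactly that text.
    """
    root = ET.fromstring(markup)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = next(
        (el for el in root.iter() if (el.text or "").strip() == target_text),
        None,
    )
    if node is None:
        return None
    path = []
    while node is not None:
        path.append(node.tag)
        node = parent_of.get(node)  # None once the root has been reached
    return path


doc = "<html><body><div><span>42</span></div></body></html>"
print(path_from_target(doc, "42"))  # ['span', 'div', 'body', 'html']
```

    The recorded path could then serve as a signature for the targeted content: if the same target later yields a different path, the template has probably changed.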

    GRID Architecture through a Public Cluster

    An architecture is introduced that enables blocks, each consisting of several nodes in a public cluster, to connect to different grid collaborations. It is realized by adding a web service to the standard Globus Toolkit. The new web service performs two main tasks: it authenticates the digital certificate contained in an incoming request and forwards the request to the designated block. The appropriate block is identified by mapping the username of the block's owner contained in the digital certificate. It is argued that this algorithm opens an opportunity for any block in a public cluster to join various global grids. Comment: 5 pages, Proceeding of the 2008 International Conference on Computer and Communication Engineerin

    Field theory approach in the dynamics of biomatter

    A new approach to modeling biomatter dynamics based on field theory is presented. It is shown that some well-known tools of field theory can be used to describe physical phenomena in living matter, in particular for elementary biomatter such as DNA and proteins. In this approach, the biomatter dynamics are represented as the result of interactions among its elementary constituents, expressed in the form of a Lagrangian. Starting from the Lagrangian provides a stronger theoretical foundation for further extension. Moreover, it enables us to obtain rich physical observables using statistical mechanics, instead of relying on the space-time dynamics from certain equations of motion that are not solvable due to their nonlinearities. A few examples from previous results are given and explained briefly. Comment: 7 page