
    DeepPeep: A Form Search Engine

    Poster. We present DeepPeep (http://www.deeppeep.org), a new search engine specialized in Web forms. DeepPeep uses a scalable infrastructure for discovering, organizing and analyzing Web forms, which serve as entry points to hidden-Web sites. DeepPeep provides an intuitive interface that allows users to explore and visualize large form collections. We present the overall architecture of DeepPeep, which supports both general and specific deep Web search and benefits not only casual users but also application builders. The system provides a scalable and automatic solution to deep Web search and can adapt to the dynamic evolution of the deep Web, which is growing fast and will play an important role in the future of search.

    BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature

    Background: The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities. Results: We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted. Conclusion: These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at http://edelman.dia.fi.upm.es/biri/.

    Intelligent Support for Information Retrieval of Web Documents

    The main goal of this research was to investigate the means of intelligent support for retrieval of web documents. We have proposed the architecture of a web tool system, Trillian, which discovers the interests of users without their interaction and uses them for autonomous searching of related web content. Discovered pages are suggested to the user. The discovery of user interests is based on analysis of documents previously visited by the users. We have created a module for completely transparent tracking of the user's movement on the web, which logs both visited URLs and the contents of web pages. The post-analysis step is based on a variant of the suffix tree clustering algorithm. We primarily focus on the overall Trillian architecture design and the process of discovering topics of interest. We have implemented an experimental prototype of Trillian and evaluated the quality, speed and usefulness of the proposed system. We have shown that clustering is a feasible technique for extraction of interests from web documents. We consider the proposed architecture to be quite promising and suitable for future extensions.

    Media Convergence of Newspapers: A Content Analysis of the Houston Chronicle's Print- and Web-based Content

    The channels of news media have changed. The traditional route of receiving news via a newspaper has evolved into a more digital path, leaving many to question the future of the print publication. This study evaluates the print- and Web-based content of the Houston Chronicle. The researcher adds to the field of research on news media by analyzing the online and print content of the publication, creating a new way to categorize and evaluate the subject matter by placing it into four categories: repetition, adaptation, representation, and unique. The researcher seeks to answer three research questions, discovering how each medium exemplifies elements of media convergence.

    Separable Hyperstructure and Delayed Link Binding

    As the amount of material on the World Wide Web continues to grow, users are discovering that the Web's embedded, hard-coded links are difficult to maintain and update. Hyperlinks need a degree of abstraction in the way they are specified, together with a sound underlying document structure and the property of separability from the documents they are linking. The case is made by studying the advantages of program/data separation in computer system architectures and also by re-examining some selected hypermedia systems that have already implemented separability. The prospects for introducing more abstract links into future versions of HTML and PDF, via emerging standards such as XPath, XPointer, XLink and URN, are briefly discussed.

    Towards hierarchical affiliation resolution: framework, baselines, dataset

    Author affiliations provide key information when attributing academic performance measures such as publication counts. So far, such measures have been aggregated either manually or only to top-level institutions, such as universities. Supervised affiliation resolution requires a large number of annotated alignments between affiliation strings and known institutions, which are not readily available. We introduce the task of unsupervised hierarchical affiliation resolution, which assigns affiliations to institutions on all hierarchy levels (e.g. departments), discovering the institutions as well as their hierarchical ordering on the fly. From the corresponding requirements, we derive a simple conceptual framework based on the subset partial order that can be extended to account for the discrepancies evident in realistic affiliations from the Web of Science. We implement initial baselines and provide datasets and evaluation metrics for experimentation. Results show that mapping affiliations to known institutions and discovering lower-level institutions works well with simple baselines, whereas unsupervised top-level and hierarchical resolution is more challenging. Our work provides structured guidance for further in-depth studies and improved methodology by identifying and discussing a number of observed difficulties and important challenges that future work needs to address.

    Datamining for Web-Enabled Electronic Business Applications

    Web-enabled electronic business is generating massive amounts of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to the collected data to obtain useful information. This chapter presents issues associated with data mining for web-enabled electronic business.

    Constraint-Based Personalization For Business Applications

    This paper reports on extensions of previous work applying personalization techniques and constraint-based methods within an intelligent agent framework. The Wise Net Inc. has developed an intelligent agent framework specifically for providing advanced scalable collaborative capabilities for easy integration with existing web-enabled enterprise applications. Since the summer of 2001, the author, his colleagues, and his research assistants have been conducting applied research aimed at discovering the desired personalization models and effects to support collaborative e-business systems. Intelligent agents are being developed to implement these personalization effects through constraint-satisfaction methods and solvers. This paper documents the approach, progress achieved to date, and future directions. This work is being supported by The Wise Net Inc., the BC Advanced Systems Institute (BC ASI), and the Canadian National Research Council (NRC) through the Industrial Research Assistance Program (IRAP).

    Discovering Exclusive Patterns in Frequent Sequences

    This paper presents a new concept for pattern discovery in frequent sequences with potentially interesting applications. Based on data mining, the approach aims to discover exclusive sequential patterns (ESP) by checking the relative exclusion of patterns across data sequences. ESP mining pursues the post-processing of sequential patterns and augments existing work on structural relations patterns mining. A three-phase ESP mining method is proposed together with component algorithms, and a running worked example explains the process. Experiments performed on real-world and synthetic datasets showcase the results of ESP mining and demonstrate its effectiveness, illuminating the theories developed. An outline case study in workflow modelling gives some insight into future applicability.
