CORE search results: 202,886 research outputs found

DeepPeep: A Form Search Engine

Author: Barbosa Luciano
Nguyen Hoa Thanh
Publication venue: University of Utah

Poster: We present DeepPeep (http://www.deeppeep.org), a new search engine specialized in Web forms. DeepPeep uses a scalable infrastructure for discovering, organizing and analyzing Web forms, which serve as entry points to hidden-Web sites. DeepPeep provides an intuitive interface that allows users to explore and visualize large form collections. We present the overall architecture of DeepPeep, which supports both general and domain-specific deep-Web search and benefits not only casual users but also application builders. The system provides a scalable and automatic solution to deep-Web search and can adapt to the dynamic evolution of the deep Web, which is growing fast and will play an important role in the future of search.

The University of Utah: J. Willard Marriott Digital Library

BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature

Author: C Jonquet
Diana de la Iglesia
F Belleau
G De la Calle
Guillermo de la Calle
ID Dinov
J Saltz
K Wolstencroft
M García-Remesal
M Gerstein
M Krallinger
M Musen
MD Wilkinson
MF Porter
Miguel García-Remesal
N Cannata
P Lord
PA Babu
RD Stevens
sD Tufi
Stefano Chiesa
Victor Maojo
WA Woods
Publication venue: BioMed Central
Publication date: 01/10/2009

Background: The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The generated index can be updated automatically by adding further manuscripts that describe new resources. We have developed web services and applications to test and validate our approach. BIRI is not designed to replace current indexes but to extend their capabilities with richer functionality.

Results: We developed a web service that provides a set of high-level query primitives for accessing the index; it can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application that accesses a preliminary knowledge base of resources. We tested our tool on an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified, and more than 500 descriptions of functionalities were extracted.

Conclusion: These experiments suggest that our approach is feasible for automatically discovering and indexing current and future bioinformatics resources. Because the tool is domain-independent, the authors are currently applying it in other areas, such as medical nanoinformatics. BIRI is available at http://edelman.dia.fi.upm.es/biri/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivo Digital UPM

Intelligent Support for Information Retrieval of Web Documents

Author: Kovaľ Robert
Návrat Pavol
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 21/02/2012

The main goal of this research was to investigate means of intelligent support for the retrieval of web documents. We have proposed the architecture of a web tool system, Trillian, which discovers users' interests without requiring their interaction and uses them to search autonomously for related web content. Discovered pages are suggested to the user. The discovery of user interests is based on analysis of the documents the user has previously visited. We have created a module for completely transparent tracking of the user's movement on the web, which logs both visited URLs and the contents of web pages. The post-analysis step is based on a variant of the suffix tree clustering algorithm. We primarily focus on the overall design of the Trillian architecture and on the process of discovering topics of interest. We have implemented an experimental prototype of Trillian and evaluated the quality, speed and usefulness of the proposed system. We have shown that clustering is a feasible technique for extracting interests from web documents. We consider the proposed architecture promising and suitable for future extensions.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)
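The Trillian abstract says only that its post-analysis uses a variant of suffix tree clustering (STC); the variant itself is not described. As a rough illustration of the standard STC idea (not Trillian's code), the sketch below finds base clusters, i.e. phrases shared by at least two documents, and then merges base clusters whose document sets overlap by more than half in both directions. For brevity it assumes a dictionary of word n-grams up to a hypothetical `max_len` in place of a real suffix tree, and it omits base-cluster scoring and stop-word handling; all function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def phrases(text: str, max_len: int = 3) -> set[str]:
    """All contiguous word sequences of up to max_len words (a stand-in for suffix-tree nodes)."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for i in range(len(words))
        for n in range(1, max_len + 1)
        if i + n <= len(words)
    }


def base_clusters(docs: list[str], max_len: int = 3, min_docs: int = 2) -> dict[str, frozenset[int]]:
    """Map each phrase shared by at least min_docs documents to the ids of those documents."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for p in phrases(text, max_len):
            index[p].add(doc_id)
    return {p: frozenset(ids) for p, ids in index.items() if len(ids) >= min_docs}


def merge_clusters(base: dict[str, frozenset[int]], threshold: float = 0.5) -> set[frozenset[int]]:
    """Link base clusters whose overlap exceeds threshold relative to both; return connected components."""
    parent = {p: p for p in base}

    def find(p: str) -> str:
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(base, 2):
        overlap = len(base[a] & base[b])
        if overlap / len(base[a]) > threshold and overlap / len(base[b]) > threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p, ids in base.items():
        groups[find(p)].update(ids)
    return {frozenset(ids) for ids in groups.values()}
```

For example, with the four documents `"machine learning tutorial"`, `"machine learning course"`, `"cooking pasta recipe"` and `"pasta recipe ideas"`, `merge_clusters(base_clusters(docs))` yields two clusters of document ids, {0, 1} and {2, 3}. A real suffix tree keeps phrase discovery linear in the text length, which is the reason STC uses one instead of the quadratic n-gram enumeration shown here.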

Media Convergence of Newspapers: A Content Analysis of the Houston Chronicle's Print- and Web-based Content

Author: Sullivan Amanda
Publication venue: Scholars Crossing
Publication date: 01/05/2012

The channels of news media have changed. The traditional route of receiving news via a newspaper has evolved into a more digital path, leading many to question the future of the print publication. This study evaluates the print- and Web-based content of the Houston Chronicle. The researcher adds to the field of research on news media by analyzing the online and print content of the publication, creating a new way to categorize and evaluate the subject matter by placing it into four categories: repetition, adaptation, representation, and unique. The researcher seeks to answer three research questions about how each medium exemplifies elements of media convergence.

Liberty University Digital Commons

Separable Hyperstructure and Delayed Link Binding

Author: Brailsford David F.
Publication venue: American College of Medical Physics (ACMP)
Publication date: 01/12/1999

As the amount of material on the World Wide Web continues to grow, users are discovering that the Web's embedded, hard-coded links are difficult to maintain and update. Hyperlinks need a degree of abstraction in the way they are specified, together with a sound underlying document structure and the property of separability from the documents they link. The case is made by studying the advantages of program/data separation in computer system architectures and by re-examining selected hypermedia systems that have already implemented separability. The prospects for introducing more abstract links into future versions of HTML and PDF, via emerging standards such as XPath, XPointer, XLink and URN, are briefly discussed.

Nottingham eTheses

Towards hierarchical affiliation resolution: framework, baselines, dataset

Author: Backes Tobias
Dietze Stefan
Hienert Daniel
Publication venue: DEU
Publication date: 01/01/2022

Author affiliations provide key information when attributing academic performance, such as publication counts. So far, such measures have been aggregated either manually or only to top-level institutions such as universities. Supervised affiliation resolution requires a large number of annotated alignments between affiliation strings and known institutions, which are not readily available. We introduce the task of unsupervised hierarchical affiliation resolution, which assigns affiliations to institutions at all hierarchy levels (e.g., departments), discovering both the institutions and their hierarchical ordering on the fly. From the corresponding requirements, we derive a simple conceptual framework based on the subset partial order that can be extended to account for the discrepancies evident in realistic affiliations from the Web of Science. We implement initial baselines and provide datasets and evaluation metrics for experimentation. Results show that mapping affiliations to known institutions and discovering lower-level institutions work well with simple baselines, whereas unsupervised top-level and hierarchical resolution is more challenging. Our work provides structured guidance for further in-depth studies and improved methodology by identifying and discussing a number of observed difficulties and important challenges that future work needs to address.

SSOAR - Social Science Open Access Repository
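To make the "subset partial order" in the affiliation-resolution abstract concrete, here is a minimal sketch under assumptions of our own, not the authors' framework: each affiliation string is treated as the set of its comma-separated components (a hypothetical normalisation), and one affiliation sits above another in the hierarchy when its component set is a proper subset of the other's. The helper names `components` and `direct_parents` and the "Example University" strings are illustrative only.

```python
def components(affiliation: str) -> frozenset[str]:
    """Hypothetical normalisation: the lowercased, comma-separated parts of an affiliation."""
    return frozenset(
        part.strip().lower() for part in affiliation.split(",") if part.strip()
    )


def direct_parents(affiliations: list[str]) -> dict[frozenset[str], list[frozenset[str]]]:
    """For each distinct affiliation, list the ancestors directly above it in the subset order."""
    # Deduplicate component sets while keeping first-seen order.
    nodes = list(dict.fromkeys(components(a) for a in affiliations))
    parents = {}
    for child in nodes:
        # Every proper subset of the child's components lies above it in the order.
        ancestors = [p for p in nodes if p < child]
        # Keep only covering ancestors: no other ancestor lies strictly between p and child.
        parents[child] = [p for p in ancestors if not any(p < q for q in ancestors)]
    return parents
```

With the strings `"Example University"`, `"Dept. of Physics, Example University"` and `"Optics Group, Dept. of Physics, Example University"`, the optics group's direct parent is the physics department, whose direct parent is the university, and the university has none. Real affiliation strings (abbreviations, reordered or missing levels) break this pure subset relation, which is why the abstract describes extending the framework to account for such discrepancies.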

Datamining for Web-Enabled Electronic Business Applications

Author: Nayak Richi
Publication venue: Idea Group
Publication date: 01/01/2003

Web-enabled electronic business is generating a massive amount of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to all the data being collected to extract useful information. This chapter presents issues associated with data mining for web-enabled electronic business.

Queensland University of Technology ePrints Archive

Constraint-Based Personalization For Business Applications

Author: Toth Kal
Publication venue: Clute Institute
Publication date: 01/05/2002

This paper reports on extensions of previous work applying personalization techniques and constraint-based methods within an intelligent agent framework. The Wise Net Inc. has developed an intelligent agent framework specifically designed to provide advanced, scalable collaborative capabilities that integrate easily with existing web-enabled enterprise applications. Since the summer of 2001, the author, his colleagues and his research assistants have been conducting applied research aimed at discovering the personalization models and effects needed to support collaborative e-business systems. Intelligent agents are being developed to implement these personalization effects through constraint-satisfaction methods and solvers. This paper documents the approach, the progress achieved to date, and future directions. This work is supported by The Wise Net Inc., the BC Advanced Systems Institute (BC ASI), and the Canadian National Research Council (NRC) through the Industrial Research Assistance Program (IRAP).

Clute Institute: Journals

Discovering Exclusive Patterns in Frequent Sequences

Author: Chen Weiru
Keech Malcolm
Lu Jing
Publication venue: Inderscience Publishers
Publication date: 01/01/2010

This paper presents a new concept for pattern discovery in frequent sequences, with potentially interesting applications. Based on data mining, the approach aims to discover exclusive sequential patterns (ESP) by checking the relative exclusion of patterns across data sequences. ESP mining post-processes sequential patterns and augments existing work on structural relation pattern mining. A three-phase ESP mining method is proposed together with its component algorithms, and a running worked example explains the process. Experiments on real-world and synthetic datasets showcase the results of ESP mining and demonstrate its effectiveness, illustrating the theory developed. An outline case study in workflow modelling gives some insight into future applicability.

University of Bedfordshire Repository