28 research outputs found

    Focused crawling of resources as a means of reducing search in the WEB

    This paper examines the problem of building a system for monitoring topical Web resources in a corporate environment. A classification of the basic resource-traversal algorithms is proposed. A classification of site-ranking metrics is developed according to the objects on which the evaluation is based. A preliminary estimate of the suitability of focused search for this task is carried out.
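    A focused (topic-driven) crawl of the kind assessed above can be sketched as a best-first traversal whose frontier is ordered by an estimated topical relevance. The sketch below is only illustrative: the seed URLs, the TOPIC_TERMS vocabulary, and the relevance heuristic are assumptions, not the traversal algorithms or ranking metrics classified in the paper.

        # Minimal sketch of a focused crawler for monitoring topical Web resources.
        # Seed URLs, the topic vocabulary, and the relevance heuristic are assumed.
        import heapq
        import re
        import urllib.request
        from html.parser import HTMLParser

        TOPIC_TERMS = {"monitoring", "corporate", "crawler"}  # assumed topic vocabulary

        class LinkParser(HTMLParser):
            def __init__(self):
                super().__init__()
                self.links = []
            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            self.links.append(value)

        def relevance(text):
            """Fraction of topic terms occurring in the page text (a crude ranking metric)."""
            words = set(re.findall(r"[a-z]+", text.lower()))
            return len(TOPIC_TERMS & words) / len(TOPIC_TERMS)

        def focused_crawl(seeds, max_pages=50):
            # Priority queue ordered by negated estimated relevance: best-first traversal.
            frontier = [(-1.0, url) for url in seeds]
            heapq.heapify(frontier)
            seen, results = set(seeds), []
            while frontier and len(results) < max_pages:
                _, url = heapq.heappop(frontier)
                try:
                    html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
                except Exception:
                    continue
                r = relevance(html)
                results.append((url, r))
                parser = LinkParser()
                parser.feed(html)
                for link in parser.links:
                    if link.startswith("http") and link not in seen:
                        seen.add(link)
                        # Inherit the parent's relevance as a cheap priority estimate.
                        heapq.heappush(frontier, (-r, link))
            return results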

    Using Web Sites External Views for Fuzzy Classification

    In this paper, the application of fuzzy sets to web site classification is considered. External features of web sites are examined, and their suitability for classification is demonstrated. An example of classification into five different categories is given.
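    The fuzzy element of such a classifier can be illustrated by assigning a site a membership degree in each category from its external features. The categories, feature names, and triangular membership functions below are illustrative assumptions, not the paper's actual model.

        # A minimal sketch of fuzzy classification of a web site into five categories
        # from external features. Categories, features, and membership functions are assumed.

        CATEGORIES = ["news", "e-commerce", "blog", "corporate", "portal"]

        def triangular(x, a, b, c):
            """Triangular membership function peaking at b."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x < b else (c - x) / (c - b)

        def classify(external_features):
            """Return the membership degree of the site in each category."""
            links = external_features["inbound_links"]
            update_rate = external_features["updates_per_day"]
            return {
                "news":       min(triangular(update_rate, 5, 20, 100), triangular(links, 100, 5000, 1e6)),
                "e-commerce": triangular(links, 10, 1000, 1e5),
                "blog":       triangular(update_rate, 0.1, 1, 5),
                "corporate":  triangular(update_rate, 0, 0.1, 1),
                "portal":     triangular(links, 1000, 1e5, 1e7),
            }

        site = {"inbound_links": 3500, "updates_per_day": 12}
        print(max(classify(site).items(), key=lambda kv: kv[1]))  # most plausible category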

    Ontology Based Approach for Services Information Discovery using Hybrid Self Adaptive Semantic Focused Crawler

    Focused crawling aims specifically at finding pages that are relevant to a predefined set of topics. Since an ontology is a well-formed knowledge representation, ontology-based focused crawling approaches have become a subject of research. Crawling is one of the essential techniques for building data repositories. The purpose of a semantic focused crawler is to automatically discover, annotate, and classify service information using Semantic Web technologies. Here, a framework for a hybrid self-adaptive semantic focused crawler (HSASF crawler) is presented, aimed at effectively discovering and organizing service information over the Internet while addressing three essential issues. A semi-supervised scheme is designed to automatically select the optimal threshold values for each concept, targeting optimal performance without being constrained by the size of the training data set. DOI: 10.17762/ijritcc2321-8169.15072
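    The self-adaptive, per-concept threshold idea can be sketched as follows: each ontology concept keeps its own similarity cut-off, which drifts toward the similarity of documents it confidently accepts. The concept terms, cosine similarity, and update rule below are illustrative assumptions, not the HSASF algorithm itself.

        # Sketch of per-concept, self-adaptive thresholds for semantic annotation.
        # Concept vocabularies, the similarity measure, and the update rule are assumed.
        import math
        from collections import Counter

        CONCEPT_TERMS = {
            "transport_service": ["shipping", "freight", "delivery", "logistics"],
            "repair_service":    ["repair", "maintenance", "spare", "warranty"],
        }

        def cosine(a, b):
            dot = sum(a[t] * b.get(t, 0) for t in a)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        class AdaptiveAnnotator:
            def __init__(self, initial_threshold=0.3, rate=0.1):
                self.thresholds = {c: initial_threshold for c in CONCEPT_TERMS}
                self.rate = rate

            def annotate(self, text):
                doc = Counter(text.lower().split())
                labels = []
                for concept, terms in CONCEPT_TERMS.items():
                    sim = cosine(Counter(terms), doc)
                    if sim >= self.thresholds[concept]:
                        labels.append(concept)
                        # Drift the threshold toward the similarity of accepted documents,
                        # so each concept tunes its own cut-off over time.
                        self.thresholds[concept] += self.rate * (sim - self.thresholds[concept])
                return labels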

    Do peers see more in a paper than its authors?

    Recent years have shown a gradual shift in the content of biomedical publications that is freely accessible, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances, i.e., sentences containing citations to that article. We contrast the important points of an article as judged by its authors versus as seen by peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis, and we find that the set of all citances to a target article not only covers most information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects that other citations and time have on the content of citances.
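    A crude, word-level version of the abstract-versus-citances comparison can be sketched as below: normalized content words stand in for concepts, and we measure how much of the abstract's vocabulary the citances cover and how much they add. The stopword list and the word-level proxy are illustrative assumptions; the paper's analysis works with annotated biomedical entities.

        # Sketch of comparing concept coverage of an abstract against its citances.
        # "Concepts" are approximated by normalized content words; this is an assumption.
        import re

        STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "we", "that", "for"}

        def concepts(text):
            return {w for w in re.findall(r"[a-z]+", text.lower())
                    if w not in STOPWORDS and len(w) > 3}

        def compare(abstract, citances):
            abs_c = concepts(abstract)
            cit_c = set().union(*(concepts(c) for c in citances)) if citances else set()
            coverage = len(abs_c & cit_c) / len(abs_c) if abs_c else 0.0
            extra = len(cit_c - abs_c) / len(abs_c) if abs_c else 0.0
            return {"coverage_of_abstract": coverage, "extra_concepts_ratio": extra}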

    Modelo de análisis de información desestructurada utilizando técnicas de recopilación y minería web

    Major advances in Information and Communication Technologies (ICTs) have significantly facilitated and simplified the access, processing, and storage of data for organizations in general. These technologies have changed the paradigm of information analysis: today the problem is not scarcity but the excessive amount of available data, clearly not all of which is useful for the decision-making (DM) process. Another drawback, beyond the sheer volume of information to be analyzed, is that the available data are unstructured and distributed, making it even harder to detect which of them may be useful. The goal of this project is to develop an information-search tool to support the decision-making process. Track: Databases and Data Mining. Red de Universidades con Carreras en Informática (RedUNCI)

    Searching for hidden-web databases

    Recently, there has been increased interest in the retrieval and integration of hidden-Web data with a view to leveraging the high-quality information available in online databases. Although previous works have addressed many aspects of the actual integration, including matching form schemata and automatically filling out forms, the problem of locating relevant data sources has been largely overlooked. Given the dynamic nature of the Web, where data sources are constantly changing, it is crucial to discover these resources automatically. However, considering the number of documents on the Web (Google already indexes over 8 billion documents), automatically finding tens, hundreds or even thousands of forms that are relevant to the integration task is really like looking for a few needles in a haystack. Besides, since the vocabulary and structure of forms for a given domain are unknown until the forms are actually found, it is hard to define exactly what to look for. We propose a new crawling strategy to automatically locate hidden-Web databases which aims to achieve a balance between the two conflicting requirements of this problem: the need to perform a broad search while at the same time avoiding the need to crawl a large number of irrelevant pages. The proposed strategy does that by focusing the crawl on a given topic; by judiciously choosing links to follow within a topic that are more likely to lead to pages that contain forms; and by employing appropriate stopping criteria. We describe the algorithms underlying this strategy and an experimental evaluation which shows that our approach is both effective and efficient, leading to larger numbers of forms retrieved as a function of the number of pages visited than other crawlers.
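    A simplified, assumption-laden sketch of such a form-focused crawl is given below: links whose anchor text hints at searchable interfaces are prioritized, pages are checked for text-input forms, and a site is abandoned after a long streak of pages without any form. The keyword list, scoring, and stopping rule are illustrative placeholders, not the authors' algorithms.

        # Sketch of a form-focused crawl: link prioritization, form detection,
        # and a simple stopping criterion. Heuristics and thresholds are assumed.
        import heapq
        import re
        import urllib.request

        FORM_HINTS = ("search", "advanced", "query", "find", "database")

        def link_priority(anchor_text):
            """Higher score for anchors that look likely to lead to query forms."""
            text = anchor_text.lower()
            return sum(text.count(hint) for hint in FORM_HINTS)

        def has_searchable_form(html):
            # Crude heuristic: a <form> that contains a text input field.
            return bool(re.search(r'<form[^>]*>.*?<input[^>]*type=["\']?text', html,
                                  re.IGNORECASE | re.DOTALL))

        def crawl_for_forms(seed, max_pages=200, max_barren=30):
            frontier = [(0, seed)]
            seen, forms_found, barren, fetched = {seed}, [], 0, 0
            while frontier and fetched < max_pages and barren < max_barren:
                _, url = heapq.heappop(frontier)
                try:
                    html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
                except Exception:
                    continue
                fetched += 1
                if has_searchable_form(html):
                    forms_found.append(url)
                    barren = 0
                else:
                    barren += 1  # stopping criterion: give up after a long barren streak
                for href, anchor in re.findall(r'<a[^>]+href="(http[^"]+)"[^>]*>(.*?)</a>',
                                               html, re.IGNORECASE | re.DOTALL):
                    if href not in seen:
                        seen.add(href)
                        heapq.heappush(frontier, (-link_priority(anchor), href))
            return forms_found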