    Focused crawling of capacity resources as a means of reducing search on the Web

    This paper examines the problem of building a system for monitoring topic-specific Web resources in a corporate environment. A classification of the main algorithms for traversing resources is proposed. A classification of site-ranking metrics is developed, organized by the objects on which the evaluation is based. A preliminary estimate of the suitability of focused search for this task is carried out.
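
    As an illustration of what a focused traversal can look like, here is a minimal best-first sketch. It is not the algorithm from the paper: the in-memory `pages` and `links` dictionaries, the term-frequency `relevance` score, and the rule that a link inherits the relevance of the page it was found on are all assumptions made for this example.

```python
import heapq

def relevance(text, topic_terms):
    """Fraction of words in the page text that are topic terms (toy score)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)

def focused_crawl(pages, links, seeds, topic_terms, budget):
    """Visit up to `budget` pages, most promising links first."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        # Assumption: outgoing links inherit the relevance of their parent page.
        score = relevance(pages.get(url, ""), topic_terms)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score, nxt))
    return visited

# Hypothetical miniature "web" used only for this example.
pages = {
    "seed": "security news portal",
    "a": "network security firewall security",
    "b": "cooking recipes pasta",
    "a1": "firewall rules security",
    "b1": "dessert recipes",
}
links = {"seed": ["a", "b"], "a": ["a1"], "b": ["b1"]}
order = focused_crawl(pages, links, ["seed"], {"security", "firewall"}, budget=4)
print(order)
```

    With a budget of four pages, the link found on the on-topic page `a` is followed before the off-topic page `b`, and `b1` is never fetched. That is the reduction in search effort that focused crawling aims for.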

    Fine Grained Approach for Domain Specific Seed URL Extraction

    Domain-specific search engines are expected to provide relevant search results. The availability of an enormous number of URLs across subdomains improves the relevance of domain-specific search engines. Current methods for obtaining seed URLs could be more systematic in ensuring that subdomains are represented. We propose a fine-grained approach for the automatic extraction of seed URLs at the subdomain level, using Wikipedia and Twitter as repositories. We propose a SeedRel metric and a Diversity Index for seed URL relevance to measure subdomain coverage. We implemented our approach for the 'Security - Information and Cyber' domain and identified 34,007 seed URLs and 400,726 URLs across subdomains. The measured Diversity Index value of 2.10 confirms that all subdomains are represented; hence, a relevant 'Security Search Engine' can be built. Our approach also extracted more URLs (seed and child) than existing approaches for URL extraction.
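
    The abstract does not define how the Diversity Index is computed. As a hedged illustration only, the sketch below assumes a Shannon-style index over the number of seed URLs per subdomain; the function name, the subdomain labels, and the counts are hypothetical and are not taken from the paper.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over non-empty categories."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h

# Hypothetical seed-URL counts per subdomain (illustrative numbers only).
seed_counts = {
    "cryptography": 40,
    "malware": 30,
    "network security": 20,
    "forensics": 10,
}
h = shannon_diversity(seed_counts)
h_max = math.log(len(seed_counts))  # reached only when all subdomains are equal
print(f"H = {h:.3f}, maximum for {len(seed_counts)} subdomains = {h_max:.3f}")
```

    Under this assumption, the index is zero when every seed URL falls into a single subdomain and grows as the URLs spread more evenly across subdomains, which is why such a measure can serve as a coverage check.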

    Schema matching in a peer-to-peer database system

    Includes bibliographical references (p. 112-118). Peer-to-peer or P2P systems are applications that allow a network of peers to share resources in a scalable and efficient manner. My research is concerned with the use of P2P systems for sharing databases. To allow data mediation between peers' databases, schema mappings need to exist, which are mappings between semantically equivalent attributes in different peers' schemas. Mappings can either be defined manually or found semi-automatically using a technique called schema matching. However, schema matching has not been used much in dynamic environments, such as P2P networks. Therefore, this thesis investigates how to enable effective semi-automated schema matching within a P2P network.
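
    To make the idea of semi-automatic schema matching concrete, here is a minimal sketch that proposes attribute mappings from name similarity alone and leaves confirmation to a person. It is not the technique developed in the thesis; the attribute names, the `normalize` helper, and the 0.7 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase an attribute name and drop common separators."""
    return name.lower().replace("_", "").replace("-", "")

def match_schemas(local_attrs, remote_attrs, threshold=0.7):
    """Propose (local, remote, score) candidates for a human to confirm."""
    candidates = []
    for a in local_attrs:
        best, best_score = None, 0.0
        for b in remote_attrs:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        # Keep only the best remote attribute, and only if it is similar enough.
        if best is not None and best_score >= threshold:
            candidates.append((a, best, round(best_score, 2)))
    return candidates

# Hypothetical schemas of two peers.
peer_a = ["customer_name", "phone", "zip_code"]
peer_b = ["CustomerName", "telephone", "postcode", "email"]
proposals = match_schemas(peer_a, peer_b)
for local, remote, score in proposals:
    print(f"{local} -> {remote} (similarity {score})")
```

    Name similarity misses semantically equivalent attributes whose names differ too much, which is one reason the matching step stays semi-automatic: the proposals are a starting point that a user accepts, rejects, or extends.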

    Deliverable D2.3 Specification of Web mining process for hypervideo concept identification

    This deliverable presents a state-of-the-art review and requirements analysis for the web mining process as part of WP2 of the LinkedTV project. The deliverable is divided into two subject areas: a) Named Entity Recognition (NER) and b) retrieval of additional content. The introduction outlines the workflow of the work package, with a subsection devoted to relations with other work packages. The state-of-the-art review focuses on prospective techniques for LinkedTV. In the NER domain, the main focus is on knowledge-based approaches, which facilitate disambiguation of identified entities using linked open data. As part of the NER requirements analysis, the first tools developed (NERD, SemiTags and THD) are described and evaluated. The area of linked additional content is broader and requires a more thorough analysis. A balanced overview of techniques for dealing with the various knowledge sources (semantic web resources, web APIs and completely unstructured resources from a white list of web sites) is presented. The requirements analysis is derived from the RBB and Sound and Vision LinkedTV scenarios.