
    Term-Specific Eigenvector-Centrality in Multi-Relation Networks

    Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs in which connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index that takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time.
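    The propagation step can be pictured as personalized PageRank: the term-weight vector plays the role of the teleport distribution, and the dominant eigenvector gives the term's distribution over data items. The sketch below is a minimal illustration of that idea, assuming equally weighted relation types and a plain power iteration; it is not the article's actual index transformation.

```python
import numpy as np

def term_distribution(relations, term_weights, damping=0.85, iterations=50):
    """Propagate a term's weights through a multi-relation graph.

    relations    -- list of n x n adjacency matrices, one per relation type
                    (assumed equally weighted here)
    term_weights -- length-n vector of initial term weights per data item
    Returns the dominant eigenvector of a personalized-PageRank-style
    matrix, i.e. the term's distribution after propagation.
    """
    # Merge the relation-specific matrices and make the result column-stochastic.
    M = np.sum(relations, axis=0).astype(float)
    col_sums = M.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # leave empty columns untouched
    M = M / col_sums

    # The normalized term-weight vector acts as PageRank's teleport vector.
    p = np.asarray(term_weights, dtype=float)
    p = p / p.sum()

    v = p.copy()
    for _ in range(iterations):              # power iteration
        v = damping * (M @ v) + (1 - damping) * p
    return v

# Two items linked by two relation types; the term starts on item 0.
cites    = np.array([[0, 1], [0, 0]])
mentions = np.array([[0, 0], [1, 0]])
print(term_distribution([cites, mentions], [1.0, 0.0]))
```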

    Reasoning & Querying – State of the Art

    Various query languages for Web and Semantic Web data have emerged in recent years, both for practical use and as a research area in the scientific community. At the same time, the broad adoption of keyword search in everyday applications such as web search engines has familiarized casual users with keyword queries as a way of retrieving information on the internet. Unlike this easy-to-use form of querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, e.g., in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
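    To make that gap concrete, the same information need looks very different as a keyword query and as a structured query. The snippet below is an illustrative contrast only; the schema and property names are invented.

```python
# What a casual user types:
keyword_query = "turing award winners 2018"

# A structured equivalent over RDF requires knowing both SPARQL and the
# dataset's vocabulary (the ex: properties below are hypothetical):
sparql_query = """
PREFIX ex: <http://example.org/schema#>
SELECT ?person WHERE {
    ?person ex:wonAward  ex:TuringAward ;
            ex:awardYear 2018 .
}
"""
```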

    A spiral model for adding automatic, adaptive authoring to adaptive hypermedia

    At present a large amount of research exists into the design and implementation of adaptive systems. However, not much of it targets the complex task of authoring in such systems, or their evaluation. In order to tackle these problems, we have looked into the causes of the complexity: manual annotation has proven to be a bottleneck for the authoring of adaptive hypermedia. One solution is the reuse of automatically generated metadata. In our previous work we proposed the integration of the generic Adaptive Hypermedia authoring environment MOT (My Online Teacher) with a semantic desktop environment indexed by Beagle++. A prototype, Sesame2MOT Enricher v1, was built on this integration approach and evaluated. After the initial evaluations, a web-based prototype (the web-based Sesame2MOT Enricher v2 application) was built and integrated into MOT v2, conforming with the findings of the first set of evaluations. This new prototype underwent another evaluation. This paper therefore synthesizes the approach in general, the initial prototype with its first evaluations, the improved prototype, and the first results from the most recent evaluation round, following the next implementation cycle of the spiral model [Boehm, 88].

    No-But-Semantic-Match: Computing Semantically Matched XML Keyword Search Results

    Users are rarely familiar with the content of a data source they are querying, and therefore cannot avoid using keywords that do not exist in the data source. Traditional systems may respond with an empty result, causing dissatisfaction, even though the data source in fact holds semantically related content. In this paper we study this no-but-semantic-match problem in XML keyword search and propose a solution which enables us to present the top-k semantically related results to the user. Our solution involves two steps: (a) extracting semantically related candidate queries from the original query and (b) processing the candidate queries and retrieving the top-k semantically related results. Candidate queries are generated by replacing non-mapped keywords with candidate keywords obtained from an ontological knowledge base. Candidate results are scored using their cohesiveness and their similarity to the original query. Since the number of queries to process can be large, and each result has to be analyzed, we propose pruning techniques to retrieve the top-k results efficiently. We develop two query processing algorithms based on our pruning techniques. Further, we exploit a property of the candidate queries to propose a technique for processing multiple queries in batch, which improves performance substantially. Extensive experiments on two real datasets verify the effectiveness and efficiency of the proposed approaches.
    Comment: 24 pages, 21 figures, 6 tables, submitted to The VLDB Journal for possible publication
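    The two-step solution can be sketched in a few lines: substitute each non-mapped keyword with ontology-derived alternatives to form candidate queries, then score candidate results by cohesiveness times query similarity and keep the k best. The code below is a minimal, unpruned sketch under those assumptions; the ontology lookup, query executor and cohesiveness measure are stand-ins, and the paper's pruning and batch-processing techniques are omitted.

```python
import heapq
from itertools import product

def candidate_queries(keywords, mapped, related):
    """Step (a): build candidate queries by replacing every keyword that
    does not map to the data source with semantically related terms.

    mapped  -- set of keywords that occur in the data source
    related -- dict: keyword -> [(replacement, similarity), ...] taken from
               an ontological knowledge base (a stand-in here)
    """
    options = [[(kw, 1.0)] if kw in mapped else related.get(kw, [])
               for kw in keywords]
    for combo in product(*options):
        query = [word for word, _ in combo]
        similarity = 1.0
        for _, s in combo:               # similarity to the original query
            similarity *= s
        yield query, similarity

def top_k(queries, execute, cohesiveness, k=10):
    """Step (b): score every candidate result and keep the k best.
    Score = cohesiveness of the result x similarity of its query."""
    heap, n = [], 0                      # min-heap of (score, tiebreak, result)
    for query, sim in queries:
        for result in execute(query):
            item = (sim * cohesiveness(result), n, result)
            n += 1
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    return [(score, result) for score, _, result in sorted(heap, reverse=True)]
```

    The paper's pruning, by contrast, skips whole candidate queries whose best attainable score cannot beat the current top-k; that refinement is left out here for brevity.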

    Data-driven Job Search Engine Using Skills and Company Attribute Filters

    According to an online report, more than 200 million unique users search for jobs online every month. This incredibly large and fast-growing demand has enticed software giants such as Google and Facebook to enter this space, which was previously dominated by companies such as LinkedIn, Indeed and CareerBuilder. Recently, Google released its "AI-powered Jobs Search Engine", "Google For Jobs", while Facebook released "Facebook Jobs" within its platform. These current job search engines and platforms allow users to search for jobs based on general filters such as job title, date posted, experience level, company and salary. However, they have severely limited filters relating to skill sets such as C++, Python and Java, and to company-related attributes such as employee size, revenue, technographics and micro-industries. These specialized filters can help applicants and companies connect at a more personalized, relevant and deeper level. In this paper we present a framework that provides an end-to-end "Data-driven Jobs Search Engine". In addition, users can also receive potential contacts of recruiters and senior positions for connection and networking opportunities. The high-level implementation of the framework is as follows: 1) collect job postings data in the United States, 2) extract meaningful tokens from the postings data using ETL pipelines, 3) normalize the data set to link company names to their specific company websites, 4) extract and rank the skill sets, 5) link the company names and websites to their respective company-level attributes with the EVERSTRING Company API, 6) run user-specific search queries on the database to identify relevant job postings, and 7) rank the job search results. This framework offers a highly customizable and highly targeted search experience for end users.
    Comment: 8 pages, 10 figures, ICDM 201
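    As a flavor of step 4, the toy sketch below ranks skills by how often they appear across postings. The skill vocabulary, tokenization and sample postings are invented for illustration; the paper's ETL pipelines are far more involved.

```python
from collections import Counter

# Hypothetical skill vocabulary; the real pipeline derives skills from the data.
SKILLS = {"c++", "python", "java", "sql", "spark"}

def rank_skills(postings):
    """Count how often each known skill is mentioned and rank by frequency."""
    counts = Counter()
    for text in postings:
        tokens = {t.strip(",.()").lower() for t in text.split()}
        counts.update(tokens & SKILLS)   # one count per posting per skill
    return counts.most_common()

postings = [
    "Backend engineer: Python, SQL and Spark experience required.",
    "ML engineer with strong Python and C++ background.",
]
print(rank_skills(postings))   # e.g. [('python', 2), ('sql', 1), ...]
```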