137,140 research outputs found
A Taxonomy of Hyperlink Hiding Techniques
Hidden links are designed solely for search engines rather than for human
visitors. To obtain high search engine rankings, link hiding techniques are
commonly used to profit black-market industries, such as illicit game servers,
false medical services, illegal gambling, and other disreputable high-profit
businesses. This paper investigates hyperlink hiding techniques on the Web and
gives a detailed taxonomy. We believe the taxonomy can help develop appropriate
countermeasures. A study of the home pages of 5,583,451 Chinese sites indicates
that link hiding techniques are highly prevalent on the Web. We also explored
Google's attitude towards link hiding spam by analyzing the PageRank values of
the relevant links. The results show that more should be done to penalize
hidden link spam.
Comment: 12 pages, 2 figures
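One family of techniques such a taxonomy covers hides links through CSS styling. As a minimal, illustrative sketch (the detection heuristics and the spam URL below are invented for illustration, not drawn from the paper), anchors with inline hiding styles can be flagged with Python's standard HTML parser:

```python
from html.parser import HTMLParser

# Assumed heuristics: inline styles that make an anchor invisible to visitors.
HIDING_HINTS = ("display:none", "visibility:hidden", "font-size:0")

class HiddenLinkFinder(HTMLParser):
    """Collect href values of anchors styled so that visitors cannot see them."""
    def __init__(self):
        super().__init__()
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        d = dict(attrs)
        style = (d.get("style") or "").replace(" ", "").lower()
        if any(hint in style for hint in HIDING_HINTS):
            self.hidden.append(d.get("href"))

def find_hidden_links(html):
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden
```

For example, `find_hidden_links('<a href="http://spam.example" style="display: none">x</a>')` flags the spam URL, while an ordinary anchor passes untouched. A real detector would also need to resolve external stylesheets, off-screen positioning, and zero-size containers, which this sketch ignores.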
Enhancing the online discovery of geospatial data through taxonomy, folksonomy and semantic annotations
Spatial data infrastructures (SDIs) are meant to facilitate dissemination and consumption of spatial data, among other means through publication and discovery of spatial metadata in geoportals. However, geoportals are often known only to geoinformation communities and present technological limitations that make it difficult for general purpose web search engines to discover and index the data catalogued in (or registered with) a geoportal. The mismatch between standard spatial metadata content and the search terms that web users employ when looking for spatial data presents a further barrier to spatial data discovery. The need therefore arises to create and share spatial metadata that is discoverable by general purpose web search engines and users alike. Using folksonomies and semantic annotations appears to be an option for eliminating this mismatch and publishing the metadata for discovery on the Web. Based on an analysis of the search query terms employed when searching for spatial data on the Web, a taxonomy of search terms is constructed. The taxonomy constitutes the basis for understanding how web resources in general, and HTML pages with standard spatial metadata in particular, can be documented so that they are discoverable by general purpose web search engines. We illustrate the use of the constructed taxonomy in the semantic annotation of web resources, such as HTML pages with spatial metadata, on the Web.
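One concrete form such annotations can take is schema.org's Dataset vocabulary, which general purpose web search engines index. The sketch below (the dataset values are invented placeholders; only the schema.org property names are real) builds a JSON-LD block that could be embedded in an HTML page alongside the standard spatial metadata:

```python
import json

# Hypothetical spatial-metadata record expressed as schema.org JSON-LD.
# All field values are placeholders; the vocabulary (Dataset, spatialCoverage,
# Place, GeoShape.box) is standard schema.org.
def dataset_jsonld(name, description, keywords, bbox):
    """Return a JSON-LD string describing a spatial dataset."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": keywords,                    # folksonomy-style tags
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": bbox},
        },
    }, indent=2)
```

Embedding the returned string in a `<script type="application/ld+json">` element makes the metadata, including the free-form keyword tags, crawlable without requiring users to know the geoportal exists.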
A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which do not represent the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.
Comment: Accepted by KDD 201
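The document-tagging step a topic-concept-instance taxonomy enables can be sketched as a simple instance lookup. This toy version (the taxonomy entries and matching rule are invented, not Tencent's implementation) tags a document with every concept one of whose instances appears in the text:

```python
# Invented topic -> concept -> instance taxonomy, in the three-level shape
# the abstract describes; entries are illustrative only.
TAXONOMY = {
    "entertainment": {                                   # topic
        "sci-fi movies": ["interstellar", "the martian"],  # concept -> instances
        "budget smartphones": ["redmi", "realme"],
    },
}

def tag_document(text, taxonomy=TAXONOMY):
    """Return concepts whose instances occur in the document text."""
    text = text.lower()
    tags = set()
    for topic, concepts in taxonomy.items():
        for concept, instances in concepts.items():
            if any(inst in text for inst in instances):
                tags.add(concept)
    return sorted(tags)
```

A production system would of course rely on learned extraction from queries and click logs rather than substring matching, but the resulting tags feed recommendation the same way: documents about "Interstellar" surface for users interested in the broader concept "sci-fi movies".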
Social Network Theory, Broadband and the World Wide Web
This paper aims to predict some possible futures for the World Wide Web based on several key network parameters: size, complexity, cost, and increasing connection speed through the uptake of broadband technology. This is done through the production of a taxonomy specifically evaluating the stability properties of the fully connected star and complete networks, based on the Jackson and Wolinsky (1996) connections model, modified to incorporate complexity concerns. We find that when connection speeds are low, neither the star nor the complete network is stable, and when connection speeds are high, the star network is usually stable while the complete network is never stable. For intermediate speed levels, much depends upon the other parameters. Under plausible assumptions about the future, we find that the Web may be increasingly dominated by a single intermediate site, perhaps best described as a search engine.
Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
Log data can reveal valuable information about how users interact with web
search services, what they want, and how satisfied they are. However, analyzing
user intents in log data is not easy, especially for new forms of web search
such as AI-driven chat. To understand user intents from log data, we need a way
to label them with meaningful categories that capture their diversity and
dynamics. Existing methods rely on manual or ML-based labeling, which are
either expensive or inflexible for large and changing datasets. We propose a
novel solution using large language models (LLMs), which can generate rich and
relevant concepts, descriptions, and examples for user intents. However, using
LLMs to generate a user intent taxonomy and apply it to do log analysis can be
problematic for two main reasons: such a taxonomy is not externally validated,
and there may be an undesirable feedback loop. To overcome these issues, we
propose a new methodology with human experts and assessors to verify the
quality of the LLM-generated taxonomy. We also present an end-to-end pipeline
that uses an LLM with human-in-the-loop to produce, refine, and use labels for
user intent analysis in log data. Our method offers a scalable and adaptable
way to analyze user intents in web-scale log data with minimal human effort. We
demonstrate its effectiveness by uncovering new insights into user intents from
search and chat logs from Bing.
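The labeling loop with human validation that the abstract describes can be sketched as follows. Here `llm_label` is a stub standing in for a real LLM call, and the taxonomy categories and routing rule are invented for illustration:

```python
# Stub in place of an actual LLM call: picks the category whose seed keyword
# appears in the query. Categories and keywords below are invented examples.
def llm_label(query, taxonomy):
    for category, keywords in taxonomy.items():
        if any(k in query.lower() for k in keywords):
            return category
    return "other"

def label_log(queries, taxonomy, review):
    """Label each query; route uncertain labels to a human assessor callback."""
    labels = []
    for q in queries:
        label = llm_label(q, taxonomy)
        if label == "other":           # human-in-the-loop validation step
            label = review(q)
        labels.append((q, label))
    return labels

taxonomy = {"navigational": ["homepage", "login"],
            "informational": ["how", "what", "why"]}
```

In the real pipeline the assessors also validate the taxonomy itself, not just individual labels, which is what guards against the feedback loop of an LLM grading its own categories.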
Indexing and retrieval in digital libraries : developing taxonomies for a repository of decision technologies
DecisionNet is an online, Internet-based repository of decision technologies. It links remote users with these technologies and provides a directory service to enable search and selection of suitable technologies. The ability to retrieve relevant objects through search mechanisms is basic to any repository's success and usability, and depends on effective classification of the decision technologies. This thesis develops classification methods to enable indexing of the DecisionNet repository. Existing taxonomies for software and other online repositories are examined. Criteria and principles for a good taxonomy are established and systematically applied to develop the DecisionNet taxonomies. A database design is developed to store the taxonomies and to classify the technologies in the repository. User interface issues for navigating a hierarchical classification system are discussed. A user interface for remote World Wide Web users is developed; it is designed for browsing the taxonomy structure and creating search parameters online. Recommendations for the implementation of a repository search mechanism are given.
http://archive.org/details/indexingndretrie1094532199
U.S. Navy (U.S.N.) author
Approved for public release; distribution is unlimited
A fast Peptide Match service for UniProt Knowledgebase
Summary: We have developed a new web application for peptide matching using an Apache Lucene-based search engine. The Peptide Match service is designed to quickly retrieve all occurrences of a given query peptide from the UniProt Knowledgebase (UniProtKB), including isoforms. The matched proteins are shown in summary tables with rich annotations, including matched sequence region(s) and links to corresponding proteins in a number of proteomic/peptide spectral databases. The results are grouped by taxonomy and can be browsed by organism, taxonomic group or taxonomy tree. The service supports queries in which the isobaric residues leucine and isoleucine are treated as equivalent, an option for searching UniRef100 representative sequences, and dynamic queries to major proteomic databases. In addition to the web interface, we also provide RESTful web services. The underlying data are updated every 4 weeks in accordance with the UniProt releases.
Availability: http://proteininformationresource.org/peptide.shtml
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
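The leucine/isoleucine-equivalence option the service offers can be illustrated with a tiny exact-match sketch (the matching logic and sequences below are illustrative assumptions, not PIR's Lucene-backed implementation): because Leu (L) and Ile (I) have identical masses, mass-spectrometry-derived peptides cannot distinguish them, so both are normalized to one letter before matching.

```python
# Minimal sketch of peptide matching with an optional Leu/Ile-equivalence
# mode; a toy stand-in for the service's Lucene-based index, not its code.
def find_peptide(sequence, peptide, leu_ile_equivalent=False):
    """Return 0-based start positions of `peptide` within `sequence`."""
    if leu_ile_equivalent:
        # Isobaric residues: treat isoleucine (I) and leucine (L) as the same.
        sequence = sequence.replace("I", "L")
        peptide = peptide.replace("I", "L")
    hits, start = [], sequence.find(peptide)
    while start != -1:
        hits.append(start)
        start = sequence.find(peptide, start + 1)
    return hits
```

For instance, the query peptide `ILV` misses a protein region reading `LLV` under exact matching but hits it once the equivalence option is enabled, which is exactly the behavior a spectral-data user wants.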