137,140 research outputs found
A Taxonomy of Hyperlink Hiding Techniques
Hidden links are designed solely for search engines rather than for human
visitors. To obtain high search engine rankings, link hiding techniques are
commonly used to profit black-market industries, such as illicit game servers,
false medical services, illegal gambling, and other disreputable high-profit
businesses. This paper investigates hyperlink hiding techniques on the Web and
gives a detailed taxonomy. We believe the taxonomy can help develop appropriate
countermeasures. A study of the home pages of 5,583,451 Chinese sites indicates
that link hiding techniques are highly prevalent on the Web. We also explored
Google's attitude towards link hiding spam by analyzing the PageRank values of
the relevant links. The results show that more should be done to penalize
hidden link spam.
Comment: 12 pages, 2 figures
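One family of techniques such a taxonomy covers hides links through CSS styling. As a minimal, illustrative sketch (the detection heuristics and the spam URL below are invented for illustration, not drawn from the paper), anchors with inline hiding styles can be flagged with Python's standard HTML parser:

```python
from html.parser import HTMLParser

# Assumed heuristics: inline styles that make an anchor invisible to visitors.
HIDING_HINTS = ("display:none", "visibility:hidden", "font-size:0")

class HiddenLinkFinder(HTMLParser):
    """Collect href values of anchors styled so that visitors cannot see them."""
    def __init__(self):
        super().__init__()
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        d = dict(attrs)
        style = (d.get("style") or "").replace(" ", "").lower()
        if any(hint in style for hint in HIDING_HINTS):
            self.hidden.append(d.get("href"))

def find_hidden_links(html):
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden
```

For example, `find_hidden_links('<a href="http://spam.example" style="display: none">x</a>')` flags the spam URL, while an ordinary anchor passes untouched. A real detector would also need to resolve external stylesheets, off-screen positioning, and zero-size containers, which this sketch ignores.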
Enhancing the online discovery of geospatial data through taxonomy, folksonomy and semantic annotations
Spatial data infrastructures (SDIs) are meant to facilitate dissemination and consumption of spatial data, among other means through publication and discovery of spatial metadata in geoportals. However, geoportals are often known only to geoinformation communities and present technological limitations that make it difficult for general purpose web search engines to discover and index the data catalogued in (or registered with) a geoportal. The mismatch between standard spatial metadata content and the search terms that web users employ when looking for spatial data presents a further barrier to spatial data discovery. The need therefore arises to create and share spatial metadata that is discoverable by general purpose web search engines and users alike. Using folksonomies and semantic annotations appears to be an option for eliminating this mismatch and publishing the metadata for discovery on the Web. Based on an analysis of the search query terms employed when searching for spatial data on the Web, a taxonomy of search terms is constructed. The taxonomy constitutes the basis for understanding how web resources in general, and HTML pages with standard spatial metadata in particular, can be documented so that they are discoverable by general purpose web search engines. We illustrate the use of the constructed taxonomy in the semantic annotation of web resources, such as HTML pages with spatial metadata, on the Web.
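One concrete form such annotations can take is schema.org's Dataset vocabulary, which general purpose web search engines index. The sketch below (the dataset values are invented placeholders; only the schema.org property names are real) builds a JSON-LD block that could be embedded in an HTML page alongside the standard spatial metadata:

```python
import json

# Hypothetical spatial-metadata record expressed as schema.org JSON-LD.
# All field values are placeholders; the vocabulary (Dataset, spatialCoverage,
# Place, GeoShape.box) is standard schema.org.
def dataset_jsonld(name, description, keywords, bbox):
    """Return a JSON-LD string describing a spatial dataset."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": keywords,                    # folksonomy-style tags
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": bbox},
        },
    }, indent=2)
```

Embedding the returned string in a `<script type="application/ld+json">` element makes the metadata, including the free-form keyword tags, crawlable without requiring users to know the geoportal exists.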
A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which do not represent the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.
Comment: Accepted by KDD 201
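The document-tagging step a topic-concept-instance taxonomy enables can be sketched as a simple instance lookup. This toy version (the taxonomy entries and matching rule are invented, not Tencent's implementation) tags a document with every concept one of whose instances appears in the text:

```python
# Invented topic -> concept -> instance taxonomy, in the three-level shape
# the abstract describes; entries are illustrative only.
TAXONOMY = {
    "entertainment": {                                   # topic
        "sci-fi movies": ["interstellar", "the martian"],  # concept -> instances
        "budget smartphones": ["redmi", "realme"],
    },
}

def tag_document(text, taxonomy=TAXONOMY):
    """Return concepts whose instances occur in the document text."""
    text = text.lower()
    tags = set()
    for topic, concepts in taxonomy.items():
        for concept, instances in concepts.items():
            if any(inst in text for inst in instances):
                tags.add(concept)
    return sorted(tags)
```

A production system would of course rely on learned extraction from queries and click logs rather than substring matching, but the resulting tags feed recommendation the same way: documents about "Interstellar" surface for users interested in the broader concept "sci-fi movies".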
Social Network Theory, Broadband and the World Wide Web
This paper aims to predict some possible futures for the World Wide Web based on several key network parameters: size, complexity, cost, and increasing connection speed through the uptake of broadband technology. This is done through the production of a taxonomy specifically evaluating the stability properties of the fully connected star and complete networks, based on the Jackson and Wolinsky (1996) connections model, modified to incorporate complexity concerns. We find that when connection speeds are low, neither the star nor the complete network is stable, and when connection speeds are high, the star network is usually stable while the complete network is never stable. For intermediate speed levels, much depends upon the other parameters. Under plausible assumptions about the future, we find that the Web may be increasingly dominated by a single intermediate site, perhaps best described as a search engine.
Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
Log data can reveal valuable information about how users interact with web
search services, what they want, and how satisfied they are. However, analyzing
user intents in log data is not easy, especially for new forms of web search
such as AI-driven chat. To understand user intents from log data, we need a way
to label them with meaningful categories that capture their diversity and
dynamics. Existing methods rely on manual or ML-based labeling, which are
either expensive or inflexible for large and changing datasets. We propose a
novel solution using large language models (LLMs), which can generate rich and
relevant concepts, descriptions, and examples for user intents. However, using
LLMs to generate a user intent taxonomy and apply it to do log analysis can be
problematic for two main reasons: such a taxonomy is not externally validated,
and there may be an undesirable feedback loop. To overcome these issues, we
propose a new methodology with human experts and assessors to verify the
quality of the LLM-generated taxonomy. We also present an end-to-end pipeline
that uses an LLM with human-in-the-loop to produce, refine, and use labels for
user intent analysis in log data. Our method offers a scalable and adaptable
way to analyze user intents in web-scale log data with minimal human effort. We
demonstrate its effectiveness by uncovering new insights into user intents from
search and chat logs from Bing.
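The labeling loop with human validation that the abstract describes can be sketched as follows. Here `llm_label` is a stub standing in for a real LLM call, and the taxonomy categories and routing rule are invented for illustration:

```python
# Stub in place of an actual LLM call: picks the category whose seed keyword
# appears in the query. Categories and keywords below are invented examples.
def llm_label(query, taxonomy):
    for category, keywords in taxonomy.items():
        if any(k in query.lower() for k in keywords):
            return category
    return "other"

def label_log(queries, taxonomy, review):
    """Label each query; route uncertain labels to a human assessor callback."""
    labels = []
    for q in queries:
        label = llm_label(q, taxonomy)
        if label == "other":           # human-in-the-loop validation step
            label = review(q)
        labels.append((q, label))
    return labels

taxonomy = {"navigational": ["homepage", "login"],
            "informational": ["how", "what", "why"]}
```

In the real pipeline the assessors also validate the taxonomy itself, not just individual labels, which is what guards against the feedback loop of an LLM grading its own categories.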
Indexing and retrieval in digital libraries : developing taxonomies for a repository of decision technologies
DecisionNet is an online, Internet-based repository of decision technologies. It links remote users with these technologies and provides a directory service to enable search and selection of suitable technologies. The ability to retrieve relevant objects through search mechanisms is basic to any repository's success and usability, and depends on effective classification of the decision technologies. This thesis develops classification methods to enable indexing of the DecisionNet repository. Existing taxonomies for software and other online repositories are examined. Criteria and principles for a good taxonomy are established and systematically applied to develop the DecisionNet taxonomies. A database design is developed to store the taxonomies and to classify the technologies in the repository. User interface issues for navigating a hierarchical classification system are discussed. A user interface for remote World Wide Web users is developed; it is designed for browsing the taxonomy structure and creating search parameters online. Recommendations for the implementation of a repository search mechanism are given.
http://archive.org/details/indexingndretrie1094532199
U.S. Navy (U.S.N.) author
Approved for public release; distribution is unlimited
A fast Peptide Match service for UniProt Knowledgebase
Summary: We have developed a new web application for peptide matching using an Apache Lucene-based search engine. The Peptide Match service is designed to quickly retrieve all occurrences of a given query peptide from the UniProt Knowledgebase (UniProtKB), including isoforms. The matched proteins are shown in summary tables with rich annotations, including matched sequence region(s) and links to corresponding proteins in a number of proteomic/peptide spectral databases. The results are grouped by taxonomy and can be browsed by organism, taxonomic group or taxonomy tree. The service supports queries in which the isobaric residues leucine and isoleucine are treated as equivalent, an option for searching UniRef100 representative sequences, and dynamic queries to major proteomic databases. In addition to the web interface, we also provide RESTful web services. The underlying data are updated every 4 weeks in accordance with the UniProt releases.
Availability: http://proteininformationresource.org/peptide.shtml
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
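The leucine/isoleucine-equivalence option the service offers can be illustrated with a tiny exact-match sketch (the matching logic and sequences below are illustrative assumptions, not PIR's Lucene-backed implementation): because Leu (L) and Ile (I) have identical masses, mass-spectrometry-derived peptides cannot distinguish them, so both are normalized to one letter before matching.

```python
# Minimal sketch of peptide matching with an optional Leu/Ile-equivalence
# mode; a toy stand-in for the service's Lucene-based index, not its code.
def find_peptide(sequence, peptide, leu_ile_equivalent=False):
    """Return 0-based start positions of `peptide` within `sequence`."""
    if leu_ile_equivalent:
        # Isobaric residues: treat isoleucine (I) and leucine (L) as the same.
        sequence = sequence.replace("I", "L")
        peptide = peptide.replace("I", "L")
    hits, start = [], sequence.find(peptide)
    while start != -1:
        hits.append(start)
        start = sequence.find(peptide, start + 1)
    return hits
```

For instance, the query peptide `ILV` misses a protein region reading `LLV` under exact matching but hits it once the equivalence option is enabled, which is exactly the behavior a spectral-data user wants.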