Enhancing Content-And-Structure Information Retrieval using a Native XML Database
Three approaches to content-and-structure XML retrieval are analysed in this
paper: first by using Zettair, a full-text information retrieval system; second
by using eXist, a native XML database, and third by using a hybrid XML
retrieval system that uses eXist to produce the final answers from likely
relevant articles retrieved by Zettair. INEX 2003 content-and-structure topics
can be classified in two categories: the first retrieving full articles as
final answers, and the second retrieving more specific elements within articles
as final answers. We show that for both topic categories our initial hybrid
system improves the retrieval effectiveness of a native XML database. For
ranking the final answer elements, we propose and evaluate a novel retrieval
model that utilises the structural relationships between the answer elements of
a native XML database and retrieves Coherent Retrieval Elements. The final
results of our experiments show that when the XML retrieval task focusses on
highly relevant elements our hybrid XML retrieval system with the Coherent
Retrieval Elements module is 1.8 times more effective than Zettair and 3 times
more effective than eXist, and yields an effective content-and-structure XML
retrieval
Supervised topic models with word order structure for document classification and retrieval learning
One limitation of most existing probabilistic latent topic models for document classification is that the topic model itself does not consider useful side-information, namely the class labels of documents. Supervised topic models, which do take this side-information into account, in turn ignore the word order structure in documents. One motivation for considering word order structure is to capture the semantic fabric of the document. We investigate a low-dimensional latent topic model for document classification. Class label information and word order structure are integrated into a supervised topic model, enabling a more effective interaction between the two when solving document classification. We derive a collapsed Gibbs sampler for our model. Likewise, supervised topic models with word order structure have not been explored in document retrieval learning. We propose a novel supervised topic model for document retrieval learning, which can be regarded as a pointwise model for tackling the learning-to-rank task. Available relevance assessments and word order structure are integrated into the topic model itself. We conduct extensive experiments on several publicly available benchmark datasets and show that our model improves upon the state-of-the-art models.
Thread Reconstruction in Conversational Data using Neural Coherence Models
Discussion forums are an important source of information. They are often used
to answer specific questions a user might have and to discover more about a
topic of interest. Discussions in these forums may evolve in intricate ways,
making it difficult for users to follow the flow of ideas. We propose a novel
approach for automatically identifying the underlying thread structure of a
forum discussion. Our approach is based on a neural model that computes
coherence scores of possible reconstructions and then selects the highest
scoring, i.e., the most coherent one. Preliminary experiments demonstrate
promising results outperforming a number of strong baseline methods.
Comment: Neu-IR: Workshop on Neural Information Retrieval 201
Role of Ranking Algorithms for Information Retrieval
As web use grows day by day, users easily get lost in the web's rich
hyperlink structure. The main aim of a website owner is to give users the
information relevant to their needs. We explain how Web mining is used to
categorize users and pages by analyzing user behavior and page content, and
then describe Web structure mining. This paper surveys different page ranking
algorithms used for Information Retrieval and compares them. PageRank-based
algorithms such as PageRank (PR), Weighted PageRank (WPR), HITS
(Hyperlink-Induced Topic Search), Distance Rank and EigenRumor are discussed
and compared. A simulation interface has been designed for the PageRank and
Weighted PageRank algorithms, but PageRank is the only ranking algorithm on
which the Google search engine works.
Comment: Keywords: Page Rank, Web Mining, Web Structured Mining, Web Content
Mining
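As a rough illustration of the PageRank algorithm named in the abstract above, here is a minimal power-iteration sketch. It is not the paper's simulation interface. The function name `pagerank`, the damping factor of 0.85, and the toy link graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook PageRank by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    damping: assumed conventional value of 0.85, not taken from the paper.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every page passes an equal share of its rank to each page it links to.
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the ranks have stabilised
            break
    return rank


# Hypothetical four-page web: C is linked from A, B and D; nothing links to D.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph the page with the most in-links, C, ends up ranked highest, while D, which no page links to, keeps only the baseline share `(1 - damping) / n`.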
A Framework to compare text annotators and its applications
Text in human languages has little logical structure and is inherently ambiguous. For this reason, the typical approach of Information Retrieval to text documents has been based on the bag-of-words model, in which documents are analyzed only by the occurrence of terms, discarding any possible structure. A recently developing line of research, however, is devoted to adding structure to unstructured text by recognizing the topics contained in a text and annotating them.
Topic annotators are systems that link a natural language document to the topics that are relevant for describing its content. These systems can be applied to many classic problems of Information Retrieval: the categorization of a document can be based on its topics; the clustering of a set of documents can use their topics to find similarities; and a search engine could find relevant pages more easily if it knew the topics that the query expresses and searched for them in the cached web pages.
In this thesis, we present a formal framework that describes the problems related to topic retrieval, the algorithms that solve those problems, and the way they can be benchmarked.
Mapping the relationship between knowledge management and information architecture
Includes bibliographical references (leaves 106-115). This dissertation defines knowledge in terms of traditional epistemological ideals and as a strategic resource. Knowledge management is defined in terms of the ability of organizations to manage knowledge as a strategic resource in order to gain an advantage from it. In the knowledge management framework, knowledge is presented as a continuum consisting of tacit, implicit and explicit knowledge. Tacit and implicit knowledge are managed through the acknowledgement of the social nature of knowledge. One method to achieve this is communities of practice. On the other end of the spectrum, explicit knowledge is very close in nature and character to information. Due to the expansion of available information resources, the design and structure of information (explicit knowledge) for effective retrieval has become very important. Information architecture is a field that specializes in the design and structure of information for effective retrieval. Traditional information architecture tools such as metadata and subject classification address some of the issues, but experience difficulty in heterogeneous environments such as the Internet. Topic maps are considered as a possible solution to the concerns of metadata classification and subject-based classification. Due to the extent and nature of the information recorded in a topic map, it becomes an information resource in itself. Topic maps also act as an enabling technology for knowledge management, as they map the complex relationships between concepts and include a range of information resources. The conclusion of this dissertation is the representation of a conceptual model based on the themes developed in this dissertation. The main advantage of the conceptual model is the clear and direct link between knowledge management and information architecture.
The Use of Latent Semantic Indexing to Cluster Documents into Their Subject Areas
Keyword-matching information retrieval systems are plagued by noise in the document collection arising from synonymy and polysemy. This noise tends to hide the latent structure of the documents, reducing the accuracy of information retrieval systems and making it difficult for clustering algorithms to pick up on shared concepts and effectively cluster similar documents. Latent Semantic Analysis (LSA), through its use of Singular Value Decomposition, reduces the dimension of the document space, mapping it onto a smaller concept space devoid of this noise and making it easier to group similar documents together. This work is an exploratory report on the use of LSA to cluster a small dataset of documents according to their topic areas, to see how LSA fares in comparison to clustering with a clustering package without LSA