760 research outputs found

    WAQS : a web-based approximate query system

    Get PDF
    The Web is often viewed as a gigantic database holding vast stores of information and providing ubiquitous accessibility to end-users. Since its inception, the Internet has experienced explosive growth both in the number of users and in the amount of content available on it. However, searching for information on the Web has become increasingly difficult. Although query languages have long been part of database management systems, the standard query language, the Structured Query Language (SQL), is not suitable for Web content retrieval. In this dissertation, a new technique for document retrieval on the Web is presented. This technique is designed to allow more detailed retrieval and hence reduce the number of matches returned by typical search engines. Its main objective is to allow queries to be based not just on keywords but also on the location of the keywords within the logical structure of a document. In addition, the technique provides approximate search capabilities based on the notions of Distance and Variable Length Don't Cares. The proposed techniques have been implemented in a system called the Web-Based Approximate Query System, which contains an SQL-like query language called the Web-Based Approximate Query Language. The Web-Based Approximate Query Language has also been integrated with EnviroDaemon, an environmental domain-specific search engine, giving EnviroDaemon more detailed searching capabilities than keyword-based search alone. Implementation details, technical results and future work are presented in this dissertation.
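
    The abstract does not reproduce the Web-Based Approximate Query Language grammar, so the following is only a minimal sketch of the Variable Length Don't Care idea it mentions, under assumptions: a hypothetical pattern syntax in which `*` matches any run of characters and `?` matches a single character, compiled to a Python regular expression.

```python
import re

def vldc_to_regex(pattern: str) -> re.Pattern:
    """Compile a pattern with Variable Length Don't Cares into a regex.

    '*' matches any (possibly empty) run of characters; '?' matches
    exactly one character; everything else matches literally.
    (Hypothetical syntax -- the dissertation's actual WAQL grammar
    is not reproduced in the abstract.)
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

# Approximate match: "water" followed by "quality" with anything between.
query = vldc_to_regex("water*quality")
print(bool(query.search("Water Resource Quality Report")))  # True
```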

    Narrative and Hypertext 2011 Proceedings: a workshop at ACM Hypertext 2011, Eindhoven

    No full text

    Links Are Everywhere: Effects of Web-Based Groupings on Trust Transfer

    Get PDF
    One of the most ubiquitous examples of information technology is the World Wide Web. On the Web, hypertext links are everywhere, but trust may be hard to find. This research examines how the presentation of groups of links may affect consumers’ trust in organizations encountered on the Web. We use an experimental methodology to examine how the description of a hypertext list and the familiarity of members of the list may affect trust in both familiar and unknown target organizations. Our theoretical model is rooted in the literatures on trust transfer and entitativity, which is the extent to which individual entities are perceived as forming a group. Results are expected to answer practical questions with regard to the use and presentation of hypertext links and also to extend the trust transfer literature by examining factors not previously considered: super-dyadic transfer and potential negative effects of transfer.

    Web Mining for Social Network Analysis: A Review, Direction and Future Vision.

    Get PDF
    Although the Web is rich in data, gathering that data and making sense of it is extremely difficult because of its unorganised nature. Existing data mining techniques can therefore be applied to extract information from Web data, and the knowledge thus extracted can also be used for the analysis of social networks and online communities. This paper gives a brief insight into Web mining and the link analysis used in social network analysis, and reviews algorithms such as HITS, PageRank, SALSA, PHITS, CLEVER and INDEGREE, which provide measures for identifying online communities over social networks. The most common among these algorithms are PageRank and HITS. PageRank measures the importance of a page efficiently, in less time, with the help of in-links alone, while HITS uses both in-links and out-links to measure the importance of a Web page and is sensitive to the user query. Various extensions to these algorithms also exist to refine query-based search results. This opens many doors for future research into the undiscovered knowledge of existing online communities over various social networks. Keywords: Web Structure Mining, Link Analysis, Link Mining, Online Community Mining
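
    As a concrete reference for the comparison above, here is a minimal sketch of the two algorithms on a made-up four-page link graph, using power iteration. The graph, damping factor and iteration counts are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy link graph: adjacency[i, j] = 1 if page i links to page j.
adjacency = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
n = adjacency.shape[0]

# --- PageRank: importance flows along in-links (power iteration). ---
d = 0.85                                    # damping factor
out_deg = adjacency.sum(axis=1, keepdims=True)
transition = adjacency / np.where(out_deg == 0, 1, out_deg)
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = (1 - d) / n + d * transition.T @ rank

# --- HITS: mutually reinforcing hub and authority scores, using ---
# --- both in-links and out-links.                               ---
hubs = np.ones(n)
auths = np.ones(n)
for _ in range(100):
    auths = adjacency.T @ hubs              # good authorities are linked
    hubs = adjacency @ auths                #   to by good hubs, and vice versa
    auths /= np.linalg.norm(auths)
    hubs /= np.linalg.norm(hubs)

print("PageRank:", rank.round(3))
print("Authorities:", auths.round(3))
print("Hubs:", hubs.round(3))
```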

    Template Mining for Information Extraction from Digital Documents

    Get PDF
    published or submitted for publication

    Automated subject classification of textual web documents

    Full text link

    A multistrategy approach for digital text

    Get PDF
    The goal of the research described here is to develop a multistrategy classifier system that can be used for document categorization. The system automatically discovers classification patterns by applying several empirical learning methods to different representations of preclassified documents. The learners work in parallel: each learner carries out its own feature selection, based on evolutionary techniques, and then builds a classification model. When classifying documents, the system combines the predictions of the learners, again applying evolutionary techniques. The system relies on a modular, flexible architecture that makes no assumptions about the design of the learners or the number of learners available and guarantees independence from the thematic domain.
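
    A minimal sketch of the multistrategy idea, under assumptions: two learners, each with its own document representation, combined by a weighted vote over class probabilities. The toy documents, the choice of learners and the fixed combination weights are stand-ins; in the system described above, both the per-learner feature selection and the combination are driven by evolutionary techniques.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Toy pre-classified documents (two thematic classes).
docs = ["stock markets fell sharply", "investors sold shares today",
        "the team won the final match", "players scored two goals",
        "bond yields rose again", "the coach praised the striker"]
labels = np.array([0, 0, 1, 1, 0, 1])

# Each learner gets its own document representation: one sees raw
# term counts, the other tf-idf weights.
learners = [
    (CountVectorizer(), MultinomialNB()),
    (TfidfVectorizer(), LogisticRegression()),
]
fitted = []
for vec, clf in learners:
    X = vec.fit_transform(docs)
    clf.fit(X, labels)
    fitted.append((vec, clf))

def combined_predict(texts, weights):
    """Weighted vote over the learners' class probabilities.

    The paper evolves the combination with evolutionary techniques;
    fixed weights stand in for that here."""
    probs = sum(w * clf.predict_proba(vec.transform(texts))
                for w, (vec, clf) in zip(weights, fitted))
    return probs.argmax(axis=1)

print(combined_predict(["shares and yields", "a late goal"], [0.5, 0.5]))
```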

    Classification of HTML Documents

    Get PDF
    Text classification is the task of mapping a document into one or more classes based on the presence or absence of words (or features) in the document. It is being intensively studied, and different classification techniques and algorithms have been developed. This thesis focuses on the classification of online documents, a task that has become more critical with the development of the World Wide Web. The WWW vastly increases the availability of on-line documents in digital format and has highlighted the need to classify them. Against this background, automatic Web classification has emerged. Such approaches mainly concentrate on classifying HTML-like documents into classes or categories, not only using methods inherited from the traditional text classification process but also exploiting the extra information provided only by Web pages. Our work is based on the fact that Web documents contain not only ordinary features (words) but also extra information, such as metadata and hyperlinks, that can be used to the advantage of the classification process. The aim of this research is to study various ways of using this extra information, in particular the hyperlink information provided by HTML documents (Web pages). The merit of the approach developed in this thesis is its simplicity compared with existing approaches. We present different ways of using hyperlink information to improve the effectiveness of Web classification. Unlike other work in this area, we use only the mappings between linked documents and their own class or classes: we need only add a few features, called linked-class features, to the datasets, and then apply classifiers to them. In the numerical experiments we adopted two well-known text classification algorithms, Support Vector Machines and BoosTexter. The results obtained show that classification accuracy can be improved by using mixtures of ordinary and linked-class features. Moreover, out-links usually work better than in-links in classification. We also analyse and discuss the reasons behind this improvement.
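
    A minimal sketch of the linked-class feature idea described above, under assumptions: four toy pages with known out-links, ordinary bag-of-words features augmented with counts of the classes of each page's link targets, and a linear SVM (one of the two classifiers used in the thesis experiments). In practice the classes of linked documents would come from training labels or a first-pass prediction rather than being given.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy Web pages: text, out-link targets (by page index), and class.
texts = ["course syllabus and lecture notes", "faculty research projects",
         "student homework assignments", "lab publications and grants"]
out_links = [[2], [3], [0], [1]]           # page i links to these pages
labels = np.array([0, 1, 0, 1])            # 0 = course page, 1 = research page
n_classes = 2

# Ordinary features: bag of words.
vec = CountVectorizer()
X_words = vec.fit_transform(texts).toarray()

# Linked-class features: for each page, count how many of its
# out-link targets carry each class label.
X_linked = np.zeros((len(texts), n_classes))
for i, targets in enumerate(out_links):
    for j in targets:
        X_linked[i, labels[j]] += 1

# Mix ordinary and linked-class features, then train the classifier.
X = np.hstack([X_words, X_linked])
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```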