109 research outputs found

    Fuzzy equivalence relation based clustering and its use to restructuring websites' hyperlinks and web pages

    Quality design of websites implies that, among other factors, the hyperlinks’ structure should allow users to reach the information they seek with the minimum number of clicks. This paper utilises fuzzy equivalence relation based clustering to adapt a website’s hyperlink structure so that the redesigned website allows users to meet their informational and navigational requirements as effectively as possible. The fuzzy tolerance relation is calculated based on the usage rate of hyperlinks in a website. The equivalence relation identifies clusters of hyperlinks. The clusters are then used to reallocate hyperlinks in webpages and to rearrange webpages into the website structure hierarchy.
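
The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes the standard construction from fuzzy set theory: the max-min transitive closure turns a fuzzy tolerance relation (reflexive, symmetric) into a fuzzy equivalence relation, and an alpha-cut of that relation yields crisp clusters. The similarity values, the threshold `alpha`, and the function names are hypothetical and are not taken from the paper, which may compute the relation differently.

```python
def transitive_closure(relation):
    """Max-min transitive closure of a fuzzy tolerance relation (list of lists)."""
    n = len(relation)
    r = [row[:] for row in relation]
    while True:
        # One step: r <- max(r, r o r), where (r o r)[i][j] = max_k min(r[i][k], r[k][j])
        nxt = [
            [max(r[i][j], max(min(r[i][k], r[k][j]) for k in range(n)))
             for j in range(n)]
            for i in range(n)
        ]
        if nxt == r:  # fixed point reached: r is max-min transitive
            return r
        r = nxt


def alpha_cut_clusters(equivalence, alpha):
    """Group indices whose equivalence degree is at least alpha."""
    n = len(equivalence)
    seen, clusters = set(), []
    for i in range(n):
        if i in seen:
            continue
        members = [j for j in range(n) if equivalence[i][j] >= alpha]
        seen.update(members)
        clusters.append(members)
    return clusters


# Hypothetical usage-based similarity between four hyperlinks (illustrative values).
usage_similarity = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]

equivalence = transitive_closure(usage_similarity)
clusters = alpha_cut_clusters(equivalence, alpha=0.5)
print(clusters)
```

Lowering `alpha` merges clusters and raising it splits them, so the threshold controls how coarse the resulting grouping of hyperlinks is.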

    WAQS : a web-based approximate query system

    The Web is often viewed as a gigantic database that holds vast stores of information and provides ubiquitous accessibility to end-users. Since its inception, the Internet has experienced explosive growth both in the number of users and in the amount of content available on it. However, searching for information on the Web has become increasingly difficult. Although query languages have long been part of database management systems, the standard query language, the Structured Query Language (SQL), is not suitable for Web content retrieval. In this dissertation, a new technique for document retrieval on the Web is presented. This technique is designed to allow detailed retrieval and hence reduce the number of matches returned by typical search engines. The main objective of this technique is to allow a query to be based not just on keywords but also on the location of the keywords within the logical structure of a document. In addition, the technique provides approximate search capabilities based on the notions of Distance and Variable Length Don't Cares. The proposed techniques have been implemented in a system called Web-Based Approximate Query System (WAQS), which contains an SQL-like query language called Web-Based Approximate Query Language. Web-Based Approximate Query Language has also been integrated with EnviroDaemon, an environmental domain-specific search engine, providing it with more detailed searching capabilities than keyword-based search alone. Implementation details, technical results and future work are presented in this dissertation.

    WEB recommendations for E-commerce websites

    In this part of the thesis we investigated how navigation based on web recommendations can be implemented on e-commerce websites built on integrated data sources. Integrated e-commerce websites are an interesting use case for web recommendations. One reason is that many modern, large and economically successful e-commerce websites follow the integrated approach. Another is that in an integrated environment, where pre-defined semantic connections between the data are lacking, web recommendations become the means of enabling user navigation. In this chapter we presented EC-Fuice, an architecture for websites based on integrated data sources. We also presented a prototypical implementation of the architecture, which serves as a proof of concept, and investigated the challenges of creating navigation on an integrated website. The following issues were addressed in this part of the thesis: Combination of several state-of-the-art tools and techniques from the fields of databases, data integration, ontology matching and web engineering into one generic architecture for creating integrated websites. Comparative experiments with several techniques for instance matching (also known as record linkage or duplicate detection). Investigation of using ontology matching to facilitate instance matching. Comparative experiments with several techniques for ontology matching. Investigation of instance-based ontology matching and the possibilities for combining it with other ontology matching techniques. Investigation of the possibilities for improving user navigation in the integrated data environment with different types of web recommendations. Review of the related work in the fields of data integration and ontology matching, and discussion of the contact points between the research described here and other related projects.
The main contributions of the research described in this part of the thesis are the EC-Fuice architecture, a novel method for matching e-commerce ontologies based on a combination of instance information and metadata information, the experimental results of ontology and instance matching performed by different matching algorithms, and a classification of the types of recommendations that can be used on an integrated e-commerce website.

    Proceedings of the 5th International Workshop "What can FCA do for Artificial Intelligence?", FCA4AI 2016 (co-located with ECAI 2016, The Hague, Netherlands, August 30th 2016)

    These are the proceedings of the fifth edition of the FCA4AI workshop (http://www.fca4ai.hse.ru/). Formal Concept Analysis (FCA) is a mathematically well-founded theory aimed at data analysis and classification that can be used for many purposes, especially for Artificial Intelligence (AI) needs. The objective of the FCA4AI workshop is to investigate two main issues: how FCA can support various AI activities (knowledge discovery, knowledge representation and reasoning, learning, data mining, NLP, information retrieval), and how FCA can be extended to help AI researchers solve new and complex problems in their domain. Accordingly, topics of interest are related to the following: (i) Extensions of FCA for AI: pattern structures, projections, abstractions. (ii) Knowledge discovery based on FCA: classification, data mining, pattern mining, functional dependencies, biclustering, stability, visualization. (iii) Knowledge processing based on concept lattices: modeling, representation, reasoning. (iv) Application domains: natural language processing, information retrieval, recommendation, mining of web of data and of social networks, etc.

    Data-driven Technology Foresight: Text Analysis of Emerging Technologies

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Industrial Engineering and Naval Architecture, February 2018. 박용태 (Park Yongtae). This dissertation argues for new directions in the field of technology foresight. Technology foresight was formulated on the basis of qualitative and participatory research. Initially, most foresight activities were driven by the outlook of a handful of experts, but recent studies highlight theoretical paradigm shifts toward a more comprehensive and data-driven approach to creating shared insights on the future of emerging technologies. Much of the research up to now, however, has been descriptive in nature, and the existing literature has largely not addressed a definite method for realizing the notion. To this end, we have attempted to formalize the concept of data-driven technology foresight by incorporating unconventional data sources (future-oriented web data, Wikipedia data, and scientific publication data) and different analytical tools (Latent Semantic Analysis, IdeaGraph, and Morphological Analysis). Four distinct foresight frameworks were proposed for the proactive management process of emerging technologies: impact identification, impact analysis, plan development, and technology ideation. The study was guided by the following research questions: (1) What kinds of data sources are available on the web, and which of those are considered useful in foresight studies? (2) Where could we incorporate these data sources, and which techniques are most suitable for the given purposes? (3) Which foresight-related fields would particularly benefit from applying a data-driven approach, and what are the positive effects? The proposals outlined should be considered exploratory and open-ended. They are designed to determine the nature of the problem, rather than to offer definitive and conclusive answers. Nevertheless, the proposed scheme may well provide not just a rationale but a theoretical grounding for this newly introduced notion.
    This dissertation is expected to yield a foothold for the readers to better comprehend and act on this new shift in the field of technology foresight.

    Contents:
    Chapter 1 Introduction
        1.1 Emergence of Technology Foresight
        1.2 Towards a Data-driven Technology Foresight
        1.3 Problem Statement
        1.4 Dissertation Overview
    Chapter 2 Data Sources and Methodologies
        2.1 Data Sources
            2.1.1 Future-oriented Web Data
            2.1.2 Wikipedia Data
            2.1.3 Scientific Publication Data
        2.2 Methodologies
            2.2.1 Latent Semantic Analysis (LSA)
            2.2.2 IdeaGraph
            2.2.3 Morphological Analysis (MA)
    Chapter 3 Foresight for Impact Identification
        3.1 Introduction
        3.2 Emerging Technology and its Social Impacts
            3.2.1 Distinctive Nature of Emerging Technology
            3.2.2 Technology Assessment
        3.3 LSA for Constructing Scenarios
        3.4 Research Framework
            3.4.1 Step 1: Data Collection
            3.4.2 Step 2: Scenario Development
                3.4.2.1 Pre-LSA: Preprocessing Future-oriented Web Data
                3.4.2.2 LSA: Applying Latent Semantic Analysis
                3.4.2.3 Post-LSA: Constructing Scenarios
        3.5 Illustrative Case Study: Drone Technology
        3.6 Discussion
            3.6.1 Categorization of Social Impacts
            3.6.2 Comparative Analysis
            3.6.3 Implication for Theory, Practice, and Policy
        3.7 Conclusion
    Chapter 4 Foresight for Impact Analysis
        4.1 Introduction
        4.2 Uncertainty and Complexity
        4.3 Data-driven Foresight Process
        4.4 Scenario Building Beyond the Obvious
            4.4.1 Capturing Plausibility using LSA
            4.4.2 Capturing Creativity using IdeaGraph
        4.5 Research Framework
            4.5.1 Step 1. Pre-Analysis: Data Preparation
                4.5.1.1 Target Technology Selection
                4.5.1.2 Data Acquisition
                4.5.1.3 Data Preprocessing
            4.5.2 Step 2. Text Analysis: Scenario Building
                4.5.2.1 General Glimpse using Overt Structures
                4.5.2.2 Hidden Details using Latent Structures
            4.5.3 Step 3. Post-Analysis: Analytical Interpretation
                4.5.3.1 Individual Impact Scenario
                4.5.3.2 Overall Latent Impacts
        4.6 Illustrative Case Study: 3D Printing Technology
        4.7 Discussion
            4.7.1 Scenarios Beyond the Obvious
            4.7.2 Comparative Analysis
        4.8 Conclusion
    Chapter 5 Foresight for Plan Development
        5.1 Introduction
        5.2 Theoretical Paradigm Shift
            5.2.1 Technology-focused vs. Society-focused
            5.2.2 Co-evolution of Technology and Society
            5.2.3 Responsible Development
        5.3 Methodological Paradigm Shift
            5.3.1 Participatory Approach
            5.3.2 Data-driven Approach
        5.4 Rationale for using LSA
        5.5 Research Framework
            5.5.1 Step 1. Envisioning Social Issues
                5.5.1.1 Collection of Future-oriented Web Data
                5.5.1.2 Construction of Impact Scenarios
                5.5.1.3 Conceptualization of Impact Scenarios
            5.5.2 Step 2. Deriving Technical Solutions
                5.5.2.1 Collection of Scientific Publication Data
                5.5.2.2 Construction of Solution Concepts
        5.6 Illustrative Case Study: Autonomous Vehicle
        5.7 Discussion
            5.7.1 Comparative Analysis
            5.7.2 Major Strengths in Envisioning Social Impacts
            5.7.3 Major Strengths in Overviewing Solutions
        5.8 Conclusion
    Chapter 6 Foresight for Technology Ideation
        6.1 Introduction
        6.2 Related Studies
            6.2.1 Generating Creative Ideas
            6.2.2 Data-driven Morphological Analysis
        6.3 Technology Foresight using Wikipedia
            6.3.1 Wikipedia as a Good Remedy
            6.3.2 Preliminaries: How to Apply Wikipedia
        6.4 Research Framework
            6.4.1 Basic Model
            6.4.2 Extended Model
                6.4.2.1 Phase 1: Preliminary Phase
                6.4.2.2 Phase 2: Dimension Development Phase
                6.4.2.3 Phase 3: Value Development Phase
                6.4.2.4 Phase 4: Sub-dimension Development Phase
        6.5 Illustrative Case Study: Drone Technology
            6.5.1 Basic Model
            6.5.2 Extended Model
        6.6 Comparative Analysis
            6.6.1 Experimental Setup
            6.6.2 Comparison of Results
        6.7 Intrinsic Limitations of Applying Wikipedia
        6.8 Conclusion
    Chapter 7 Concluding Remarks
    Bibliography
    Appendix
        Appendix A Result of overt and latent structures of each impact scenario
        Appendix B Result of Wikipedia-based morphological matrix (basic model)
        Appendix C Result of Wikipedia-based morphological matrix using superordinate seed terms (extended model)
        Appendix D Result of Wikipedia-based morphological matrix after applying subordinate value seed terms (extended model)
        Appendix E Result of Wikipedia-based morphological matrix after developing sub-dimensions (extended model)

    User Interfaces for Personal Knowledge Management with Semantic Technologies

    This thesis describes iMapping and QuiKey, two novel user interface concepts for dealing with structured information. iMapping is a visual knowledge mapping technique based on zooming, which combines the advantages of several existing approaches and scales up to very large maps. QuiKey is a text-based tool for interacting with graph-structured knowledge bases with very high interaction efficiency. Both tools have been implemented and positively evaluated in user studies.

    Addressing the new generation of spam (Spam 2.0) through Web usage models

    New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile on a social networking website, a promotional review, a response to a thread in an online forum with unsolicited content, or a manipulated Wiki page are examples of the new generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications. The current literature does not address Spam 2.0 in depth, and the outcome of efforts to date is inadequate. The aim of this research is to formalise a definition for Spam 2.0 and provide Spam 2.0 filtering solutions. Early detection, extendibility, robustness and adaptability are key factors in the design of the proposed method. This dissertation provides a comprehensive survey of state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time capturing the knowledge in the domain of spam filtering. This dissertation proposes three solutions in the area of Spam 2.0 filtering: (1) characterising and profiling Spam 2.0, (2) the Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) the On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets and their performance is compared with that of existing Spam 2.0 filtering methods. This work has coined the term ‘Spam 2.0’, provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem.

    Front Matter - Soft Computing for Data Mining Applications

    Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, to make the information mining process more robust, or human-like, methods for searching and learning require tolerance towards imprecision, uncertainty and exceptions. Such methods have approximate reasoning capabilities and are capable of handling partial truth. Properties of the aforementioned kind are typical of soft computing. Soft computing techniques like Genetic

    Proceedings of the Workshop on the Reuse of Web based Information

    The proceedings are currently available online at: http://www-rocq.inria.fr/~vercoust/REUSE/WWW7-reuse.html where individual papers can be downloaded. However, this URL must not be regarded as permanent. These are the Proceedings of the Workshop on the Reuse of Web Information that was held in conjunction with the Seventh International World Wide Web Conference, Brisbane, 14 April 1998.