7 research outputs found

    Search Engine Similarity Analysis: A Combined Content and Rankings Approach

    How different are search engines? The search engine wars are a favorite topic of online analysts, as two of the biggest companies in the world, Google and Microsoft, battle for dominance of the web search space. Differences in search engine popularity can be explained by their effectiveness or by other factors, such as familiarity with the most popular first engine, peer imitation, or force of habit. In this work we present a thorough analysis of the affinity of the two major search engines, Google and Bing, along with DuckDuckGo, which goes to great lengths to emphasize its privacy-friendly credentials. To do so, we collected search results using a comprehensive set of 300 unique queries for two time periods in 2016 and 2019, and developed a new similarity metric that leverages both the content and the ranking of search responses. We evaluated the characteristics of the metric against other metrics and approaches that have been proposed in the literature, and used it to investigate (1) the similarities of search engine results, (2) the evolution of their affinity over time, (3) which aspects of the results influence similarity, and (4) how the metric differs across different kinds of search services. We found that Google stands apart, but Bing and DuckDuckGo are largely indistinguishable from each other. Comment: A shorter version of this paper was accepted at the 21st International Conference on Web Information Systems Engineering (WISE 2020). The final authenticated version is available online at https://doi.org/10.1007/978-3-030-62008-0_
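The abstract above does not define its similarity metric, so the following Python sketch is only an illustration of the general idea of combining content overlap with rank agreement. It is not the authors' metric: the function names, the `alpha` weight, and the 1/(1 + rank distance) weighting are all assumptions made for this example, and each result list is assumed to contain no duplicate URLs.

```python
from typing import Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Content overlap: shared results divided by all distinct results."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def rank_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Rank agreement: a shared result counts more the closer its two positions are.

    Assumes neither list contains duplicate URLs.
    """
    if not a and not b:
        return 1.0
    position_in_b = {url: i for i, url in enumerate(b)}
    score = sum(
        1.0 / (1 + abs(i - position_in_b[url]))
        for i, url in enumerate(a)
        if url in position_in_b
    )
    return score / max(len(a), len(b))


def combined_similarity(a: Sequence[str], b: Sequence[str], alpha: float = 0.5) -> float:
    """Hypothetical blend of content overlap and rank agreement, in [0, 1]."""
    return alpha * jaccard(a, b) + (1 - alpha) * rank_agreement(a, b)


if __name__ == "__main__":
    # Placeholder result lists; the URLs are invented for illustration.
    engine_a = ["example.org/1", "example.org/2", "example.org/3"]
    engine_b = ["example.org/3", "example.org/2", "example.org/9"]
    print(f"similarity = {combined_similarity(engine_a, engine_b):.3f}")
```

In this sketch, two identical result lists score 1, two disjoint lists score 0, and lists that share results in a different order fall in between.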

    A semantic framework for ontology usage analysis

    The Semantic Web envisions a Web where information is accessible and processable by computers as well as humans. Ontologies are the cornerstones for realizing this vision: they capture domain knowledge by defining terms and the relationships between them, providing a formal representation of the domain with machine-understandable semantics. Ontologies are used for semantic annotation, data interoperability, and knowledge assimilation and dissemination. In the literature, different approaches have been proposed to build and evolve ontologies, but one more important concept needs to be considered in the ontology lifecycle: usage. Measuring the "usage" of ontologies will help us to make effective and efficient use of semantically annotated structured data published on the Web (formalized knowledge published on the Web), improve the state of ontology adoption and reusability, provide a usage-based feedback loop to the ontology maintenance process for a pragmatic conceptual model update, and source information accurately and automatically so that it can be utilized in other areas of the ontology lifecycle. Ontology Usage Analysis is the area which evaluates, measures and analyses the use of ontologies on the Web. However, in spite of its importance, no formal approach in the literature focuses on measuring the use of ontologies on the Web. This is in contrast to the approaches proposed for the other concepts of the ontology lifecycle, such as ontology development, ontology evaluation and ontology evolution. To address this gap, this thesis sets out to assess, analyse and represent the use of ontologies on the Web. In order to address the problem and realize the abovementioned benefits, an Ontology Usage Analysis Framework (OUSAF) is presented.
The OUSAF Framework implements a methodological approach comprising identification, investigation, representation and utilization phases. These phases provide a complete solution for usage analysis by allowing users to identify the key ontologies, and to investigate, represent and utilize usage analysis results. Various computation components with several methods, techniques, and metrics for each phase are presented and evaluated using Semantic Web data crawled from the Web. To disseminate ontology-usage-related information in a form accessible to both machines and humans, the U Ontology is presented to formalize the conceptual model of the ontology usage domain. The evaluation of the framework, solution components, methods, and formalized conceptual model is presented, indicating the usefulness of the overall proposed solution.

    Extraction of Twitter Social Media Information on Security Disturbances Using an Ontology-Based Information Extraction Approach (Case Study: PT. Pertamina (Persero))

    News reporting has shifted from conventional mass media to information-technology media such as the social network Twitter. In Indonesia, Twitter users account for around 49% of the country's 130 million social media users. However, the news circulating on social media has not yet been mapped or processed. This study therefore aims to map and process news tweets, particularly those about crime or security disturbances, in order to produce useful information and provide input to PT. Pertamina (Persero). The methodology uses an ontology for information extraction from crawled data, based on predetermined categories, together with Named-Entity Recognition. The Named-Entity Recognition method is used to categorize the data in each tweet into categories such as actor, location, description, and time. The results show that the Named-Entity Recognition models produce fairly accurate information extraction, and that the ontology is able to categorize the types of crime or security disturbance. The actor model achieved 90.65% precision, 90.82% recall, and a 90.74% F1 score. The location model achieved 99.54% precision, 98.37% recall, and a 98.95% F1 score. The description model achieved 95.86% precision, 99.75% recall, and a 97.77% F1 score. The time model achieved 99.99% precision, 100% recall, and a 100% F1 score.
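The F1 scores reported above can be checked against the reported precision and recall values, since F1 is conventionally the harmonic mean of the two. The Python sketch below does this check, assuming the abstract uses that standard definition; the function name and the dictionary layout are illustrative and do not come from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units, here percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the abstract.
reported = {
    "actor": (90.65, 90.82, 90.74),
    "location": (99.54, 98.37, 98.95),
    "description": (95.86, 99.75, 97.77),
    "time": (99.99, 100.0, 100.0),
}

for model, (precision, recall, reported_f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{model}: computed F1 = {computed:.2f}%, reported F1 = {reported_f1:.2f}%")
```

Under this definition the computed values agree with the reported ones to within about 0.01 percentage points, which is consistent with rounding of the published figures.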

    WEB recommendations for E-commerce websites

    In this part of the thesis we have investigated how navigation based on web recommendations can be implemented on e-commerce websites built on integrated data sources. Integrated e-commerce websites are an interesting use case for web recommendations. One reason is that many modern, large and economically successful e-commerce websites follow the integrated approach. Another is that, especially in an integrated environment, the lack of pre-defined semantic connections between the data makes web recommendations an important means of enabling user navigation. In this chapter we have presented EC-Fuice, an architecture for websites based on integrated data sources. We have also presented a prototypical implementation of the architecture, which serves as a proof of concept, and investigated the challenges of creating navigation on an integrated website. The following issues were addressed in this part of the thesis: (1) combination of several state-of-the-art tools and techniques from the fields of databases, data integration, ontology matching and web engineering into one generic architecture for creating integrated websites; (2) comparative experiments with several techniques for instance matching (also known as record linkage or duplicate detection); (3) investigation of using ontology matching to facilitate instance matching; (4) comparative experiments with several techniques for ontology matching; (5) investigation of instance-based ontology matching and the possibilities for combining it with other ontology matching techniques; (6) investigation of the possibilities for improving user navigation in the integrated data environment with different types of web recommendations; and (7) review of the related work in the fields of data integration and ontology matching, with a discussion of the contact points between the research described here and other related projects.
The main contributions of the research described in this part of the thesis are the EC-Fuice architecture, a novel method for matching e-commerce ontologies based on a combination of instance information and metadata information, the experimental results of ontology and instance matching performed by different matching algorithms, and a classification of the types of recommendations that can be used on an integrated e-commerce website.