198 research outputs found

    Application of the Markov Chain Method in a Health Portal Recommendation System

    This study produced a recommendation system that can effectively recommend items on a health portal. Toward this aim, a transaction log that records users’ traversal activities on the Medical College of Wisconsin’s HealthLink, a health portal with a subject directory, was utilized and investigated. The study proposed a mixed-methods approach combining transaction log analysis, Markov chain analysis, and inferential analysis. Transaction log analysis was applied to extract users’ traversal activities from the log. Markov chain analysis was adopted to model users’ traversal activities and then generate recommendation lists for topics, articles, and Q&A items on the health portal. Inferential analysis was applied to test whether there were correlations between the recommendation lists generated by the proposed system and recommendation lists ranked by experts. The topics selected for this study, Infections, the Heart, and Cancer, were the three most viewed topics in the portal. The findings revealed consistency between the recommendation lists generated by the proposed system and the lists ranked by experts. At the topic level, two topic recommendation lists generated by the proposed system were consistent with the lists ranked by experts, while one was highly consistent. At the article level, one article recommendation list was consistent with the list ranked by experts, while 14 were highly consistent. At the Q&A item level, three Q&A item recommendation lists were consistent with the lists ranked by experts, while 12 were highly consistent. These findings demonstrate the significance of users’ traversal data extracted from the transaction log. The methodology offers a systematic approach to building recommendation systems for other, similar portals. The outcomes of this study can facilitate users’ navigation and provide a new method for building a recommendation system that recommends items at three levels: the topic level, the article level, and the Q&A item level.
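    The core of such a first-order Markov chain recommender can be sketched as follows. This is a minimal illustration under assumed inputs (the session sequences and item names are hypothetical), not the study's actual implementation:

```python
from collections import defaultdict

def build_transition_counts(sessions):
    """Count item-to-item transitions observed in users' traversal sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current_item, next_item in zip(session, session[1:]):
            counts[current_item][next_item] += 1
    return counts

def recommend(counts, item, k=5):
    """Rank the items most likely to follow `item` under a first-order Markov chain."""
    successors = counts.get(item, {})
    total = sum(successors.values())
    if total == 0:
        return []
    ranked = sorted(successors.items(), key=lambda kv: kv[1], reverse=True)
    return [(nxt, n / total) for nxt, n in ranked[:k]]

# Toy traversal sequences extracted from a transaction log (hypothetical data).
sessions = [
    ["infections", "heart", "cancer"],
    ["infections", "cancer"],
    ["heart", "cancer", "infections"],
]
counts = build_transition_counts(sessions)
print(recommend(counts, "infections"))  # [('heart', 0.5), ('cancer', 0.5)]
```

    The same transition structure can be built at the topic, article, or Q&A item level simply by choosing what the sequence elements denote.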

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tank meetings and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we survey the impact and legal consequences of these technical advances and point out future directions for research.

    Timeout Reached, Session Ends?

    The identification of sessions as a means of understanding user behaviour is a common research area of web usage mining. Definitions and concepts have been discussed for over 20 years, and research shows that session identification is not an arbitrary task. There is a questionable tendency towards simplistic mechanical sessions instead of more complex logical segmentations. This dissertation aims to demonstrate how differing session-identification approaches lead to diverging results and interpretations. The overarching research question asks: will different session-identification approaches impact analysis and machine-learning tasks? A comprehensive methodological framework for implementing, comparing, and evaluating sessions is given. The dissertation provides implementation guidelines for 135 session-identification approaches, utilizing a complete year (2018) of traffic data from a German price-comparison e-commerce platform. The implementation covers mechanical concepts, logical constructs, and the combination of multiple methods. It shows how logical sessions were constructed from user sequences by employing embedding algorithms on interaction logs, taking a novel approach to logical session identification that uses the topical proximity of interactions instead of search queries alone. All approaches are compared and quantitatively described. Their application in three machine-learning tasks (such as recommendation) shows that using different sessions as input data has a marked impact on the outcome.
    The main contribution of this dissertation is a comprehensive comparison of session-identification algorithms. The research provides a methodology to implement, analyse, and compare a wide variety of mechanics, making it possible to better understand user behaviour and the effects of session modelling. The main results show that differently structured input data may drastically change the results of algorithms or analyses.
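    A mechanical, timeout-based approach of the kind the dissertation contrasts with logical segmentation can be sketched as follows; the 30-minute threshold and the event format are illustrative assumptions, not the thesis's exact implementation:

```python
from datetime import datetime, timedelta

def timeout_sessions(events, gap_minutes=30):
    """Split one user's chronologically ordered events into sessions:
    a new session starts whenever the inactivity gap exceeds the threshold."""
    gap = timedelta(minutes=gap_minutes)
    sessions, current = [], []
    last_ts = None
    for ts, action in events:  # events: list of (datetime, action) pairs
        if last_ts is not None and ts - last_ts > gap:
            sessions.append(current)
            current = []
        current.append((ts, action))
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions

log = [
    (datetime(2018, 5, 1, 9, 0), "search"),
    (datetime(2018, 5, 1, 9, 10), "click"),
    (datetime(2018, 5, 1, 11, 0), "search"),
]
print(len(timeout_sessions(log)))  # 2: the 110-minute gap starts a new session
```

    A logical approach would instead split or merge such mechanical sessions based on the topical proximity of the interactions themselves.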

    Understanding Google: Search Engines and the Changing Nature of Access, Thought and Knowledge within a Global Context

    This thesis explores the impact of search engines within contemporary digital culture and, in particular, focuses on the social, cultural, and philosophical influence of Google. Search engines are deeply enmeshed with other recent developments in digital culture; therefore, in addressing their impact these intersections must be recognised, while highlighting the technological and social specificity of search engines. Also important is acknowledging the way that certain institutions, in particular Google, have shaped the web and wider culture around a particular set of economic incentives that have far-reaching consequences for contemporary digital culture. This thesis argues that understanding search engines requires a recognition of their contemporary context, while also acknowledging that Google’s quest to “organize the world's information and make it universally accessible and useful” is part of a much older and broader discourse. Balancing these two viewpoints is important; Google is shaping public discourse on a global scale with unprecedentedly extensive consequences. However, many of the issues addressed by this thesis would remain centrally important even if Google declared bankruptcy or if search engines were abandoned for a different technology. Search engines are a specific technological response to a particular cultural environment; however, their social function and technical operation are embedded within a historical relationship to enquiry and inscription that stretches back to antiquity. This thesis addresses the following broad research questions, while at each stage specifically addressing the role and influence of search engines: how do individuals interrogate and navigate the world around them? How do technologies and social institutions facilitate how we think and remember? How culturally situated is knowledge; are there epistemological truths that transcend social environments? How does technological expansion fit within wider questions of globalisation? How do technological discourses shape the global flows of information and capital? These five questions map directly onto the five chapters of this thesis. Much of the existing study of search engines has been focused on small-scale evaluation, which either addresses Google’s day-by-day algorithmic changes or poses relatively isolated disciplinary questions. Therefore, not only is the number of academics, technicians, and journalists attending to search engines relatively small, given the centrality of search engines to digital culture, but much of the knowledge that is produced becomes outdated with algorithmic changes or the shifting strategies of companies. This thesis ties these focused concerns to wider issues, with a view to encouraging and facilitating further enquiry.
    This thesis explores the impact of Google’s search engine within contemporary digital culture. Search engines have been studied in various disciplines, for example information retrieval, computer science, law, and new media, yet much of this work remains fixed within disciplinary boundaries. The approach of this thesis is to draw on work from a number of areas in order to link a technical understanding of how search engines function with a wider cultural and philosophical context. In particular, this thesis draws on critical theory in order to attend to the convergence of language, programming, and culture on a global scale. The chapter outline is as follows. Chapter one compares search engine queries to traditional questions. The chapter draws from information retrieval research to provide a technical framework that is brought into contact with philosophy and critical theory, including Plato and Hans-Georg Gadamer. Chapter two investigates search engines as memory aids, deploying a history of memory and exploring practices within oral cultures and mnemonic techniques such as the Ars Memoria. This places search engines within a longer historical context, while drawing on contemporary insights from the philosophy and science of cognition. Chapter three addresses Google’s Autocomplete functionality, and chapter four explores the contextual nature of results in order to highlight how different characteristics of users are used to personalise access to the web. These chapters address Google’s role within a global context and the implications for identity and community online. Finally, chapter five explores how Google’s method of generating revenue, through advertising, has a social impact on the web as a whole, particularly when considered through the lens of contemporary Post-Fordist accounts of capitalism. Throughout, this thesis develops a framework for attending to algorithmic cultures and outlines the specific influence that Google has had on the web and continues to have at a global scale.
    Arts and Humanities Research Council

    Improving document representation by accumulating relevance feedback: the relevance feedback accumulation (RFA) algorithm

    Document representation (indexing) techniques are dominated by variants of the term-frequency analysis approach, based on the assumption that the more occurrences a term has throughout a document, the more important the term is in that document. Inherent drawbacks of this approach include poor index quality, large document representation size, and the word mismatch problem. To tackle these drawbacks, a document representation improvement method called the Relevance Feedback Accumulation (RFA) algorithm is presented. The algorithm provides a mechanism to continuously accumulate relevance assessments over time and across users. It also provides a document representation modification function, or document representation learning function, that gradually improves the quality of the document representations. To improve document representations, the learning function analyzes the accumulated relevance feedback using a data mining measure called support. Evaluation is done by comparing the RFA algorithm to four other algorithms. The four measures used for evaluation are (a) the average number of index terms per document; (b) the quality of the document representations as assessed by human judges; (c) retrieval effectiveness; and (d) the quality of the document representation learning function. The evaluation results show that (1) the algorithm substantially reduces document representation size while maintaining retrieval effectiveness; (2) the algorithm provides a smooth and steady document representation learning function; and (3) the algorithm improves the quality of the document representations. The RFA algorithm's approach is consistent with efficiency considerations that hold in real information retrieval systems. The major contribution of this research is the design and implementation of a novel, simple, efficient, and scalable technique for document representation improvement.
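    The accumulate-and-learn idea can be sketched roughly as follows; the data structures, the update rule, and the support threshold are illustrative assumptions rather than the published algorithm:

```python
from collections import defaultdict

class RelevanceFeedbackAccumulator:
    """Accumulate relevance judgements over time and across users, and use a
    support measure to decide which terms belong in a document's representation."""

    def __init__(self, min_support=0.6):
        self.min_support = min_support
        self.feedback_count = defaultdict(int)  # doc -> number of relevance judgements
        self.term_count = defaultdict(lambda: defaultdict(int))  # doc -> term -> count

    def record_feedback(self, doc_id, query_terms):
        """A user judged doc_id relevant for a query containing query_terms."""
        self.feedback_count[doc_id] += 1
        for term in set(query_terms):
            self.term_count[doc_id][term] += 1

    def representation(self, doc_id):
        """Keep index terms whose support (fraction of judgements mentioning
        the term) meets the threshold."""
        n = self.feedback_count[doc_id]
        if n == 0:
            return set()
        return {t for t, c in self.term_count[doc_id].items() if c / n >= self.min_support}

rfa = RelevanceFeedbackAccumulator()
rfa.record_feedback("doc1", ["heart", "attack"])
rfa.record_feedback("doc1", ["heart", "disease"])
print(rfa.representation("doc1"))  # {'heart'}: only 'heart' reaches support 0.6
```

    Filtering by support rather than raw term frequency is what lets such a scheme shrink representations while keeping the terms users actually associate with a document.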

    Accelerating scientific research in the digital era: intelligent assessment and retrieval of research content

    The efficient, effective, and timely access to the scientific literature by researchers is crucial for accelerating scientific research and discovery. Nowadays, research articles are almost exclusively published in digital form and stored in digital libraries, accessible over the Web. Using digital libraries for storing scientific literature is advantageous as it enables access to articles at any time and place. Furthermore, digital libraries can leverage information management systems and artificial intelligence techniques to manage, retrieve, and analyze research content. Due to the large size of these libraries and their fast pace of growth, the development of intelligent systems that can effectively retrieve and analyze research content is crucial for improving the productivity of researchers. In this thesis, we focus on improving literature search engines by addressing some of their limitations. One limitation of current literature search engines is that they mainly treat articles as the retrieval units and do not support direct search for any of an article's elements such as figures, tables, and formulas. In this thesis, we study how to enable researchers to access research collections using the figures of articles. Figures are entities in research articles that play an essential role in scientific communication. For this reason, research figures can be utilized directly by literature systems to facilitate and accelerate research. As the first step in this direction, we propose and study the novel task of figure retrieval from collections of research articles, where the goal is to retrieve research article figures using keyword queries. We focus on the textual bag-of-words representation of search queries and figures and study the effectiveness of different retrieval models for the task and various ways to represent figures using text data. The empirical study shows the benefit of using multiple textual inputs for representing a figure and combining different retrieval models. The results also shed light on the different challenges in addressing this novel task. Next, we address the limitations of the text-based bag-of-words representation of research figures by proposing and studying a new view of representation, namely deep neural network-based distributed representations. Specifically, we focus on using image data and text for learning figure representations with different model architectures and loss functions, to understand how sensitive the embeddings are to the learning approach and the features used. We also develop a novel weak supervision technique for training neural networks for this task that leverages the citation network of articles to generate large quantities of training examples. The experimental results show that figure representations learned using our weak supervision approach are effective and outperform representations based on the bag-of-words technique and pre-trained neural networks. Current systems also have minimal support for addressing queries for which a search engine performs poorly due to ineffective formulation by the user. When conducting research, poor-performing search queries may occur when a researcher faces a new or fast-evolving research topic, resulting in a significant vocabulary gap between the user's query and the relevant articles. In this thesis, we address this problem by developing a novel strategy for collaborative query construction.
    According to this strategy, the search engine actively engages users in an iterative process to continuously revise a query. We propose a specific implementation of this strategy in which the search engine and the user work together to expand a search query. Specifically, in every iteration the system generates expansion terms, drawing on the user's history of interactions with it, which the user can add to the search query in order to reach an "ideal query". The experimental results attest to the effectiveness of this approach in improving poor-performing search queries with minimal effort from the user. The last limitation that we address in this thesis is that current systems usually do not leverage any content analysis for the quality assessment of articles and instead rely on citation counts. We study the task of automatic quality assessment of research articles, where the goal is to assess the quality of an article in different aspects such as clarity, originality, and soundness. Automating the quality assessment of articles could improve current literature systems, which can leverage the generated quality scores to support the search and analysis of research articles. Previous works have applied supervised machine learning to automate the assessment by learning from examples of articles reviewed by humans. In this thesis, we study the effectiveness of using topics for the task and propose a novel strategy for constructing multi-view topical features. Experimental results show that such features are effective for this task compared to deep neural network-based features and bag-of-words features. Finally, to facilitate further evaluation of the different approaches suggested in this thesis using real users and realistic user tasks, we developed AcademicExplorer, a novel general system that supports the retrieval and exploration of research articles using several new functions enabled by the algorithms proposed in this thesis, such as exploring research collections using figure embeddings, sorting research articles based on automatically generated review scores, and interactive query formulation. As an open-source system, AcademicExplorer can help advance the research, evaluation, and development of applications in this area.
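    A minimal bag-of-words figure-retrieval sketch is given below, assuming each figure is represented by its caption text; the captions and query are hypothetical, and this illustrates only the general TF-IDF ranking idea, not the thesis's specific retrieval models:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build TF-IDF vectors for a list of tokenized documents."""
    df = Counter(t for doc in docs for t in set(doc))
    n = len(docs)
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs], idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Figures represented by their caption text (hypothetical examples).
captions = [
    "precision recall curves for the proposed retrieval model".split(),
    "architecture of the neural network used for figure embedding".split(),
    "distribution of query lengths in the test collection".split(),
]
vectors, idf = tf_idf_vectors(captions)
query = "neural network architecture".split()
q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
ranked = sorted(range(len(captions)), key=lambda i: cosine(q_vec, vectors[i]), reverse=True)
print(ranked)  # the index of the best-matching figure comes first
```

    Richer textual inputs (surrounding paragraphs, citing sentences) can be concatenated into the same representation, which is one way multiple textual inputs help in practice.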

    Analyzing and Applying Cryptographic Mechanisms to Protect Privacy in Applications

    Privacy-Enhancing Technologies (PETs) emerged as a technology-based response to the increased collection and storage of data, as well as to the associated threats to individuals' privacy in modern applications. They rely on a variety of cryptographic mechanisms that make it possible to perform computations without directly obtaining knowledge of plaintext information. However, many challenges have so far prevented effective real-world usage in many existing applications. For one, some mechanisms leak information or have been proposed outside of the security models established within the cryptographic community, leaving open how effective they are at protecting privacy in various applications. Additionally, a major challenge causing PETs to remain largely academic is their practicality, in terms of both efficiency and usability: cryptographic mechanisms introduce a lot of overhead, which is often prohibitive, and, due to a lack of high-level tools, they are very hard for outsiders to integrate. In this thesis, we move towards making PETs more effective and practical at protecting privacy in numerous applications. We take a two-sided approach: first analyzing the effective security of candidate mechanisms (cryptanalysis) and then building constructions and tools (cryptographic engineering) for practical use in emerging machine learning applications crucial to modern use cases. In the process, we incorporate an interdisciplinary perspective, analyzing mechanisms and collaboratively building privacy-preserving architectures based on requirements from the application domains' experts.
    Cryptanalysis. While mechanisms like Homomorphic Encryption (HE) or Secure Multi-Party Computation (SMPC) provably leak no additional information, Encrypted Search Algorithms (ESAs) and Randomization-only Two-Party Computation (RoTPC) possess additional properties that require cryptanalysis to determine how effectively they protect privacy. ESAs allow search on encrypted data, an important functionality in many applications. Most efficient ESAs possess some form of well-defined information leakage, which is cryptanalyzed via a breadth of so-called leakage attacks proposed in the literature. However, it is difficult to assess their practical effectiveness, given that previous evaluations were closed-source, used restricted data, and made assumptions about (among other things) the query distribution, because real-world query data is very hard to find. For these reasons, we re-implement known leakage attacks in an open-source framework and perform a systematic empirical re-evaluation of them using a variety of new data sources that, for the first time, contain real-world query data. We obtain many more complete and novel results, in which attacks work much better or much worse than expected based on previous evaluations. RoTPC mechanisms require cryptanalysis as they do not rely on established techniques and security models, instead obfuscating messages using only randomizations. A prominent example is a privacy-preserving scalar product protocol by Lu et al. (IEEE TPDS'13). We show that this protocol is formally insecure and that this translates into practical insecurity by presenting attacks that even allow testing for certain inputs, making the case for more scrutiny of RoTPC protocols used as PETs. This part of the thesis is based on the following two publications:
    [KKM+22] S. Kamara, A. Kati, T. Moataz, T. Schneider, A. Treiber, M. Yonli. "SoK: Cryptanalysis of Encrypted Search with LEAKER - A framework for LEakage AttacK Evaluation on Real-world data". In: 7th IEEE European Symposium on Security and Privacy (EuroS&P'22). Full version: https://ia.cr/2021/1035. Code: https://encrypto.de/code/LEAKER. IEEE, 2022, pp. 90–108. Appendix A.
    [ST20] T. Schneider, A. Treiber. "A Comment on Privacy-Preserving Scalar Product Protocols as proposed in 'SPOC'". In: IEEE Transactions on Parallel and Distributed Systems (TPDS) 31.3 (2020). Full version: https://arxiv.org/abs/1906.04862. Code: https://encrypto.de/code/SPOCattack, pp. 543–546. CORE Rank A*. Appendix B.
    Cryptographic Engineering. Given the above cryptanalysis results, we investigate using the leakage-free and provably secure cryptographic mechanisms of HE and SMPC to protect privacy in machine learning applications. As much of the cryptographic community has focused on PETs for neural network applications, we focus on two other important applications and models: speaker recognition and sum-product networks. We demonstrate the efficiency of our solutions in possible real-world scenarios and provide tools usable by non-domain experts. In speaker recognition, a user's voice data is matched against reference data stored at the service provider. Using HE and SMPC, we build the first privacy-preserving speaker recognition system that includes the state-of-the-art technique of cohort score normalization, using cohort pruning via SMPC. Then, we build a privacy-preserving speaker recognition system relying solely on SMPC, which we show outperforms previous HE-based solutions by a factor of up to 4000x. We show that both our solutions comply with specific standards for biometric information protection and are thus effective and practical PETs for speaker recognition. Sum-Product Networks (SPNs) are noteworthy probabilistic graphical models that, like neural networks, also need efficient methods for privacy-preserving inference as a PET. We present CryptoSPN, which uses SMPC for privacy-preserving inference of SPNs and (due to a combination of machine learning and cryptographic techniques, and contrary to most works on neural networks) even hides the network structure. Our implementation is integrated into the prominent SPN framework SPFlow and evaluates medium-sized SPNs within seconds. This part of the thesis is based on the following three publications:
    [NPT+19] A. Nautsch, J. Patino, A. Treiber, T. Stafylakis, P. Mizera, M. Todisco, T. Schneider, N. Evans. "Privacy-Preserving Speaker Recognition with Cohort Score Normalisation". In: 20th Conference of the International Speech Communication Association (INTERSPEECH'19). Online: https://arxiv.org/abs/1907.03454. International Speech Communication Association (ISCA), 2019, pp. 2868–2872. CORE Rank A. Appendix C.
    [TNK+19] A. Treiber, A. Nautsch, J. Kolberg, T. Schneider, C. Busch. "Privacy-Preserving PLDA Speaker Verification using Outsourced Secure Computation". In: Speech Communication 114 (2019). Online: https://encrypto.de/papers/TNKSB19.pdf. Code: https://encrypto.de/code/PrivateASV, pp. 60–71. CORE Rank B. Appendix D.
    [TMW+20] A. Treiber, A. Molina, C. Weinert, T. Schneider, K. Kersting. "CryptoSPN: Privacy-preserving Sum-Product Network Inference". In: 24th European Conference on Artificial Intelligence (ECAI'20). Full version: https://arxiv.org/abs/2002.00801. Code: https://encrypto.de/code/CryptoSPN. IOS Press, 2020, pp. 1946–1953. CORE Rank A. Appendix E.
    Overall, this thesis contributes a broader security analysis of cryptographic mechanisms, together with new systems and tools to effectively protect privacy in various sought-after applications.
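    To illustrate the kind of leakage-free computation SMPC provides, here is a minimal, single-process simulation of a two-party scalar product using additive secret sharing and Beaver multiplication triples. This is a textbook construction shown for intuition only, not the specific protocols analysed or built in the thesis:

```python
import random

P = 2**61 - 1  # a large prime modulus; all shares live in Z_P

def share(x):
    """Additively secret-share x between two parties."""
    r = random.randrange(P)
    return r, (x - r) % P

def scalar_product(x, y):
    """Two-party scalar product via additive sharing and Beaver triples.
    Both parties and the triple dealer are simulated in one process."""
    n = len(x)
    # Dealer: per-coordinate multiplication triples a*b = c, secret-shared.
    a = [random.randrange(P) for _ in range(n)]
    b = [random.randrange(P) for _ in range(n)]
    a0, a1 = zip(*(share(v) for v in a))
    b0, b1 = zip(*(share(v) for v in b))
    c0, c1 = zip(*(share(ai * bi % P) for ai, bi in zip(a, b)))
    # Parties secret-share their input vectors.
    x0, x1 = zip(*(share(v) for v in x))
    y0, y1 = zip(*(share(v) for v in y))
    z0 = z1 = 0
    for i in range(n):
        # Both parties open d = x_i - a_i and e = y_i - b_i; since a_i, b_i are
        # uniformly random one-time masks, this reveals nothing about x_i, y_i.
        d = (x0[i] - a0[i] + x1[i] - a1[i]) % P
        e = (y0[i] - b0[i] + y1[i] - b1[i]) % P
        # Local shares of x_i * y_i; the public d*e term is added by one party only.
        z0 = (z0 + c0[i] + d * b0[i] + e * a0[i] + d * e) % P
        z1 = (z1 + c1[i] + d * b1[i] + e * a1[i]) % P
    return (z0 + z1) % P  # opening the result shares reveals only the dot product

x, y = [3, 1, 4], [2, 7, 1]
assert scalar_product(x, y) == sum(xi * yi for xi, yi in zip(x, y)) % P
print(scalar_product(x, y))  # 17
```

    In contrast, the randomization-only protocols cryptanalyzed in the thesis lack such a reduction to uniformly random masks, which is precisely what the presented attacks exploit.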

    An overview of artificial intelligence and robotics. Volume 1: Artificial intelligence. Part B: Applications

    Artificial Intelligence (AI) is an emerging technology that has recently attracted considerable attention. Many applications are now under development. This report, Part B of a three-part report on AI, presents overviews of the key application areas: Expert Systems, Computer Vision, Natural Language Processing, Speech Interfaces, and Problem Solving and Planning. The basic approaches to such systems, the state of the art, existing systems, and future trends and expectations are covered.

    User modeling servers - requirements, design, and evaluation

    Software systems that adapt their services to the characteristics of individual users have already proven to be more effective and/or more user-friendly than static systems in several application domains. To provide such adaptation, user-adaptive systems rely on models of user characteristics; building and managing these models is the task of dedicated user modeling components. An important branch of user modeling research is concerned with the development of so-called "user modeling shells", i.e. generic user modeling systems that facilitate the development of application-specific user modeling components. So far, the scope of these generic user modeling systems and of their services and functionalities has in most cases been determined intuitively and/or derived from descriptions of a few user-adaptive systems in the literature. More recently, the trend towards personalization on the World Wide Web has led to the development of several commercial user modeling servers. The properties regarded as important for these systems stand in stark contrast to those that were central to the development of user modeling shells, and vice versa. Against this background, the goals of this dissertation are (i) to analyze the requirements for user modeling servers from a multi-disciplinary scientific and a deployment-oriented (commercial) perspective, (ii) to design and implement a server that meets these requirements, and (iii) to verify the performance and scalability of this server against these requirements under the workload of small and medium-sized deployment environments. To achieve this goal, we follow a requirements-centered approach that builds on experience from several research fields. We develop a generic architecture for a user modeling server consisting of a server core for data management and modular, pluggable user modeling components, each of which implements an important user modeling technique. We show that integrating these user modeling components in one server yields synergies between the learning techniques employed and compensates for known deficits of individual methods, for example with respect to performance, scalability, integration of domain knowledge, data sparsity, and cold start. Finally, we present the most important results of the experiments we conducted to empirically verify that the user modeling server we developed meets central performance and scalability criteria. We show that our server fully satisfies these criteria in deployment environments with small and medium workloads. A test in a deployment environment with several million user profiles and a workload that can be regarded as representative of larger web sites confirmed that the performance of our server's user modeling does not impose a significant additional burden on a personalized web site, while the hardware requirements remain moderate.
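    The server-core-plus-pluggable-components architecture might be sketched as follows; the class names, interfaces, and the toy frequency technique are purely illustrative assumptions, not the dissertation's actual design:

```python
from abc import ABC, abstractmethod

class UserModelStore:
    """Server core: central storage and management of user model entries."""
    def __init__(self):
        self._models = {}  # user_id -> dict of user characteristics

    def get(self, user_id):
        return self._models.setdefault(user_id, {})

class UserModelingComponent(ABC):
    """A pluggable component implementing one user modeling technique."""
    @abstractmethod
    def observe(self, store, user_id, event): ...

class FrequencyComponent(UserModelingComponent):
    """Toy technique: infer interests by counting topics in page-view events."""
    def observe(self, store, user_id, event):
        model = store.get(user_id)
        topic = event.get("topic")
        if topic:
            model[topic] = model.get(topic, 0) + 1

class UserModelingServer:
    """Dispatches incoming usage events to all registered components,
    which share one user model store and can thus complement each other."""
    def __init__(self, components):
        self.store = UserModelStore()
        self.components = components

    def handle_event(self, user_id, event):
        for component in self.components:
            component.observe(self.store, user_id, event)

server = UserModelingServer([FrequencyComponent()])
server.handle_event("u1", {"topic": "jazz"})
server.handle_event("u1", {"topic": "jazz"})
print(server.store.get("u1"))  # {'jazz': 2}
```

    Because all components write into the same store, one technique's output can compensate another's weaknesses, for example seeding a collaborative learner's cold start with domain knowledge.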