32 research outputs found

    Research on Efficient and Secure Set Similarity Joins (効率的で安全な集合間類似結合に関する研究)

    University of Tsukuba (筑波大学), 201

    Modeling Search Behaviors Based on User Interactions (Modélisation des comportements de recherche basé sur les interactions des utilisateurs)

    Users of information systems normally divide tasks into a sequence of multiple steps to solve them. In particular, users divide search tasks into sequences of queries, interacting with search systems to carry out the information-seeking process. User interactions are recorded in search query logs, enabling the development of models that automatically learn search patterns from the users' interactions with search systems. These models underpin multiple user-assisting applications that help search systems be more interactive, user-friendly, and coherent. User-assisting applications include query suggestion, task-based ranking of search results, query reformulation analysis, e-commerce applications, advertisement retrieval, query-term prediction, mapping of queries to search tasks, and so on.
Consequently, we propose the following contributions: a neural model for learning to detect search task boundaries in query logs; a recurrent deep clustering architecture that simultaneously learns query representations through self-training and clusters queries into groups of search tasks; Multilingual Graph-Based Clustering, an unsupervised, user-agnostic model for search task identification supporting queries in sixteen languages; and the Language-agnostic Search Task Model, an unsupervised approach that simultaneously models user search intent and search tasks. The proposed models improve on existing methods for modeling user interactions, taking into account user privacy, real-time response, and language accessibility. User privacy is a major concern in the ethics of intelligent systems, while fast responses are critical for search systems interacting with users in real time, particularly in conversational search. At the same time, language accessibility is essential to assist users worldwide, who interact with search systems in many languages. The proposed contributions can benefit many user-assisting applications, helping users better solve their search tasks when accessing search systems to fulfill their information needs.
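To make the setting concrete, here is a minimal sketch, far simpler than the proposed models, of unsupervised, language-agnostic search task identification: queries are embedded with character n-gram TF-IDF (no language-specific tokenizer), a similarity graph is built over a session's queries, and the graph's connected components are treated as search tasks. The similarity threshold and n-gram range below are illustrative assumptions, not values from the thesis.

```python
# A toy language-agnostic search task identifier: connected components of a
# query similarity graph, with character n-gram TF-IDF as the representation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def identify_search_tasks(queries, threshold=0.3):
    """Group one session's queries into search tasks via graph clustering."""
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    sims = cosine_similarity(vec.fit_transform(queries))

    # Union-find: link query pairs whose similarity clears the threshold.
    parent = list(range(len(queries)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            if sims[i, j] >= threshold:
                parent[find(i)] = find(j)

    # Collect connected components as search tasks.
    tasks = {}
    for i, q in enumerate(queries):
        tasks.setdefault(find(i), []).append(q)
    return list(tasks.values())

session = ["cheap flights paris", "paris hotels",
           "vols pas chers paris", "python list comprehension"]
print(identify_search_tasks(session))
```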

    Machine learning and privacy preserving algorithms for spatial and temporal sensing

    Sensing of physical and social environments is ubiquitous in modern mobile phones, IoT devices, and infrastructure-based settings. The information engraved in such data, especially the time and location attributes, has unprecedented potential to characterize individual and crowd behaviour and natural and technological processes. However, it is challenging to extract abstract knowledge from the data due to its massive size, sequential structure, asynchronous operation, noisy characteristics, privacy concerns, and real-time analysis requirements. Therefore, the primary goal of this thesis is to propose theoretically grounded and practically useful algorithms to learn from location and time stamps in sensor data. The proposed methods are inspired by tools from geometry, topology, and statistics. They leverage structure in the temporal and spatial data by probabilistically modeling noise, exploring the embedded topological structures, and utilizing statistical structure to protect personal information while simultaneously learning aggregate information. The proposed algorithms are geared towards streaming and distributed operation for efficiency. The usefulness of the methods is argued using mathematical analysis and empirical experiments on real and artificial datasets.
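One way to make the "protect personal information while learning aggregate information" idea concrete is local differential privacy. The following is a minimal sketch (not necessarily the thesis's mechanism) of randomized response over a discretized spatial grid: each device perturbs its location cell before reporting, and the aggregator debiases the noisy counts. The grid size and the privacy parameter epsilon are assumptions.

```python
# Generalized randomized response over location cells: the aggregator learns
# approximate crowd densities while no single report reveals a true location.
import math, random
from collections import Counter

def perturb(cell, cells, epsilon):
    """Report the true cell with probability p, otherwise a uniform other cell."""
    k = len(cells)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return cell
    return random.choice([c for c in cells if c != cell])

def estimate_counts(reports, cells, epsilon):
    """Unbiased per-cell count estimates from the perturbed reports."""
    k, n = len(cells), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)                     # prob. of reporting a given false cell
    raw = Counter(reports)
    return {c: (raw[c] - n * q) / (p - q) for c in cells}

cells = ["cell_%d" % i for i in range(9)]                  # a 3x3 spatial grid
truth = [random.choice(cells[:3]) for _ in range(10000)]   # crowd in 3 cells
reports = [perturb(c, cells, epsilon=1.0) for c in truth]
print(estimate_counts(reports, cells, epsilon=1.0))
```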

    Privacy preserving content analysis, indexing and retrieval for social search applications

    [no abstract]

    Secure and Efficient Comparisons between Untrusted Parties

    A vast number of online services are based on users contributing their personal information. Examples are manifold, including social networks, electronic commerce, sharing websites, lodging platforms, and genealogy. In all cases user privacy depends on a collective trust in all involved intermediaries, such as service providers, operators, administrators, or even help-desk staff. A single adversarial party in the whole chain of trust voids user privacy, and the number of intermediaries is ever growing. Thus, user privacy must be preserved at every time and stage, independent of the intrinsic goals of any involved party. Furthermore, next to these new services, traditional offline analytic systems are being replaced by online services run in large data centers. Centralized processing of electronic medical records, genomic data, and other health-related information is anticipated due to advances in medical research, better analytic results based on large amounts of medical information, and lowered costs. In these scenarios privacy is of utmost concern due to the large amount of personal information contained within the centralized data. We focus on the challenge of privacy-preserving processing of genomic data, specifically comparing genomic sequences. The problem that arises is how to efficiently compare the private sequences of two parties while preserving the confidentiality of the compared data. It follows that the privacy of the data owner must be preserved, which means that as little information as possible must be leaked to any party participating in the comparison. Leakage can happen at several points during a comparison: the secured inputs of the comparing party might leak some information about the original input, or the output might leak information about the inputs. In the latter case, the results of several comparisons can be combined to infer information about the confidential input of the party under observation. Genomic sequences serve as a use case, but the proposed solutions are more general and apply to the generic field of privacy-preserving sequence comparison. The solution should be efficient, with runtimes linear in the length of the input sequences, producing acceptable costs for a typical use case.
To tackle the problem of efficient, privacy-preserving sequence comparison, we propose a framework consisting of three main parts. (a) The basic protocol is an efficient sequence comparison algorithm that transforms a sequence into a set representation, allowing distance measures over the input sequences to be approximated by distance measures over sets. The sets are then represented by an efficient data structure, the Bloom filter, which allows certain set operations to be evaluated without storing the actual elements of the possibly large set. This representation yields low distortion when comparing similar sequences. Operations on the set representation are carried out using efficient, partially homomorphic cryptographic systems to keep the inputs confidential. The output can be adjusted to return either the approximated distance itself or the result of an in-range check of the approximated distance. (b) Building on this efficient basic protocol, we introduce the first mechanism to reduce the success of inference attacks by detecting and rejecting similar queries in a privacy-preserving way. This is achieved by generating generalized commitments for inputs. The generalization is done by treating inputs as messages received from a noisy channel, to which error correction from coding theory is applied. In this way, similar inputs are defined as inputs whose generalizations have a Hamming distance below a predefined threshold. We present a protocol to perform a zero-knowledge proof that the generalized input is indeed a generalization of the actual input. Furthermore, we generalize a very efficient inference attack on privacy-preserving sequence comparison protocols and use it to evaluate our inference-control mechanism. (c) The third part of the framework lightens the computational load of the client taking part in the comparison protocol by providing a compression mechanism for partially homomorphic cryptographic schemes. It reduces the transmission and storage overhead induced by semantically secure homomorphic encryption schemes, as well as encryption latency. Compression is achieved by constructing an asymmetric stream cipher such that the generated ciphertext can be converted into a ciphertext of an associated homomorphic encryption scheme without revealing any information about the plaintext. This is the first compression scheme available for partially homomorphic encryption schemes. Compression schemes for fully homomorphic encryption are several orders of magnitude slower at converting the transmission ciphertext into the homomorphically encrypted ciphertext; indeed, our compression scheme achieves optimal conversion performance. It further allows keystreams to be generated offline and thus supports offloading to trusted devices, improving transmission, storage, and power efficiency. We give security proofs for all relevant parts of the proposed protocols and algorithms to evaluate their security. A performance evaluation of the core components demonstrates the practicality of our proposed solutions, including a theoretical analysis and practical experiments showing the accuracy and efficiency of the approximations and probabilistic algorithms. Several variations and configurations for detecting similar inputs are studied in an in-depth discussion of the inference-control mechanism. A human mitochondrial genome database is used for the practical evaluation to compare genomic sequences and detect similar inputs as described by the use case. In summary, we show that it is indeed possible to construct an efficient, privacy-preserving comparison of (genomic) sequences while controlling the amount of information that leaves the comparison. To the best of our knowledge, we also contribute the first efficient privacy-preserving inference detection and control mechanism, as well as the first ciphertext compression system for partially homomorphic cryptographic systems.
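An unencrypted sketch of the first step of the basic protocol (a) may help: a sequence becomes a set of overlapping n-grams, each set is stored in a Bloom filter, and a set similarity (here the Dice coefficient over set bit positions, as commonly used in privacy-preserving record linkage) approximates the similarity of the original sequences. The full protocol evaluates such operations under partially homomorphic encryption; the filter size, hash count, and n-gram length below are illustrative assumptions.

```python
# Sequence -> n-gram set -> Bloom filter -> approximate set similarity.
# The Bloom filter is stored as the set of its set-bit positions, so the Dice
# coefficient over these positions equals Dice over the underlying bit vectors.
import hashlib

def ngrams(seq, n=3):
    """The set of overlapping n-grams of a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}

def bloom(items, m=1024, k=4):
    """Insert each item under k salted SHA-256 hashes; return set-bit positions."""
    bits = set()
    for item in items:
        for salt in range(k):
            h = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            bits.add(int(h, 16) % m)
    return bits

def dice(bits_a, bits_b):
    """Dice similarity over bit positions approximates n-gram set overlap."""
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))

a = bloom(ngrams("ACGTACGTGACCT"))
b = bloom(ngrams("ACGTACGTGACTT"))   # one substitution near the end
print(f"approximate similarity: {dice(a, b):.3f}")
```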

    Blockchain-based recommender systems: Applications, challenges and future opportunities

    Recommender systems have been widely used in different application domains, including energy preservation, e-commerce, healthcare, social media, etc. Such applications require the analysis and mining of massive amounts of various types of user data, including demographics, preferences, and social interactions, in order to develop accurate and precise recommender systems. Such datasets often include sensitive information, yet most recommender systems focus on the models' accuracy and ignore issues related to security and the users' privacy. Despite efforts to overcome these problems using different risk-reduction techniques, none has been completely successful in ensuring cryptographic security and protection of the users' private information. To bridge this gap, blockchain technology is presented as a promising strategy to promote security and privacy preservation in recommender systems, not only because of its salient security and privacy features, but also due to its resilience, adaptability, fault tolerance, and trust characteristics. This paper presents a holistic review of blockchain-based recommender systems covering challenges, open issues, and solutions. Accordingly, a well-designed taxonomy is introduced to describe the security and privacy challenges, overview existing frameworks, and discuss their applications and benefits when using blockchain, before indicating opportunities for future research. © 2021 Elsevier Inc. This paper was made possible by National Priorities Research Program (NPRP) grant No. 10-0130-170288 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

    Deep Neural Network Architectures for Large-scale, Robust and Small-Footprint Speaker and Language Recognition

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: 27-04-2017. Artificial neural networks are powerful learners of the information embedded in speech signals. They can provide compact, multi-level, nonlinear representations of temporal sequences and holistic optimization algorithms capable of surpassing former leading paradigms. Artificial neural networks are, therefore, a promising technology that can be used to enhance our ability to recognize speakers and languages, an ability increasingly in demand in the context of the new, voice-enabled interfaces used today by millions of users. The aim of this thesis is to advance the state of the art of language and speaker recognition through the formulation, implementation, and empirical analysis of novel approaches for large-scale and portable speech interfaces. Its major contributions are: (1) novel, compact network architectures for language and speaker recognition, including a variety of network topologies based on fully connected, recurrent, convolutional, and locally connected layers; (2) a bottleneck combination strategy for classical and neural network approaches for long speech sequences; (3) the architectural design of the first public, multilingual, large-vocabulary continuous speech recognition system; and (4) a novel, end-to-end optimization algorithm for text-dependent speaker recognition that is applicable to a range of verification tasks. Experimental results have demonstrated that artificial neural networks can substantially reduce the number of model parameters and surpass the performance of previous approaches to language and speaker recognition, particularly in the cases of long short-term memory recurrent networks (used to model the input speech signal), end-to-end optimization algorithms (used to predict languages or speakers), short testing utterances, and large training data collections.
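As a minimal illustration of the recurrent, end-to-end flavor of these contributions, the following PyTorch sketch encodes a sequence of acoustic feature frames with an LSTM, projects the final hidden state to a fixed-size speaker embedding, and scores two utterances by cosine similarity. All dimensions (40-dim features, 256 hidden units, 128-dim embedding) are assumptions, and this is a simplified stand-in, not the thesis's architecture.

```python
# A toy LSTM speaker encoder: utterance -> fixed-size, unit-norm embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerEncoder(nn.Module):
    def __init__(self, n_feats=40, hidden=256, emb_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, frames):                # frames: (batch, time, n_feats)
        _, (h, _) = self.lstm(frames)         # h: (layers, batch, hidden)
        return F.normalize(self.proj(h[-1]), dim=-1)  # last layer's state

enc = SpeakerEncoder()
utt_a = torch.randn(1, 200, 40)               # ~2 s of feature frames
utt_b = torch.randn(1, 180, 40)               # variable length is fine
score = F.cosine_similarity(enc(utt_a), enc(utt_b))
print(f"verification score: {score.item():.3f}")
```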

    Unsupervised Structural Embedding Methods for Efficient Collective Network Mining

    How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. The central element of all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively, in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient. In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node's alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd
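A minimal sketch of the core idea may help: describe each node by structural features that generalize across graphs (here, log-binned degree counts of its neighbors), jointly embed the stacked feature matrices with a truncated SVD so both graphs share one space, and align nodes by nearest-neighbor search over embeddings. The binning scheme and dimensions are illustrative assumptions; the thesis's methods add refinement and scalability techniques on top of this picture.

```python
# Structural node embeddings and a toy cross-network alignment test.
import numpy as np
import networkx as nx

def structural_features(G, n_bins=8):
    """Per node: histogram of floor(log2(deg + 1)) over its neighbors' degrees."""
    feats = np.zeros((G.number_of_nodes(), n_bins))
    idx = {v: i for i, v in enumerate(G.nodes())}
    for v in G.nodes():
        for u in G.neighbors(v):
            b = min(int(np.log2(G.degree(u) + 1)), n_bins - 1)
            feats[idx[v], b] += 1
    return feats

def embed(feats, dim=4):
    """Low-rank embedding of a feature matrix via truncated SVD."""
    U, S, _ = np.linalg.svd(feats, full_matrices=False)
    return U[:, :dim] * S[:dim]

G1 = nx.karate_club_graph()
G2 = nx.relabel_nodes(G1, {v: v + 100 for v in G1})    # same structure, new ids
F1, F2 = structural_features(G1), structural_features(G2)
X = embed(np.vstack([F1, F2]))        # joint embedding -> one comparable space
X1, X2 = X[:len(F1)], X[len(F1):]

# Align: each G1 node maps to the G2 node with the nearest embedding.
dists = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)
matches = dists.argmin(axis=1)
print("fraction correctly aligned:", (matches == np.arange(len(matches))).mean())
```

Structurally equivalent nodes (e.g., several degree-one nodes on the same hub) have identical features and tie in the nearest-neighbor step, which is one reason the thesis refines alignments using neighborhood consistency.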

    Using Web Archives to Enrich the Live Web Experience Through Storytelling

    Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in a paradox: the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data on the Web grows, storytelling is becoming a popular technique in social media for selecting Web resources to support a particular narrative or story. In this dissertation, we address the problem of understanding archived collections by proposing the Dark and Stormy Archive (DSA) framework, in which we integrate storytelling social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users are already familiar with, such as Storify. To inform our work on generating stories from archived collections, we start by building a baseline for the structural characteristics of popular (i.e., most-viewed) human-generated stories by investigating stories from Storify. Furthermore, we examined the entire population of Archive-It collections to better understand the characteristics of the collections we intend to summarize. We then filter off-topic pages from the collections using different methods to detect when an archived page in a collection has gone off-topic. We created a gold standard dataset from three Archive-It collections to evaluate the proposed methods at different thresholds. From the gold standard dataset, we identified five behaviors for the TimeMaps (a list of archived copies of a page) based on the page's aboutness. Based on a dynamic slicing algorithm, we divide the collection and cluster the pages in each slice. We then select the best representative page from each cluster based on different quality metrics (e.g., the replay quality and the quality of the snippet generated from the page). At the end, we put the selected pages in chronological order and visualize them using Storify. To evaluate the DSA framework, we obtained a ground truth dataset of hand-crafted stories from Archive-It collections generated by expert archivists. We used Amazon's Mechanical Turk to evaluate the automatically generated stories against the stories created by domain experts. The results show that the automatically generated stories from the DSA are indistinguishable from those created by human subject domain experts, while at the same time both kinds of stories (automatic and human) are easily distinguished from randomly generated stories.
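One step of the pipeline, detecting when an archived page has gone off-topic, can be sketched as follows: compare each snapshot's text in a TimeMap against the page's first snapshot using TF-IDF cosine similarity and flag snapshots that fall below a cutoff. The 0.15 threshold is an illustrative assumption; the dissertation evaluates several detection methods at different thresholds against a gold standard.

```python
# A toy off-topic detector for one TimeMap: similarity to the first snapshot.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def flag_off_topic(snapshots, threshold=0.15):
    """Return (index, similarity, off_topic?) for each snapshot's text."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(snapshots)
    sims = cosine_similarity(tfidf[0], tfidf).ravel()   # vs. first snapshot
    return [(i, s, s < threshold) for i, s in enumerate(sims)]

timemap = [
    "hurricane relief donations volunteers coastal flooding damage",
    "hurricane recovery updates volunteers rebuild flooded homes",
    "this domain is for sale click here to buy now",     # page gone off-topic
]
for i, sim, off in flag_off_topic(timemap):
    print(f"memento {i}: similarity={sim:.2f} off-topic={off}")
```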

    The Routledge Handbook of Museums, Media and Communication

    Museums today find themselves within a mediatised society, where everyday life is conducted in a data-full and technology-rich context. In fact, museums are themselves mediatised: they present a uniquely media-centred environment, in which communicative media is a constitutive property of their organisation and of the visitor experience. The Routledge Handbook of Museums, Media and Communication explores what it means to take mediated communication as a key concept for museum studies and as a sensitising lens for media-related museum practice on the ground. Including contributions from experts around the world, this original and innovative Handbook shares a nuanced and precise understanding of media, media concepts and media terminology, rehearsing new locations for writing on museum media and giving voice to new subject alignments. As a whole, the volume breaks new ground by reframing mediated museum communication as a resource for an inclusive understanding of current museum developments. The Routledge Handbook of Museums, Media and Communication will appeal to both students and scholars, as well as to practitioners involved in the visioning, design and delivery of mediated communication in the museum. It teaches us not just how to study museums, but how to go about being a museum in today's world.