9,590 research outputs found

    Toward Entity-Aware Search

    Get PDF
    As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning--entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we obtained so far have shown clear promise of entity-aware search, in its usefulness, effectiveness, efficiency and scalability

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Steps towards adaptive situation and context-aware access: a contribution to the extension of access control mechanisms within pervasive information systems

    Get PDF
    L'évolution des systèmes pervasives a ouvert de nouveaux horizons aux systèmes d'information classiques qui ont intégré des nouvelles technologies et des services qui assurent la transparence d'accès aux resources d'information à n'importe quand, n'importe où et n'importe comment. En même temps, cette évolution a relevé des nouveaux défis à la sécurité de données et à la modélisation du contrôle d'accès. Afin de confronter ces challenges, differents travaux de recherche se sont dirigés vers l'extension des modèles de contrôles d'accès (en particulier le modèle RBAC) afin de prendre en compte la sensibilité au contexte dans le processus de prise de décision. Mais la liaison d'une décision d'accès aux contraintes contextuelles dynamiques d'un utilisateur mobile va non seulement ajouter plus de complexité au processus de prise de décision mais pourra aussi augmenter les possibilités de refus d'accès. Sachant que l'accessibilité est un élément clé dans les systèmes pervasifs et prenant en compte l'importance d'assurer l'accéssibilité en situations du temps réel, nombreux travaux de recherche ont proposé d'appliquer des mécanismes flexibles de contrôle d'accès avec des solutions parfois extrêmes qui depassent les frontières de sécurité telle que l'option de "Bris-de-Glace". Dans cette thèse, nous introduisons une solution modérée qui se positionne entre la rigidité des modèles de contrôle d'accès et la flexibilité qui expose des risques appliquées pendant des situations du temps réel. Notre contribution comprend deux volets : au niveau de conception, nous proposons PS-RBAC - un modèle RBAC sensible au contexte et à la situation. Le modèle réalise des attributions des permissions adaptatives et de solution de rechange à base de prise de décision basée sur la similarité face à une situation importanteÀ la phase d'exécution, nous introduisons PSQRS - un système de réécriture des requêtes sensible au contexte et à la situation et qui confronte les refus d'accès en reformulant la requête XACML de l'utilisateur et en lui proposant une liste des resources alternatives similaires qu'il peut accéder. L'objectif est de fournir un niveau de sécurité adaptative qui répond aux besoins de l'utilisateur tout en prenant en compte son rôle, ses contraintes contextuelles (localisation, réseau, dispositif, etc.) et sa situation. Notre proposition a été validé dans trois domaines d'application qui sont riches des contextes pervasifs et des scénarii du temps réel: (i) les Équipes Mobiles Gériatriques, (ii) les systèmes avioniques et (iii) les systèmes de vidéo surveillance.The evolution of pervasive computing has opened new horizons to classical information systems by integrating new technologies and services that enable seamless access to information sources at anytime, anyhow and anywhere. Meanwhile this evolution has opened new threats to information security and new challenges to access control modeling. In order to meet these challenges, many research works went towards extending traditional access control models (especially the RBAC model) in order to add context awareness within the decision-making process. Meanwhile, tying access decisions to the dynamic contextual constraints of mobile users would not only add more complexity to decision-making but could also increase the possibilities of access denial. Knowing that accessibility is a key feature for pervasive systems and taking into account the importance of providing access within real-time situations, many research works have proposed applying flexible access control mechanisms with sometimes extreme solutions that depass security boundaries such as the Break-Glass option. In this thesis, we introduce a moderate solution that stands between the rigidity of access control models and the riskful flexibility applied during real-time situations. Our contribution is twofold: on the design phase, we propose PS-RBAC - a Pervasive Situation-aware RBAC model that realizes adaptive permission assignments and alternative-based decision-making based on similarity when facing an important situation. On the implementation phase, we introduce PSQRS - a Pervasive Situation-aware Query Rewriting System architecture that confronts access denials by reformulating the user's XACML access request and proposing to him a list of alternative similar solutions that he can access. The objective is to provide a level of adaptive security that would meet the user needs while taking into consideration his role, contextual constraints (location, network, device, etc.) and his situation. Our proposal has been validated in three application domains that are rich in pervasive contexts and real-time scenarios: (i) Mobile Geriatric Teams, (ii) Avionic Systems and (iii) Video Surveillance Systems

    SoK: Cryptographically Protected Database Search

    Full text link
    Protected database search systems cryptographically isolate the roles of reading from, writing to, and administering the database. This separation limits unnecessary administrator access and protects data in the case of system breaches. Since protected search was introduced in 2000, the area has grown rapidly; systems are offered by academia, start-ups, and established companies. However, there is no best protected search system or set of techniques. Design of such systems is a balancing act between security, functionality, performance, and usability. This challenge is made more difficult by ongoing database specialization, as some users will want the functionality of SQL, NoSQL, or NewSQL databases. This database evolution will continue, and the protected search community should be able to quickly provide functionality consistent with newly invented databases. At the same time, the community must accurately and clearly characterize the tradeoffs between different approaches. To address these challenges, we provide the following contributions: 1) An identification of the important primitive operations across database paradigms. We find there are a small number of base operations that can be used and combined to support a large number of database paradigms. 2) An evaluation of the current state of protected search systems in implementing these base operations. This evaluation describes the main approaches and tradeoffs for each base operation. Furthermore, it puts protected search in the context of unprotected search, identifying key gaps in functionality. 3) An analysis of attacks against protected search for different base queries. 4) A roadmap and tools for transforming a protected search system into a protected database, including an open-source performance evaluation platform and initial user opinions of protected search.Comment: 20 pages, to appear to IEEE Security and Privac

    Neural fuzzy repair : integrating fuzzy matches into neural machine translation

    Get PDF
    We present a simple yet powerful data augmentation method for boosting Neural Machine Translation (NMT) performance by leveraging information retrieved from a Translation Memory (TM). We propose and test two methods for augmenting NMT training data with fuzzy TM matches. Tests on the DGT-TM data set for two language pairs show consistent and substantial improvements over a range of baseline systems. The results suggest that this method is promising for any translation environment in which a sizeable TM is available and a certain amount of repetition across translations is to be expected, especially considering its ease of implementation
    corecore