
    Mobile Phone Text Processing and Question-Answering

    Mobile phone text messaging between mobile users and information services is a growing area of Information Systems. Users may require the service to answer queries, or may, in wiki style, want to contribute to the service by texting in information within the service's domain of discourse. Given the volume of such messaging, it is essential to do the processing through an automated service. Further, in the case of repeated use of the service, the quality of such a response has the potential to benefit from a dynamic user profile that the service can build up from previous texts of the same user. This project will investigate the potential for creating such intelligent mobile phone services and aims to produce a computational model to enable their efficient implementation. To make the project feasible, the scope of the automated service is considered to lie within a limited domain, for example, information about entertainment within a specific town centre. The project will assume the existence of a model of objects within the domain of discourse, allowing texts to be analysed within the context of a user model and a domain model. Hence, the project will involve the subject areas of natural language processing, language engineering, machine learning, knowledge extraction, and ontological engineering

    Using tag-neighbors for query expansion in medical information retrieval

    In the context of medical document retrieval, users' often under-specified queries lead to undesired search results that lack the information they seek, match domain knowledge inadequately, or come from unreliable sources. To overcome the limitations of under-specified queries, we utilize tags to enhance information retrieval capabilities by expanding users' original queries with context-relevant information. We compute a set of significant tag neighbor candidates based on the neighbor frequency and weight, and utilize the most frequent and weighted neighbors to expand an entry query that has terms matching tags. The proposed approach is evaluated using the MedWorm medical article collection and standard evaluation methods from the Text Retrieval Conference (TREC). Against a baseline Mean Average Precision (MAP) of 0.353, query expansion reached a MAP of 0.491 (+39%). In-depth analysis shows how this strategy is beneficial when compared with different ranks of the retrieval results. © 2011 IEEE
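
The tag-neighbor expansion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a tag's "neighbors" are the tags co-assigned to the same document, it ranks neighbors by raw co-occurrence count only (the abstract's weighting scheme is not specified here), and the function names and sample tags are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_tag_neighbors(tagged_docs):
    """Count how often each ordered pair of tags appears on the same document."""
    neighbors = defaultdict(Counter)
    for tags in tagged_docs:
        for a, b in permutations(set(tags), 2):
            neighbors[a][b] += 1
    return neighbors


def expand_query(query, neighbors, k=2):
    """Append the k most frequent tag neighbors of every query term that is a tag."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # Rank by descending count, then alphabetically so ties are deterministic.
        ranked = sorted(neighbors.get(term, {}).items(),
                        key=lambda kv: (-kv[1], kv[0]))
        for tag, _count in ranked[:k]:
            if tag not in expanded:
                expanded.append(tag)
    return expanded


# Hypothetical tagged collection, for illustration only.
docs = [
    ["asthma", "inhaler", "allergy"],
    ["asthma", "inhaler"],
    ["asthma", "pollen"],
    ["diabetes", "insulin"],
]
tag_neighbors = build_tag_neighbors(docs)
print(expand_query("asthma treatment", tag_neighbors))
```

Terms that match no tag (here "treatment") pass through unchanged, which mirrors the abstract's restriction to queries "that [have] terms matching tags".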

    Context-based understanding of food-related queries using a culinary knowledge model

    Dietary practices are governed by a mix of ethnographic aspects, such as social, cultural and environmental factors. These aspects need to be taken into consideration during an analysis of food-related queries. Queries are usually ambiguous. It is essential to understand, analyse and refine the queries for better search and retrieval. The work is focused on identifying the explicit, implicit and hidden facets of a query, taking into consideration the context – culinary domain. This article proposes a technique for query understanding, analysis and refinement based on a domain specific knowledge model. Queries are conceptualised by mapping the query term to concepts defined in the model. This allows an understanding of the semantic point of view of a query and an ability to determine the meaning of its terms and their interrelatedness. The knowledge model acts as a backbone providing the context for query understanding, analysis and refinement and outperforms other models, such as Schema.org, BBC Food Ontology and Recipe Ontology

    Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015. As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required. To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed necessary social context information for efficient queries. The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms. In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges to the problem of parallel stream clustering. Due to the sparsity of the high-dimensional data, the traditional synchronization method becomes expensive and severely impacts the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters to achieve scalable parallel stream clustering algorithms. Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies
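
The idea of broadcasting incremental changes instead of whole centroids can be sketched as follows. This is an illustrative assumption about how such a scheme might look, not the dissertation's code: centroids are modelled as sparse dictionaries from dimension name to value, and `centroid_delta` and `apply_delta` are hypothetical helper names.

```python
def centroid_delta(old, new, tol=1e-12):
    """Return only the dimensions whose value changed between two sparse centroids."""
    delta = {}
    for dim in old.keys() | new.keys():
        diff = new.get(dim, 0.0) - old.get(dim, 0.0)
        if abs(diff) > tol:
            delta[dim] = diff
    return delta


def apply_delta(centroid, delta, tol=1e-12):
    """Rebuild the updated centroid on a receiving worker from a broadcast delta."""
    updated = dict(centroid)
    for dim, diff in delta.items():
        value = updated.get(dim, 0.0) + diff
        if abs(value) > tol:
            updated[dim] = value
        else:
            # Drop dimensions that return to zero so the centroid stays sparse.
            updated.pop(dim, None)
    return updated


# Hypothetical centroid over word dimensions, before and after a mini-batch.
before = {"goal": 1.0, "match": 2.0}
after = {"goal": 1.0, "match": 2.5, "score": 0.5}
delta = centroid_delta(before, after)
print(delta)  # only the changed dimensions are sent
```

When few dimensions change per update, the delta is much smaller than the full centroid, which is the saving the abstract attributes to sparsity.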

    Enrichment of raw sensor data to enable high-level queries

    Sensor networks are increasingly used across various application domains. Their usage has the advantage of automated, often continuous, monitoring of activities and events. Ubiquitous sensor networks detect the location of people and objects and their movement. In our research, we employ a ubiquitous sensor network to track the movement of players in a tennis match. Our goal is to create a detailed analysis of how the match progressed, recording points scored, games and sets, and in doing so to greatly reduce the effort of coaches and players who are required to study matches afterwards. The sensor network is highly efficient as it eliminates the need for manual recording of the match. However, it generates raw data that is unusable by domain experts as it contains no frame of reference or context and cannot be analyzed or queried. In this work, we present the UbiQuSE system of data transformers which bridges the gap between raw sensor data and the high-level requirements of domain specialists such as the tennis coach

    ACMiner: Extraction and Analysis of Authorization Checks in Android's Middleware

    Billions of users rely on the security of the Android platform to protect phones, tablets, and many different types of consumer electronics. While Android's permission model is well studied, the enforcement of the protection policy has received relatively little attention. Much of this enforcement is spread across system services, taking the form of hard-coded checks within their implementations. In this paper, we propose Authorization Check Miner (ACMiner), a framework for evaluating the correctness of Android's access control enforcement through consistency analysis of authorization checks. ACMiner combines program and text analysis techniques to generate a rich set of authorization checks, mines the corresponding protection policy for each service entry point, and uses association rule mining at a service granularity to identify inconsistencies that may correspond to vulnerabilities. We used ACMiner to study the AOSP version of Android 7.1.1 to identify 28 vulnerabilities relating to missing authorization checks. In doing so, we demonstrate ACMiner's ability to help domain experts process thousands of authorization checks scattered across millions of lines of code

    Expanding sensor networks to automate knowledge acquisition

    The availability of accurate, low-cost sensors to scientists has resulted in widespread deployment in a variety of sporting and health environments. The sensor data output is often in a raw, proprietary or unstructured format. As a result, it is often difficult to query multiple sensors for complex properties or actions. In our research, we deploy a heterogeneous sensor network to detect the various biological and physiological properties in athletes during training activities. The goal for exercise physiologists is to quickly identify key intervals in exercise such as moments of stress or fatigue. This is not currently possible because of low-level sensors and a lack of query language support. Thus, our motivation is to expand the sensor network with a contextual layer that enriches raw sensor data, so that it can be exploited by a high-level query language. To achieve this, the domain expert specifies events in a traditional event-condition-action format to deliver the required contextual enrichment
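
An event-condition-action rule of the kind this abstract mentions can be sketched in a few lines. The sketch is only an assumption about the general pattern: the `Rule` class, the `enrich` function, the field names and the 180 bpm threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: str                          # which kind of reading triggers the rule
    condition: Callable[[dict], bool]   # predicate over the raw reading
    action: Callable[[dict], dict]      # contextual fields to attach


def enrich(readings, rules):
    """Return copies of the raw readings with context added by matching rules."""
    enriched = []
    for reading in readings:
        out = dict(reading)
        for rule in rules:
            if reading["event"] == rule.event and rule.condition(reading):
                out.update(rule.action(reading))
        enriched.append(out)
    return enriched


# Hypothetical rule: label very high heart-rate samples as a stress interval.
stress_rule = Rule(
    event="heart_rate",
    condition=lambda r: r["value"] > 180,
    action=lambda r: {"label": "stress"},
)

raw = [
    {"event": "heart_rate", "t": 10, "value": 150},
    {"event": "heart_rate", "t": 20, "value": 185},
]
for row in enrich(raw, [stress_rule]):
    print(row)
```

Once readings carry labels such as `"stress"`, a higher-level query can filter on the label rather than on raw sensor values.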

    Context Models For Web Search Personalization

    We present our solution to the Yandex Personalized Web Search Challenge. The aim of this challenge was to use historical search logs to personalize top-N document rankings for a set of test users. We used over 100 features extracted from user- and query-dependent contexts to train neural net and tree-based learning-to-rank and regression models. Our final submission, which was a blend of several different models, achieved an NDCG@10 of 0.80476 and placed 4th among the 194 teams, winning 3rd prize
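
For readers unfamiliar with the NDCG@10 metric reported above, a minimal sketch of one common definition follows. It assumes the exponential gain `2**rel - 1` with a `log2` position discount; whether the challenge used exactly this variant is an assumption, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (rank 1 has discount 1)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance labels for one ranked result list.
print(ndcg_at_k([2, 1, 0]))  # already in ideal order
print(ndcg_at_k([0, 2]))     # relevant document ranked second
```

A per-query score is computed this way and then averaged over all test queries to give a single figure.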

    A platform for discovering and sharing confidential ballistic crime data.

    Criminal investigations generate large volumes of complex data that detectives have to analyse and understand. This data tends to be "siloed" within individual jurisdictions and re-using it in other investigations can be difficult. Investigations into trans-national crimes are hampered by the problem of discovering relevant data held by agencies in other countries and of sharing those data. Gun-crimes are one major type of incident that showcases this: guns are easily moved across borders and used in multiple crimes but finding that a weapon was used elsewhere in Europe is difficult. In this paper we report on the Odyssey Project, an EU-funded initiative to mine, manipulate and share data about weapons and crimes. The project demonstrates the automatic combining of data from disparate repositories for cross-correlation and automated analysis. The data arrive from different cultural/domains with multiple reference models using real-time data feeds and historical databases