    A multi-collection latent topic model for federated search

    Collection selection is a crucial function, central to the effectiveness and efficiency of a federated information retrieval system. A variety of solutions have been proposed for collection selection, adapting proven techniques used in centralised retrieval. This paper defines a new approach to collection selection that models the topical distribution in each collection. We describe an extended version of latent Dirichlet allocation that uses a hierarchical hyperprior to enable the different topical distributions found in each collection to be modelled. Under the model, resources are ranked based on the topical relationship between query and collection. By modelling collections in a low-dimensional topic space, we can implicitly smooth their term-based characterisation with appropriate terms from topically related samples, thereby dealing with the problem of missing vocabulary within the samples. An important advantage of adopting this hierarchical model over current approaches is that the model generalises well to unseen documents given small samples of each collection. The latent structure of each collection can therefore be estimated well despite imperfect information about each collection, such as sampled documents obtained through query-based sampling. Experiments demonstrate that this new, fully integrated topical model is more robust than current state-of-the-art collection selection algorithms.
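
    The abstract says resources are ranked by the topical relationship between query and collection but does not give the scoring function. A minimal sketch, assuming the topic-word distributions (`phi`) and per-collection topic proportions (`theta`) have already been estimated, scores each collection by the query likelihood under its topic mixture. Query-likelihood scoring, the function name, and the toy data are illustrative assumptions, not the paper's formulation.

```python
import math


def rank_collections(query, phi, theta):
    """Rank collections by log P(query | collection) under a topic mixture.

    phi:   topic -> {word: P(word | topic)}             (hypothetical input)
    theta: collection -> {topic: P(topic | collection)} (hypothetical input)
    """
    scores = {}
    for coll, topic_weights in theta.items():
        log_p = 0.0
        for word in query:
            # P(w | c) = sum_k P(w | k) * P(k | c): a word missing from the
            # collection's sampled documents can still receive probability
            # mass through the topics the collection is assigned.
            p = sum(weight * phi[k].get(word, 0.0) for k, weight in topic_weights.items())
            log_p += math.log(p) if p > 0 else float("-inf")
        scores[coll] = log_p
    # Highest query likelihood first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: two topics shared by two hypothetical collections.
phi = {
    "sports": {"football": 0.5, "match": 0.5},
    "finance": {"stock": 0.6, "market": 0.4},
}
theta = {
    "newsA": {"sports": 0.9, "finance": 0.1},
    "newsB": {"sports": 0.2, "finance": 0.8},
}
print(rank_collections(["stock", "market"], phi, theta))  # ['newsB', 'newsA']
```

    Because the score is computed in topic space, `newsA` still gets a finite score for "stock" even though its weight on the finance topic is small; this is the smoothing effect the abstract describes.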

    Target Apps Selection: Towards a Unified Search Framework for Mobile Devices

    With the recent growth of conversational systems and intelligent assistants such as Apple Siri and Google Assistant, mobile devices are becoming even more pervasive in our lives. As a consequence, users are increasingly engaged with mobile apps and frequently use them to satisfy their information needs. However, users cannot search within their apps through their intelligent assistants. Enabling this requires a unified mobile search framework that identifies the target app(s) for the user's query, submits the query to the app(s), and presents the results to the user. In this paper, we take the first step towards developing unified mobile search. Specifically, we introduce and study the task of target apps selection, which has various potential real-world applications. To this end, we analyze attributes of search queries as well as user behaviors while searching with different mobile apps. The analyses are based on thousands of queries that we collected through crowdsourcing. Finally, we study the performance of state-of-the-art retrieval models for this task and propose two simple yet effective neural models that significantly outperform the baselines. Our neural approaches are based on learning high-dimensional representations for mobile apps. Our analyses and experiments suggest specific future directions in this research area. Comment: To appear at SIGIR 201
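
    The abstract states only that the neural models learn high-dimensional representations for apps. As a hedged illustration of how such representations can drive target apps selection, the sketch below assumes a query is embedded by averaging word vectors and that apps are ranked by cosine similarity to that query vector. Every name and vector here is hypothetical, and this is not the paper's architecture.

```python
import math


def embed_query(terms, word_vecs):
    """Average the vectors of known query terms (hypothetical encoder)."""
    known = [word_vecs[t] for t in terms if t in word_vecs]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_target_apps(terms, word_vecs, app_vecs, k=1):
    """Return the k apps whose representations are closest to the query."""
    q = embed_query(terms, word_vecs)
    if q is None:
        return []  # no known terms: nothing to compare against
    ranked = sorted(app_vecs, key=lambda app: cosine(q, app_vecs[app]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional vectors; a trained model would learn these from data.
word_vecs = {"pizza": [1.0, 0.0, 0.0], "delivery": [0.8, 0.2, 0.0], "song": [0.0, 0.0, 1.0]}
app_vecs = {"FoodApp": [1.0, 0.1, 0.0], "MapsApp": [0.0, 1.0, 0.0], "MusicApp": [0.0, 0.0, 1.0]}
print(select_target_apps(["pizza", "delivery"], word_vecs, app_vecs, k=2))  # ['FoodApp', 'MapsApp']
```

    Returning the top `k` apps rather than a single one mirrors the "target app(s)" framing, because a query may be routed to more than one app.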

    Technological, organisational, and environmental factors affecting the adoption of blockchain-based distributed identity management in organisations

    Background: Blockchain is a disruptive technology with the potential to innovate businesses. Ignoring or resisting it might result in a competitive disadvantage for organisations. Apart from its original financial application, cryptocurrency, other applications are emerging, the most common being supply chain management and e-voting systems. However, there is less focus on information and cybersecurity applications, especially from the enterprise perspective. This research addresses this knowledge gap, focussing on the application of blockchain to distributed identity management in organisations. Objectives: The main objective is to investigate the technological, organisational, and environmental (TOE) factors affecting the adoption of blockchain-based distributed identity management (BDIDM) in organisations and to determine the most critical factors. Secondary objectives include determining whether the blockchain type affects BDIDM adoption and whether the TOE-BDIDM model used to measure the phenomenon is effective and appropriate. Given the relative newness of blockchain, however, the initial goal is to explore the topic intensively, to understand the practicality of adopting BDIDM in organisations, and to establish whether the claims made around it are factual rather than products of blockchain hype. Methodology: The study uses meta-synthesis to explore the topic, summarising 69 papers selected qualitatively from reputable academic sources. It then surveys 111 randomly selected information and cybersecurity practitioners in South African organisations to investigate the TOE factors affecting BDIDM adoption, using as its data collection instrument an online questionnaire rooted in an adapted TOE model called TOE-BDIDM. The analysis of this primary data is purely quantitative and includes (i) Structural Equation Modelling (SEM) of the measurement model, i.e. confirmatory factor analysis (CFA); (ii) binary logistic regression analysis; and (iii) Chi-Square tests. Results: The meta-synthesis revealed the theoretical grounds underlying the claims made around the topic while identifying diverging views about the practicality of BDIDM in the enterprise context. It also identified TOE theory as more suitable for explaining the phenomenon. Binary logistic regression modelling reveals that TOE factors do affect BDIDM adoption in organisations, either positively or negatively. The factors predict BDIDM adopters and non-adopters, with Technology Characteristics being the most critical factor and the strongest predictor of BDIDM non-adopters. Organisation Readiness was the second most critical factor and the strongest predictor of BDIDM adopters. Overall, TOE-BDIDM correctly predicted 92.5% of adopters and 45.2% of non-adopters. CFA indicates that the appropriateness of TOE-BDIDM for investigating the phenomenon is relatively fair. The Chi-Square tests reveal a significant association between Blockchain Type and BDIDM adoption. Implications: The discussion highlights various implications of these findings, including the plausibility of the impartiality of typical privacy-preserving BDIDM models such as Self-Sovereign Identity: the majority of respondents preferred private permissioned blockchains, which tend to be centralised, more intermediated, and less privacy-preserving. The remaining implications relate to the disruptive nature of BDIDM and to BDIDM adoption being driven more by technological than by organisational or environmental factors. The study ends by reflecting on the research process and presenting its fundamental limitations and recommendations for future research.
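
    The figures "92.5% of adopters and 45.2% of non-adopters" read as per-class hit rates from a logistic regression classification table: the share of actual adopters predicted as adopters, and the share of actual non-adopters predicted as non-adopters. A minimal sketch of that computation follows. It assumes a 0.5 probability cutoff, which is a common default and not stated in the abstract, and it uses hypothetical coefficients and toy data rather than the study's model.

```python
import math


def predict_proba(x, weights, bias):
    """Logistic model: P(adopter | x) = 1 / (1 + exp(-(bias + w . x)))."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def per_class_hit_rates(X, y, weights, bias, cutoff=0.5):
    """Share of each actual class (1 = adopter, 0 = non-adopter) predicted correctly."""
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for x, label in zip(X, y):
        pred = 1 if predict_proba(x, weights, bias) >= cutoff else 0
        total[label] += 1
        correct[label] += int(pred == label)
    return {c: correct[c] / total[c] for c in total if total[c] > 0}


# Hypothetical single predictor (e.g. a standardised factor score) and labels.
X = [[2.0], [1.0], [0.5], [-1.0], [0.2]]
y = [1, 1, 1, 0, 0]
print(per_class_hit_rates(X, y, weights=[1.5], bias=0.0))  # {0: 0.5, 1: 1.0}
```

    Raising `cutoff` makes the model label fewer respondents as adopters, trading adopter hits for non-adopter hits, which is one way an asymmetry like the one reported can be examined.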

    Machine learning model selection with multi-objective Bayesian optimization and reinforcement learning

    A machine learning system, including one used in reinforcement learning, is usually trained on limited data, yet it aims to produce a model with good predictive performance that generalizes to the underlying data distribution. Within certain hypothesis classes, model selection chooses a model based on selection criteria calculated from the available data, which usually serve as estimators of the model's generalization performance. One major challenge for model selection that has drawn increasing attention is the discrepancy between the distribution from which the training data are sampled and the data distribution at deployment. The model can over-fit the training distribution and fail to extrapolate to unseen deployment distributions, which can greatly harm the reliability of a machine learning system. This distribution shift challenge can become even more pronounced for high-dimensional data types such as gene expression data, functional data, and image data, especially in decentralized learning scenarios. Another challenge for model selection is efficient search in the hypothesis space. Since training a machine learning model usually takes a fair amount of resources, searching for an appropriate model with favorable configurations is inherently an expensive process, which calls for efficient optimization algorithms. To tackle the challenge of distribution shift, novel resampling methods for evaluating the robustness of neural networks were proposed, as well as a domain generalization method using multi-objective Bayesian optimization in decentralized learning scenarios and variational inference in a domain-unsupervised manner. To tackle the expensive model search problem, interleaving Bayesian optimization and reinforcement learning was proposed for efficient search in a hierarchical conditional configuration space. Additionally, the use of multi-objective Bayesian optimization for model search in decentralized learning scenarios was proposed and its effectiveness verified. A model selection perspective on reinforcement learning was proposed, with associated contributions tackling the problems of exploration in high-dimensional state-action spaces and sparse rewards. Connections between statistical inference and control were summarized. Contributions were also made to open-source software in related machine learning sub-topics such as feature selection and functional data analysis, with advanced tuning methods and extensive benchmarking.
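
    Multi-objective Bayesian optimization returns a set of trade-off configurations rather than a single optimum. The sketch below omits the surrogate model and acquisition function entirely. It shows only the Pareto (non-dominated) filtering that a multi-objective optimizer applies to already-evaluated configurations, assuming every objective is minimized. The configuration names and the two objectives (validation error at two hypothetical decentralized data sites) are illustrative assumptions.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized (e.g. validation errors).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(evaluated):
    """Return the names of non-dominated configurations, in input order.

    evaluated: {config_name: tuple_of_objective_values}
    """
    return [
        name
        for name, p in evaluated.items()
        # A point never dominates itself, so comparing against all points is safe.
        if not any(dominates(q, p) for q in evaluated.values())
    ]


# Hypothetical (error at site 1, error at site 2) for five evaluated configurations.
evaluated = {
    "cfg1": (0.10, 0.30),
    "cfg2": (0.20, 0.15),
    "cfg3": (0.25, 0.25),
    "cfg4": (0.12, 0.35),
    "cfg5": (0.30, 0.10),
}
print(pareto_front(evaluated))  # ['cfg1', 'cfg2', 'cfg5']
```

    `cfg3` is dropped because `cfg2` is better at both sites, and `cfg4` is dropped because of `cfg1`. The three survivors each favor a different site, which is the kind of trade-off a decentralized setting exposes.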

    Hypermedia-based discovery for source selection using low-cost linked data interfaces

    Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often discussed only marginally, even though it strongly affects the selection of sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness.
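
    Hypermedia-based discovery means a client starts from seed interfaces and follows the links their responses advertise to find further interfaces. A minimal sketch of that idea is a breadth-first traversal with a visit budget. Here `get_links` is a hypothetical callback standing in for dereferencing an interface and reading its hypermedia controls; no HTTP requests or Triple Pattern Fragments parsing are implemented, and the interface identifiers are made up.

```python
from collections import deque


def discover_interfaces(seeds, get_links, max_visits=100):
    """Breadth-first discovery of interfaces reachable from the seeds via links.

    get_links(iface) -> iterable of linked interface identifiers (hypothetical
    stand-in for dereferencing the interface and reading its hypermedia links).
    """
    seen = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and len(visited) < max_visits:
        iface = queue.popleft()
        visited.append(iface)
        for linked in get_links(iface):
            if linked not in seen:  # avoid re-queueing interfaces already found
                seen.add(linked)
                queue.append(linked)
    return visited


# Hypothetical link graph between four interfaces.
links = {"A": ["B", "C"], "B": ["A", "D"], "C": [], "D": ["B"]}
print(discover_interfaces(["A"], lambda i: links.get(i, [])))                # ['A', 'B', 'C', 'D']
print(discover_interfaces(["A"], lambda i: links.get(i, []), max_visits=2))  # ['A', 'B']
```

    The `max_visits` budget makes the trade-off in the abstract concrete: a smaller budget finishes sooner but can miss interfaces such as `C` and `D` that would contribute to result completeness.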

    A Survey of Graph-based Deep Learning for Anomaly Detection in Distributed Systems

    Anomaly detection is a crucial task in complex distributed systems. A thorough understanding of the requirements and challenges of anomaly detection is pivotal to the security of such systems, especially for real-world deployment. While there are many works and application domains that deal with this problem, few have attempted to provide an in-depth look at such systems. In this survey, we explore the potential of graph-based algorithms to identify anomalies in distributed systems. These systems can be heterogeneous or homogeneous, which can result in distinct requirements. One of our objectives is to provide an in-depth look at graph-based approaches to conceptually analyze their capability to handle real-world challenges such as heterogeneity and dynamic structure. This study gives an overview of the State-of-the-Art (SotA) research articles in the field and compares and contrasts their characteristics. To facilitate a more comprehensive understanding, we present three systems with varying abstractions as use cases. We examine the specific challenges involved in anomaly detection within such systems. Subsequently, we elucidate the efficacy of graphs in such systems and explicate their advantages. We then delve into the SotA methods, highlight their strengths and weaknesses, and point out areas for possible improvement and future work. Comment: The first two authors (A. Danesh Pazho and G. Alinezhad Noghre) contributed equally. The article has been accepted by IEEE Transactions on Knowledge and Data Engineering.
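
    The survey concerns graph-based detectors, many of them learned. To show in the simplest terms why a graph view of a distributed system helps, the sketch below encodes a hypothetical service call graph as an edge list and flags nodes whose degree deviates strongly from the rest, using a z-score. This non-learned baseline, its `threshold` parameter, and the toy graph are illustrative assumptions, not a method taken from the survey.

```python
import statistics


def degree_anomalies(edges, threshold=2.0):
    """Flag nodes whose degree z-score exceeds threshold in absolute value.

    edges: iterable of (u, v) pairs, e.g. observed calls between services.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    values = list(degree.values())
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all nodes have the same degree: nothing stands out
    return sorted(n for n, d in degree.items() if abs(d - mean) / stdev > threshold)


# Hypothetical call graph: a gateway talking to eight services, plus four pairwise links.
services = [f"s{i}" for i in range(1, 9)]
edges = [("gateway", s) for s in services]
edges += [("s1", "s2"), ("s3", "s4"), ("s5", "s6"), ("s7", "s8")]
print(degree_anomalies(edges))  # ['gateway']
```

    A purely structural score like this ignores node features and how the graph changes over time. Those are precisely the heterogeneity and dynamic-structure challenges that the surveyed methods aim to handle.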