3,372 research outputs found
A multi-collection latent topic model for federated search
Collection selection is a crucial function, central to the effectiveness and efficiency of a federated information retrieval system. A variety of solutions have been proposed for collection selection adapting proven techniques used in centralised retrieval. This paper defines a new approach to collection selection that models the topical distribution in each collection. We describe an extended version of latent Dirichletallocation that uses a hierarchical hyperprior to enable the different topical distributions found in each collection to be modelled. Under the model, resources are ranked based on the topical relationship between query and collection. By modelling collections in a low dimensional topic space, we can implicitly smooth their term-based characterisation with appropriate terms from topically related samples, thereby dealing with the problem of missing vocabulary within the samples. An important advantage of adopting this hierarchical model over current approaches is that the model generalises well to unseen documents given small samples of each collection. The latent structure of each collection can therefore be estimated well despite imperfect information for each collection such as sampled documents obtained through query-based sampling. Experiments demonstrate that this new, fully integrated topical model is more robust than current state of the art collection selection algorithm
Target Apps Selection: Towards a Unified Search Framework for Mobile Devices
With the recent growth of conversational systems and intelligent assistants
such as Apple Siri and Google Assistant, mobile devices are becoming even more
pervasive in our lives. As a consequence, users are getting engaged with the
mobile apps and frequently search for an information need in their apps.
However, users cannot search within their apps through their intelligent
assistants. This requires a unified mobile search framework that identifies the
target app(s) for the user's query, submits the query to the app(s), and
presents the results to the user. In this paper, we take the first step forward
towards developing unified mobile search. In more detail, we introduce and
study the task of target apps selection, which has various potential real-world
applications. To this aim, we analyze attributes of search queries as well as
user behaviors, while searching with different mobile apps. The analyses are
done based on thousands of queries that we collected through crowdsourcing. We
finally study the performance of state-of-the-art retrieval models for this
task and propose two simple yet effective neural models that significantly
outperform the baselines. Our neural approaches are based on learning
high-dimensional representations for mobile apps. Our analyses and experiments
suggest specific future directions in this research area.Comment: To appear at SIGIR 201
Technological, organisational, and environmental factors affecting the adoption of blockchain-based distributed identity management in organisations
Background: Blockchain is a disruptive technology with the potential to innovate businesses. Ignoring or resisting it might result in a competitive disadvantage for organisations. Apart from its original financial application of cryptocurrency, other applications are emerging, the most common being supply chain management and e-voting systems. However, there is less focus on information and cybersecurity applications, especially from the enterprise perspective. This research addresses this knowledge gap, focussing on its application of distributed identity management in organisations. Objectives: The main objective is to investigate technological, organisational, and environmental (TOE) factors affecting the adoption of blockchain-based distributed identity management (BDIDM) in organisations to determine the most critical factors. Secondary objectives include determining whether the blockchain type affects BDIDM adoption and whether the TOE-BDIDM model measuring the phenomenon is effective and appropriate. But given the relative newness of blockchain, the initial goal consists of intensively exploring the topic to understand the practicality of adopting BDIDM in organisations and establishing whether claims made around it are factual than just due to the blockchain hype. Methodology: The study uses meta-synthesis to explore the topic, summarising 69 papers selected qualitatively from reputed academic sources. The study then surveys 111 information and cybersecurity practitioners selected randomly in South African organisations to investigate the TOE factors affecting BDIDM adoption. To do so, it utilises an online questionnaire rooted in an adapted TOE model called TOE-BDIDM as a data collection instrument. The analysis of this primary data is purely quantitative and includes (i) Structural Equation Modelling (SEM) of the measurement model, i.e. confirmatory factor analysis (CFA); (ii) binary logistics regression analysis; and (iii) Chi-Square tests Results: Meta-synthesis revealed theoretical grounds underlying claims made around the topic while spotting diverging views about BDIDM practicality for the enterprise context. It also identifies the TOE theory as more suitable to explain the phenomenon. Binary logistics regression modelling reveals that TOE factors do affect BDIDM adoption in organisations, either positively or negatively. The factors predict BDIDM adopters and non-adopters, with Technology Characteristics being the most critical factor and the most that could predict BDIDM non-adopters. Organisation Readiness was the second critical factor, the most that could predict BDIDM adopters. Overall, TOE-BDIDM effectively predicted 92.5% of adopters and 45.2% of non-adopters. CFA indicates that TOE-BDIDM appropriateness for investigating the phenomenon is relatively fair. The Chi-Square tests reveal a significant association between Blockchain Type and BDIDM adoption. Implications: The discussion highlights various implications of the above findings, including the plausibility of the impartiality of typical privacy-preserving BDIDM models like the Selfsovereign identity: The majority of respondents preferred private permissioned blockchain, which tends to be centralised, more intermediated, and less privacy-preserving. The rest implications relate to the disruptiveness nature of BDIDM and the BDIDM adoption being more driven by technological than organisational or environmental factors. The study ends by reflecting on the research process and providing fundamental limitations and recommendations for future researc
Machine learning model selection with multi-objective Bayesian optimization and reinforcement learning
A machine learning system, including when used in reinforcement learning, is usually fed with only limited data, while aimed at training a model with good predictive performance that can generalize to an underlying data distribution. Within certain hypothesis classes, model selection chooses a model based on selection criteria calculated from available data, which usually serve as estimators of generalization performance of the model.
One major challenge for model selection that has drawn increasing attention is the discrepancy between the data distribution where training data is sampled from and the data distribution at deployment. The model can over-fit in the training distribution, and fail to extrapolate in unseen deployment distributions, which can greatly harm the reliability of a machine learning system. Such a distribution shift challenge can become even more pronounced in high-dimensional data types like gene expression data, functional data and image data, especially in a decentralized learning scenario. Another challenge for model selection is efficient search in the hypothesis space. Since training a machine learning model usually takes a fair amount of resources, searching for an appropriate model with favorable configurations is by inheritance an expensive process, thus calling for efficient optimization algorithms.
To tackle the challenge of distribution shift, novel resampling methods for the evaluation of robustness of neural network was proposed, as well as a domain generalization method using multi-objective bayesian optimization in decentralized learning scenario and variational inference in a domain unsupervised manner.
To tackle the expensive model search problem, combining bayesian optimization and reinforcement learning in an interleaved manner was proposed for efficient search in a hierarchical conditional configuration space. Additionally, the effectiveness of using multi-objective bayesian optimization for model search in a decentralized learning scenarios was proposed and verified.
A model selection perspective to reinforcement learning was proposed with associated contributions in tackling the problem of exploration in high dimensional state action spaces and sparse reward. Connections between statistical inference and control was summarized.
Additionally, contributions in open source software development in related machine learning sub-topics like feature selection and functional data analysis with advanced tuning method and abundant benchmarking were also made
Hypermedia-based discovery for source selection using low-cost linked data interfaces
Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources, and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often marginally discussed-even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness
A Survey of Graph-based Deep Learning for Anomaly Detection in Distributed Systems
Anomaly detection is a crucial task in complex distributed systems. A
thorough understanding of the requirements and challenges of anomaly detection
is pivotal to the security of such systems, especially for real-world
deployment. While there are many works and application domains that deal with
this problem, few have attempted to provide an in-depth look at such systems.
In this survey, we explore the potentials of graph-based algorithms to identify
anomalies in distributed systems. These systems can be heterogeneous or
homogeneous, which can result in distinct requirements. One of our objectives
is to provide an in-depth look at graph-based approaches to conceptually
analyze their capability to handle real-world challenges such as heterogeneity
and dynamic structure. This study gives an overview of the State-of-the-Art
(SotA) research articles in the field and compare and contrast their
characteristics. To facilitate a more comprehensive understanding, we present
three systems with varying abstractions as use cases. We examine the specific
challenges involved in anomaly detection within such systems. Subsequently, we
elucidate the efficacy of graphs in such systems and explicate their
advantages. We then delve into the SotA methods and highlight their strength
and weaknesses, pointing out the areas for possible improvements and future
works.Comment: The first two authors (A. Danesh Pazho and G. Alinezhad Noghre) have
equal contribution. The article is accepted by IEEE Transactions on Knowledge
and Data Engineerin
- …