Search CORE

1,101 research outputs found

Scalable aggregation predictive analytics: a query-driven machine learning approach

Author: Anagnostopoulos Christos
Savva Fotis
Triantafillou Peter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2018
Field of study

We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Spark’s COUNT method

Warwick Research Archives Portal Repository

Enlighten

Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization

Author: Dai Yuchao
Li Hongdong
Liu Liu
Publication venue
Publication date: 06/08/2019
Field of study

This paper tackles the problem of large-scale image-based localization (IBL) where the spatial location of a query image is determined by finding out the most similar reference images in a large database. For solving this problem, a critical task is to learn discriminative image representation that captures informative information relevant for localization. We propose a novel representation learning method having higher location-discriminating power. It provides the following contributions: 1) we represent a place (location) as a set of exemplar images depicting the same landmarks and aim to maximize similarities among intra-place images while minimizing similarities among inter-place images; 2) we model a similarity measure as a probability distribution on L_2-metric distances between intra-place and inter-place image representations; 3) we propose a new Stochastic Attraction and Repulsion Embedding (SARE) loss function minimizing the KL divergence between the learned and the actual probability distributions; 4) we give theoretical comparisons between SARE, triplet ranking and contrastive losses. It provides insights into why SARE is better by analyzing gradients. Our SARE loss is easy to implement and pluggable to any CNN. Experiments show that our proposed method improves the localization performance on standard benchmarks by a large margin. Demonstrating the broad applicability of our method, we obtained the third place out of 209 teams in the 2018 Google Landmark Retrieval Challenge. Our code and model are available at https://github.com/Liumouliu/deepIBL.Comment: ICC

arXiv.org e-Print Archive

Crossref

On Efficiency of Selected Machine Learning Algorithms for Intrusion Detection in Software Defined Networks

Author: Amanowicz Marek
Jankowski Damian
Publication venue: Electronics and Telecommunications Committee
Publication date: 01/01/2016
Field of study

We propose a concept of using Software Defined Network (SDN) technology and machine learning algorithms for monitoring and detection of malicious activities in the SDN data plane. The statistics and features of network traffic are generated by the native mechanisms of SDN technology. In order to conduct tests and a verification of the concept, it was necessary to obtain a set of network workload test data. We present virtual environment which enables generation of the SDN network traffic. The article examines the efficiency of selected machine learning methods: Self Organizing Maps and Learning Vector Quantization and their enhanced versions. The results are compared with other SDN-based IDS

Biblioteka Nauki - repozytorium artykuÅÃ³w

International Journal of Electronics and Telecommunications (Warsaw University of Technology)

Machine Learning for Fluid Mechanics

Author: Brunton Steven
Koumoutsakos Petros
Noack Bernd
Publication venue: 'Annual Reviews'
Publication date: 04/01/2020
Field of study

The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from field measurements, experiments and large-scale simulations at multiple spatiotemporal scales. Machine learning offers a wealth of techniques to extract information from data that could be translated into knowledge about the underlying fluid mechanics. Moreover, machine learning algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of past history, current developments, and emerging opportunities of machine learning for fluid mechanics. It outlines fundamental machine learning methodologies and discusses their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that considers data as an inherent part of modeling, experimentation, and simulation. Machine learning provides a powerful information processing framework that can enrich, and possibly even transform, current lines of fluid mechanics research and industrial applications.Comment: To appear in the Annual Reviews of Fluid Mechanics, 202

arXiv.org e-Print Archive

Deep hashing with self-supervised asymmetric semantic excavation and margin-scalable constraint

Author: Bakker E.M.
Dou Z.
Wu S.
Yu Z.
Publication venue: 'Elsevier BV'
Publication date: 28/04/2022
Field of study

Due to its effectivity and efficiency, deep hashing approaches are widely used for large-scale visual search. However, it is still challenging to produce compact and discriminative hash codes for images asso-ciated with multiple semantics for two main reasons, 1) similarity constraints designed in most of the existing methods are based upon an oversimplified similarity assignment (i.e., 0 for instance pairs sharing no label, 1 for instance pairs sharing at least 1 label), 2) the exploration in multi-semantic relevance are insufficient or even neglected in many of the existing methods. These problems significantly limit the dis-crimination of generated hash codes. In this paper, we propose a novel Deep Hashing with Self-Supervised Asymmetric Semantic Excavation and Margin-Scalable Constraint(SADH) approach to cope with these problems. SADH implements a self-supervised network to sufficiently preserve semantic information in a semantic feature dictionary and a semantic code dictionary for the semantics of the given dataset, which efficiently and precisely guides a feature learning network to preserve multi-label semantic information using an asymmetric learning strategy. By further exploiting semantic dictionaries, a new margin-scalable constraint is employed for both precise similarity searching and robust hash code generation. Extensive empirical research on four popular benchmarks validates the proposed method and shows it outperforms several state-of-the-art approaches. The source codes URL of our SADH is: http:// github.com/SWU-CS-MediaLab/SADH. (c) 2022 Elsevier B.V. All rights reserved.Computer Systems, Imagery and Medi

Leiden University Scholary Publications

Machine learning with limited label availability: algorithms and applications

Author: GIOBERGIA FLAVIO
Publication venue: country:Italy
Publication date: 15/02/2023
Field of study

L'abstract è presente nell'allegato / the abstract is in the attachmen

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)