ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)

Author: Benjamini Y.
Cohen I.
Jeyakumar V.
Pedregosa F.
Seth A. K.
Shimizu S.
Tenenbaum J. B.
Wang Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/03/2019

We present ExplainIt!, a declarative, unsupervised root-cause analysis engine that uses time series monitoring data from large complex systems such as data centres. ExplainIt! lets operators succinctly specify a large number of causal hypotheses to search for causes of interesting events. ExplainIt! then ranks these hypotheses, reducing the number of causal dependencies from hundreds of thousands to a handful for human understanding. We show how a declarative language, such as SQL, can be effective in enumerating hypotheses that probe the structure of an unknown probabilistic graphical causal model of the underlying system. Our thesis is that databases are in a unique position to enable users to rapidly explore the possible causal mechanisms in data collected from diverse sources. We empirically demonstrate how ExplainIt! has helped us resolve over 30 performance issues in a commercial product since late 2014, of which we discuss a few cases in detail. Comment: SIGMOD Industry Track 201

arXiv.org e-Print Archive

Crossref

ROOMS:ROlap based Occupation Measurement System

Author: Kemps G.C.M.
Publication date: 31/08/2011

Pure OAI Repository

Data Mining Techniques For Effective and Scalable Traffic Analysis

Author: Baldi Mario
Baralis E.
Risso Fulvio Giovanni Ottavio
Publication venue: IEEE
Publication date: 01/01/2005

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Nitrate Client Performance Improvement with Cache Implementation

Author: Holec Filip
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2013

The goal of the thesis is to design and implement performance improvements in the python-nitrate module. The improvements are based on gathered use cases that use large amounts of data and network bandwidth. A performance test suite was implemented to measure the impact of changes in the module. Testing showed that the python-nitrate module with the improvements integrated is in certain cases several times faster, but can also be slower in two cases. The conclusion discusses possible extensions of the work.

Digital library of Brno University of Technology

National Repository of Grey Literature

Data Mining Techniques For Effective and Scalable Traffic Analysis

Author: Baldi Mario
Baralis E.
Risso Fulvio Giovanni Ottavio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

An Investigation of the Security Designs of a Structured Query Language (Sql) Database and its Middleware Application and their Secure Implementation Within Thinclient Environments

Author: Winner-Leoni Melissa D.
Publication venue: ePublications at Regis University
Publication date: 05/10/2008

The Health Insurance Portability and Accountability Act (HIPAA) and the Sarbanes-Oxley (SOX) regulations have greatly influenced how the health care industry secures financial and private data within information technology. With the introduction of thin-client technologies into medical information systems (IS), data security and regulatory compliance become more problematic due to exposure to the World Wide Web (WWW) and malicious activity. This author explores the best practices of the medical and information technology industries for securing electronic data within the thin-client environment at the three levels of architecture: the SQL database, its middleware application, and the Web interface. Designing security within the SQL database is not enough, as breaches can occur through unintended consequences during data access in the middleware application design and during data exchange over computer networks. For example, a hospital's medical records, which are routinely exchanged over computer networks, are subject to the audit control and encryption requirements mandated for data security (Department of, 2008). While security techniques overlap across the three layers of architectural security design, the use of 18 methodologies greatly enhances the ability to protect electronic information. Due to the variety of IS used within a medical facility, security conscientiousness, consistency of security design, excellent communication between designers, developers and system engineers, and the use of standardized security techniques within each of the three layers of architecture are required.

ePublications at Regis University

Textbooks for the web programming course

Author: Andy Anderson
Jamsa Kris
Konrad King
Publication venue: Mc Graw Hill
Publication date: 01/01/2002

.HTML.And.Web.Design.Tips.And.Techniques.Jan.2002.ISBN.0072228253.pd

Universitas Ahmad Dahlan Repository

Predicting multiple domain queue waiting time via machine learning

Author: Cortez Paulo
Guimarães Pedro
Loureiro Carolina
Moreira Carlos
Pereira Pedro José
Pinho André
Publication venue: Springer, Cham
Publication date: 01/01/2023

This paper describes an implementation of the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology for a demonstrative case of human queue waiting time prediction. We collaborated with a multiple domain (e.g., bank, pharmacies) ticket management service software development company, aiming to study a Machine Learning (ML) approach to estimate queue waiting time. A large multiple domain database was analyzed, which included millions of records related to two time periods (one year, for the modeling experiments; and two years, for a deployment simulation). The data was first preprocessed (including data cleaning and feature engineering tasks) and then modeled by exploring five state-of-the-art ML regression algorithms and four input attribute selections (including newly engineered features). Furthermore, the ML approaches were compared with the estimation method currently adopted by the analyzed company. The computational experiments assumed two main validation procedures, a standard cross-validation and a Rolling Window scheme. Overall, competitive and quality results were obtained by an Automated ML (AutoML) algorithm fed with newly engineered features. Indeed, the proposed AutoML model produces a small error (from 5 to 7 min), while requiring a reasonable computational effort. Finally, an eXplainable Artificial Intelligence (XAI) approach was applied to a trained AutoML model, demonstrating the extraction of useful explanatory knowledge for this domain. This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020 and the project “QOMPASS .: Solução de Gestão de Serviços de Atendimento multi-entidade, multi-serviço e multi-idioma” within the Project Scope NORTE-01-0247-FEDER-038462

Universidade do Minho: RepositoriUM

Code Positioning in LLVM

Author: Lehtonen Henrik
Sonesson Klas
Publication venue: Lunds universitet/Institutionen för datavetenskap
Publication date: 01/01/2017

Given the increasing performance disparity between processor speeds and memory latency, making efficient use of cache memory is more important than ever to achieve good performance in memory-bound workloads. Many modern first-level caches store instructions separately from data, making code layout and code size important factors in the cache behavior of a program. This work investigates two methods that attempt to improve code locality, namely procedure splitting and procedure positioning, previously investigated by Pettis and Hansen. They are implemented in the open-source compiler framework LLVM to evaluate their effect on the SPEC CPU2000 benchmark suite and a benchmark run of the PostgreSQL database system. We found that the benefit of our implementation is highly situational, but it can reduce execution time by up to 3% on suitable SPEC benchmarks and increase average transactions per second by 3% on PostgreSQL.