10 research outputs found

    Transformation of WSDL files using ETL in the E-orientation domain

    E-orientation platforms have become an essential space for students to base their choices on tangible elements, as they offer a macroscopic view of the different academic and professional fields. In this context, the Center for Engineering Sciences and Applied Sciences (SISA) funded the MMS Orientation project to focus on student orientation in Morocco. Given the peculiarities and differing characteristics of E-orientation platforms, a comparative study is essential. In this article we conduct a comparative study of a sample of platforms in order to highlight their major functionalities, which we then model through descriptive files. The work is divided into two parts: the first compares and describes existing platforms using a descriptive language (WSDL); the second uses ETL as a transformation technology to derive generic files that will serve as a basis for the expected meta-model.
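    The pipeline described starts by extracting service descriptions from WSDL files before the ETL transformation step. As a minimal sketch of that extraction stage, the Python snippet below pulls (portType, operation) pairs out of a WSDL 1.1 file into flat records that a downstream ETL job could load; the function name and record layout are illustrative assumptions, not the paper's design.

        import xml.etree.ElementTree as ET

        WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

        def extract_operations(wsdl_path: str) -> list[dict]:
            # Walk every portType and collect its operations as flat records.
            tree = ET.parse(wsdl_path)
            records = []
            for port_type in tree.iter(f"{{{WSDL_NS}}}portType"):
                for op in port_type.iter(f"{{{WSDL_NS}}}operation"):
                    records.append({
                        "port_type": port_type.get("name"),
                        "operation": op.get("name"),
                    })
            return records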

    LSTM deep learning method for network intrusion detection system

    Network security has become a primary concern for organizations. Attackers use different means to disrupt services or steal information, and these varied attacks call for a single way to block them all. In addition, these intrusions can change and penetrate security devices. To address these issues, we propose in this paper a Network Intrusion Detection System (NIDS) based on Long Short-Term Memory (LSTM), which recognizes threats and retains a long-term memory of them, in order to stop new attacks that resemble existing ones and, at the same time, to have a single means of blocking intrusions. In our detection experiments, accuracy reaches 99.98% for two-class and 99.93% for multi-class classification, while the False Positive Rate (FPR) is only 0.068% and 0.023% respectively. This shows that the proposed model is very effective: it has a great ability to memorize and differentiate between normal and attack traffic, and its identification is more accurate than that of other machine learning classifiers.
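    The model described is a sequence classifier over traffic features. Below is a minimal sketch in PyTorch; the layer sizes, feature count, and window length are placeholder assumptions rather than the paper's architecture.

        import torch
        import torch.nn as nn

        class LSTMNIDS(nn.Module):
            # Toy LSTM flow classifier; hyperparameters are placeholders.
            def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
                super().__init__()
                self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, x):  # x: (batch, time_steps, n_features)
                _, (h_n, _) = self.lstm(x)  # final hidden state per sequence
                return self.head(h_n[-1])   # class logits

        # e.g., windows of 10 records with 41 features each, two classes
        model = LSTMNIDS(n_features=41, n_classes=2)
        logits = model(torch.randn(8, 10, 41))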

    Quality measures for ETL processes: from goals to implementation

    Extraction-transformation-loading (ETL) processes play an increasingly important role in supporting modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring these processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model of ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for the evaluation and analysis of alternative design decisions.
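    To make the idea of quantitative indicators concrete, here is a small sketch: a quality characteristic paired with a measured value and a goal threshold, used to compare two alternative ETL designs. The names and numbers are invented for illustration; the paper's actual model and measures differ.

        from dataclasses import dataclass

        @dataclass
        class Indicator:
            # Quantitative indicator for one ETL quality characteristic.
            characteristic: str   # e.g. "freshness" or "reliability"
            value: float          # measured on a candidate design
            target: float         # threshold derived from a business goal

            def satisfied(self) -> bool:
                return self.value >= self.target

        # Compare two alternative designs against the same goal
        design_a = Indicator("freshness", value=0.92, target=0.95)
        design_b = Indicator("freshness", value=0.97, target=0.95)
        print(design_a.satisfied(), design_b.satisfied())  # False True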

    Easy designing steps of a local data warehouse for possible analytical data processing

    Data warehouses (DW) are used at the local or global level depending on usage. Most DWs are designed for online purposes, targeting multinational firms; the majority of local firms simply purchase such ready-made DW applications, for which customization, maintenance, and enhancement are very costly. To provide useful e-services, government departments, academic institutes, businesses, telemedicine firms, and others need a DW of their own. A lack of electricity and internet facilities, especially in rural areas, discourages citizens from using e-services. In this digital world, every local firm is interested in having its own DW to support strategic decision-making for the business. This study highlights the basic technical steps in designing a local DW. It presents several possible solutions to problems that may arise during the design of the extraction, transformation, and loading (ETL) process, and gives detailed steps to develop the dimension tables, the fact table, and the data loading. The resulting data analytics can answer business questions and suggest future solutions.
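    As a minimal illustration of the dimension-table, fact-table, and loading steps, the sketch below builds a tiny star schema in SQLite and loads a few rows; the table and column names are invented for the example, not taken from the study.

        import sqlite3

        con = sqlite3.connect("local_dw.db")
        con.executescript("""
        CREATE TABLE IF NOT EXISTS dim_date (
            date_key  INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240131
            full_date TEXT NOT NULL,
            month     INTEGER,
            year      INTEGER
        );
        CREATE TABLE IF NOT EXISTS fact_sales (
            date_key   INTEGER REFERENCES dim_date(date_key),
            product_id INTEGER,
            amount     REAL
        );
        """)

        # Loading step: dimension row first, then facts that reference it.
        con.execute("INSERT OR IGNORE INTO dim_date VALUES (20240131, '2024-01-31', 1, 2024)")
        con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                        [(20240131, 101, 99.5), (20240131, 102, 15.0)])
        con.commit()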

    What is the IQ of your data transformation system?

    Mapping and translating data across different representations is a crucial problem in information systems. Many formalisms and tools are currently used for this purpose, to the point that developers typically face a difficult question: “what is the right tool for my translation task?” In this paper, we introduce several techniques that contribute to answering this question. Among these are a fairly general definition of a data transformation system, a new and very efficient similarity measure to evaluate the outputs produced by such a system, and a metric to estimate user effort. Based on these techniques, we are able to compare a wide range of systems on many translation tasks, to gain interesting insights about their effectiveness and, ultimately, about their “intelligence”.
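    The paper's similarity measure is not reproduced here; as a stand-in to show what evaluating the outputs of a transformation system can look like, the sketch below scores the overlap between produced and expected tuples with plain Jaccard similarity.

        def tuple_jaccard(expected: set, produced: set) -> float:
            # Toy output-similarity score in [0, 1]; 1.0 means identical sets.
            if not expected and not produced:
                return 1.0
            return len(expected & produced) / len(expected | produced)

        expected = {("alice", 30), ("bob", 25)}
        produced = {("alice", 30), ("bob", 24)}
        print(tuple_jaccard(expected, produced))  # 1 of 3 distinct tuples match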

    Business intelligence solution in the energy sector (Solução de inteligência empresarial na área da energia)

    Master's project report, Informatics Engineering, Universidade de Lisboa, Faculdade de Ciências, 2016. This report describes the work done during an internship at the company Novabase, between September 2015 and June 2016, under the project “Solução de Inteligência Empresarial na Área da Energia”, whose main goals were the development of an agile and efficient extraction-transformation-loading (ETL) process and the creation of operational reports for the Sistema de Gestão de Clientes (SGC), as well as a set of dashboards. Initially, the ETL process and the existing SGC databases were analyzed. Since there was no data model of the system, a relational model was designed that presents the key concepts involved and the relationships between them, allowing a deep and comprehensive view of the data. Then, an ETL process consisting of multiple jobs was developed. After observing the jobs' high execution times, the process was optimized using a bulk data loading strategy, which improved the jobs' runtimes. The evaluation of this process included checking the conformity of the obtained data against the SGC data and verifying its execution times. Next, a set of operational reports was implemented and evaluated against customer requirements, expected functionality, and the user interface. Finally, some BI dashboards were built within two proofs of concept; these dashboards were evaluated in terms of the information they provide and how it is presented to the user. The internship made it possible to apply and deepen the knowledge acquired during the academic programme and to improve teamwork, task management, and understanding of the business world.
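    The report's key optimization was switching from per-row loading to bulk loading. The snippet below reproduces the principle with sqlite3 (a stand-in for the stack used in the project, which the abstract does not name): one batched call instead of one statement per record.

        import sqlite3, time

        rows = [(i, f"customer-{i}") for i in range(100_000)]
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

        t0 = time.perf_counter()                      # row-by-row loading
        for row in rows:
            con.execute("INSERT INTO customers VALUES (?, ?)", row)
        row_by_row = time.perf_counter() - t0

        con.execute("DELETE FROM customers")

        t0 = time.perf_counter()                      # bulk loading
        con.executemany("INSERT INTO customers VALUES (?, ?)", rows)
        bulk = time.perf_counter() - t0

        print(f"row-by-row: {row_by_row:.3f}s, bulk: {bulk:.3f}s")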

    Cost optimization for data placement strategies in an analytical cloud service

    Analyzing large amounts of business-relevant data in near-real time to assist decision making has become a crucial requirement for many businesses in recent years. All major database system vendors therefore offer solutions specially tuned for accelerating analytical workloads. Before deciding to buy such a large and expensive solution, customers want a detailed workload analysis in order to estimate the potential benefits. A more agile solution with lower barriers to entry is therefore desirable, one that allows customers to assess analytical solutions for their workloads and lets data scientists experiment with available data on test systems before rolling out valuable analytical reports on a production system. In such a scenario, where separate systems are deployed for handling the transactional workloads of daily customer business and for conducting business analytics on either a cloud service or a dedicated accelerator appliance, data management and placement strategies are of high importance. Multiple approaches exist for keeping the data sets in sync and guaranteeing data coherence, each with unique characteristics regarding the metrics that impact query performance, such as the latency with which data changes are propagated, the achievable throughput for larger data volumes, or the amount of CPU required to detect and deploy data changes. We analyze these heuristics and evolve them into a general model for data placement and maintenance strategies. Based on this theoretical model, a prototype that predicts these metrics is also implemented.
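    As a toy version of such a model, the sketch below scores placement strategies on the three metrics the abstract names (propagation latency, sync throughput, CPU for change detection); the strategies, numbers, and weights are all invented.

        from dataclasses import dataclass

        @dataclass
        class PlacementStrategy:
            name: str
            propagation_latency_s: float  # staleness of the analytical copy
            throughput_mb_s: float        # bulk synchronization throughput
            cpu_cores: float              # CPU to detect and apply changes

        def cost(s: PlacementStrategy, w_lat: float = 1.0, w_cpu: float = 0.5) -> float:
            # Lower is better: penalize latency and CPU use, reward throughput.
            return w_lat * s.propagation_latency_s + w_cpu * s.cpu_cores - s.throughput_mb_s / 100

        strategies = [
            PlacementStrategy("trigger-based change capture", 2.0, 80, 4),
            PlacementStrategy("periodic bulk reload", 600.0, 400, 1),
        ]
        print(min(strategies, key=cost).name)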

    Preface


    Personalization in e-commerce: on the effect of e-mail personalization on selected economic metrics of consumer behaviour (Personalisierung im E-Commerce – zur Wirkung von E-Mail-Personalisierung auf ausgewählte ökonomische Kennzahlen des Konsumentenverhaltens)

    Personalization is an important area of internet marketing in which few experimental studies with large numbers of participants exist. The successful use of recommendation techniques requires extensive data on buying behaviour. This thesis addresses these problems. It examines the cross-shop individual buying behaviour of up to 126,000 newsletter recipients of a German online bonus system, both with selected data mining methods and experimentally. To this end, prototypes of a data mining system, an A/B-testing software component, and a recommender system component were developed and evaluated through data mining and online field experiments. In one experiment with this user group, even a simple recommendation method showed, first, that the bonus system's cross-shop individual behavioural data are suitable for generating recommendations and, second, that the recommendations generated from them led to significantly more orders than the best recommendation based on average buying behaviour. Further experiments conducted to evaluate the A/B-testing component showed that absolute discount offers led to significantly more orders than relative discount offers only when they were combined with a call to action. The thesis thus belongs to the research on influencing buying behaviour through personalization and through different discount presentations, and contributes the results and artifacts described above.
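    The A/B tests described compare order (conversion) rates between newsletter variants. A minimal sketch of such a comparison is a pooled two-proportion z-test, shown below with invented figures; the thesis's actual evaluation procedure is not reproduced here.

        from math import sqrt, erf

        def two_proportion_p(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
            # Two-sided p-value for comparing two conversion rates
            # (pooled two-proportion z-test).
            p_a, p_b = conv_a / n_a, conv_b / n_b
            pooled = (conv_a + conv_b) / (n_a + n_b)
            se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
            z = (p_a - p_b) / se
            return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

        # Variant A: absolute discount with call to action; variant B: relative.
        print(two_proportion_p(conv_a=540, n_a=60_000, conv_b=450, n_b=60_000))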