Database-Integrated Analytics

Xie, Shaolin

Authors: Shaolin Xie
Publication date: 5 May 2021
Publisher: Worcester Polytechnic Institute - Gordon Library

Abstract

The coordination between data analytics and database systems has become exceedingly important, because data scientists need to analyze data stored inside a database efficiently. Currently, there are three approaches to using data analysis tools with databases: client-server connection, in-database processing, and embedded databases. This project compares the client-server connection approach with the in-database processing approach. Two machine learning models, Support Vector Machine and Random Forest, are implemented with each approach and then tested on datasets of different scales. The in-database processing approach is implemented using Apache MADlib, and the client-server connection approach is implemented in Python. After comparing the run-time efficiency and the testing accuracy of the two approaches, conclusions are drawn regarding the performance of each approach.
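
The main practical difference between the two approaches compared here is where the model is fitted and how much data crosses the connection. The sketch below is only an illustration under stated assumptions, not the project's code. Python's built-in sqlite3 module stands in for the database server (MADlib itself runs on PostgreSQL or Greenplum, not SQLite, and an in-memory SQLite database has no real network hop). A simple nearest-centroid classifier replaces SVM and Random Forest, because its training step reduces to a per-class average that plain SQL can compute. The table `points`, its columns, and all function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical training data: (label, x1, x2).
ROWS = [
    ("a", 1.0, 1.0), ("a", 2.0, 1.0), ("a", 1.5, 2.0),
    ("b", 6.0, 5.0), ("b", 7.0, 6.0), ("b", 6.5, 7.0),
]


def make_db():
    """Create an in-memory SQLite table standing in for the database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (label TEXT, x1 REAL, x2 REAL)")
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", ROWS)
    return conn


def client_server_train(conn):
    """Client-server style: fetch every row, then fit the model in the client."""
    rows = conn.execute("SELECT label, x1, x2 FROM points").fetchall()
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x1, sum_x2, count]
    for label, x1, x2 in rows:
        s = sums[label]
        s[0] += x1
        s[1] += x2
        s[2] += 1
    centroids = {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}
    return centroids, len(rows)  # model and number of rows transferred


def in_database_train(conn):
    """In-database style: the database computes the per-class averages itself."""
    rows = conn.execute(
        "SELECT label, AVG(x1), AVG(x2) FROM points GROUP BY label"
    ).fetchall()
    centroids = {label: (m1, m2) for label, m1, m2 in rows}
    return centroids, len(rows)  # only one summary row per class is transferred


def predict(centroids, x1, x2):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: (centroids[c][0] - x1) ** 2 + (centroids[c][1] - x2) ** 2,
    )


if __name__ == "__main__":
    conn = make_db()
    cs_model, cs_rows = client_server_train(conn)
    db_model, db_rows = in_database_train(conn)
    print(cs_rows, db_rows)  # 6 2: rows that crossed the connection
    print(predict(db_model, 2.0, 2.0), predict(db_model, 6.0, 6.0))  # a b
```

Both paths produce the same centroids; only the volume of data moved to the client differs, and that gap grows with the table size. In MADlib, the in-database path is a SQL function call (for example `svm_classification` or `forest_train`) rather than a hand-written `GROUP BY`; consult the MADlib documentation for their exact argument lists.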

Repository: Digital WPI (oai:digitalwpi:vq27zr570)