We propose an architecture for analysing database connection logs across
multiple database instances within an intranet of more than 10,000 users and
their associated devices. Flume agents collect the log events and ship them to
the Hadoop Distributed File System (HDFS) for long-term storage and to
Elasticsearch and Kibana for short-term search and visualisation, together
forming a data lake from which log data can be extracted. We then apply an
ensemble of machine learning models to filter and process the indicators within
the data, with the aim of predicting anomalies or outliers from feature vectors
built from the log data.