5,347 research outputs found

    Security Analytics: Using Deep Learning to Detect Cyber Attacks

    Get PDF
    Security attacks are becoming more prevalent as cyber attackers exploit system vulnerabilities for financial gain. The resulting loss of revenue and reputation can have deleterious effects on governments and businesses alike. Signature recognition and anomaly detection are the most common security detection techniques in use today. These techniques provide a strong defense. However, they fall short of detecting complicated or sophisticated attacks. Recent literature suggests using security analytics to differentiate between normal and malicious user activities. The goal of this research is to develop a repeatable process to detect cyber attacks that is fast, accurate, comprehensive, and scalable. A model was developed and evaluated using several production log files provided by the University of North Florida Information Technology Security department. This model uses security analytics to complement existing security controls to detect suspicious user activity occurring in real time by applying machine learning algorithms to multiple heterogeneous server-side log files. The process is linearly scalable and comprehensive; as such it can be applied to any enterprise environment. The process is composed of three steps. The first step is data collection and transformation which involves identifying the source log files and selecting a feature set from those files. The resulting feature set is then transformed into a time series dataset using a sliding time window representation. Each instance of the dataset is labeled as green, yellow, or red using three different unsupervised learning methods, one of which is Partitioning around Medoids (PAM). The final step uses Deep Learning to train and evaluate the model that will be used for detecting abnormal or suspicious activities. Experiments using datasets of varying sizes of time granularity resulted in a very high accuracy and performance. The time required to train and test the model was surprisingly fast even for large datasets. This is the first research paper that develops a model to detect cyber attacks using security analytics; hence this research builds a foundation on which to expand upon for future research in this subject area

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Event detection in high throughput social media

    Get PDF

    The EDAM Project: Mining Atmospheric Aerosol Datasets

    Get PDF
    Data mining has been a very active area of research in the database, machine learning, and mathematical programming communities in recent years. EDAM (Exploratory Data Analysis and Management) is a joint project between researchers in Atmospheric Chemistry and Computer Science at Carleton College and the University of Wisconsin-Madison that aims to develop data mining techniques for advancing the state of the art in analyzing atmospheric aerosol datasets. There is a great need to better understand the sources, dynamics, and compositions of atmospheric aerosols. The traditional approach for particle measurement, which is the collection of bulk samples of particulates on filters, is not adequate for studying particle dynamics and real-time correlations. This has led to the development of a new generation of real-time instruments that provide continuous or semi-continuous streams of data about certain aerosol properties. However, these instruments have added a significant level of complexity to atmospheric aerosol data, and dramatically increased the amounts of data to be collected, managed, and analyzed. Our abilit y to integrate the data from all of these new and complex instruments now lags far behind our data-collection capabilities, and severely limits our ability to understand the data and act upon it in a timely manner. In this paper, we present an overview of the EDAM project. The goal of the project, which is in its early stages, is to develop novel data mining algorithms and approaches to managing and monitoring multiple complex data streams. An important objective is data quality assurance, and real-time data mining offers great potential. The approach that we take should also provide good techniques to deal with gas-phase and semi-volatile data. While atmospheric aerosol analysis is an important and challenging domain that motivates us with real problems and serves as a concrete test of our results, our objective is to develop techniques that have broader applicability, and to explore some fundamental challenges in data mining that are not specific to any given application domain
    • …
    corecore