
EMail Data Mining: An Approach to Construct an Organization Position-wise Structure While Performing EMail Analysis

Author: Vadher Bhargav
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2010

In this age of social networking, it is necessary to define the relationships among the members of a social network. Various techniques already exist for defining user-to-user relationships across a network. Over time, many algorithms and machine learning techniques have been applied to find relationships in social networks, yet few techniques are available for defining relationships directly from raw email data. A few academic groups have developed ways to mine email log files and have identified inter-relationships between users by means of clusters. However, no established technique can accurately predict the rank of each user within an organization by mining their email transaction logs. In this report, the author presents a technique for mining email log files to infer the position-wise structure of an organization. The author also discusses send-receive analysis, statistical analysis, semantic analysis and temporal analysis of the data, and applies them to test cases. Throughout the research, the author uses the Enron employees' email log files, which were made public in 2001.
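To make the idea of send-receive analysis concrete, here is a minimal sketch, not the author's method. It assumes each email has already been parsed into a `(sender, [recipients])` pair, and it ranks users with a hypothetical heuristic: users who receive mail from more distinct people rank higher, with total messages received as a tie-breaker. The function names `send_receive_stats` and `rank_users` are illustrative only.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable


def send_receive_stats(
    messages: Iterable[tuple[str, list[str]]],
) -> dict[str, tuple[int, int, int]]:
    """Return {address: (sent, received, distinct_senders)} for every address seen."""
    sent = Counter()
    received = Counter()
    senders_of = defaultdict(set)
    for sender, recipients in messages:
        sent[sender] += 1
        # Count each recipient once per message and ignore self-addressed copies.
        for recipient in set(recipients):
            if recipient == sender:
                continue
            received[recipient] += 1
            senders_of[recipient].add(sender)
    users = set(sent) | set(received)
    return {u: (sent[u], received[u], len(senders_of[u])) for u in users}


def rank_users(messages: Iterable[tuple[str, list[str]]]) -> list[str]:
    """Hypothetical ranking: most distinct senders first, then most messages received."""
    stats = send_receive_stats(messages)
    return sorted(stats, key=lambda u: (-stats[u][2], -stats[u][1], u))


# Toy log: three employees write to "ceo", who replies to all of them.
example = [
    ("alice", ["ceo"]),
    ("bob", ["ceo", "alice"]),
    ("carol", ["ceo"]),
    ("ceo", ["alice", "bob", "carol"]),
]

if __name__ == "__main__":
    print(rank_users(example))  # ['ceo', 'alice', 'bob', 'carol']
```

A real analysis would likely combine such counts with the statistical, semantic and temporal signals the abstract mentions; raw send/receive counts alone can be skewed by mailing lists and automated senders.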


HoneyCode: Automating Deceptive Software Repositories with Deep Generative Models

Author: Kanhere Salil
Liebowitz David
Nepal Surya
Nguyen David
Publication venue: AIS Electronic Library (AISeL)
Publication date: 04/01/2021

We propose HoneyCode, an architecture for generating synthetic software repositories for cyber deception. The synthetic repositories have the characteristics of real software, including language features, file names and extensions, but contain no real intellectual property. The fake software can be used as a honeypot or form part of a deceptive environment. Existing approaches to software repository generation lack scalability because they rely on hand-crafted structures for specific languages. Our approach is language agnostic and learns the underlying representations of repository structures, filenames and file content through a novel Tree Recurrent Network (TRN) for repository structure and two recurrent neural networks (RNNs) for filenames and file content. Each stage of the sequential generation process uses features from prior steps, which increases the honey repository's authenticity and consistency. Experiments show that, compared with recent deep graph generators and a baseline random tree generator, TRN generates tree samples that reduce the degree maximum mean discrepancy (MMD) by 90-92% and the depth MMD by 75-86%, measured against a held-out test data set. In addition, our RNN models generate convincing filenames with authentic syntax and realistic file content.
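The reported results compare distributions of tree statistics (node degree and depth) between generated and held-out repository trees using MMD. The sketch below shows how a biased MMD² estimate over degree histograms could be computed. It assumes trees are encoded as parent lists (`-1` for the root) and uses a Gaussian kernel over zero-padded, normalized degree histograms; the paper's actual kernel, bandwidth and tree encoding are not stated in this abstract, so these choices and the function names are illustrative assumptions. A depth MMD would follow the same pattern with a histogram of node depths.

```python
import math
from collections import Counter


def degree_histogram(parents: list[int]) -> list[float]:
    """Normalized histogram of undirected node degrees for a tree.

    Assumed encoding: parents[i] is the index of node i's parent, or -1 for the root.
    """
    degree = [0] * len(parents)
    for child, parent in enumerate(parents):
        if parent >= 0:
            degree[child] += 1
            degree[parent] += 1
    hist = [0.0] * (max(degree) + 1)
    for d, count in Counter(degree).items():
        hist[d] = count / len(parents)
    return hist


def gaussian_kernel(p: list[float], q: list[float], sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two histograms, zero-padded to equal length."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2 * sigma**2))


def mmd_squared(xs: list[list[float]], ys: list[list[float]], sigma: float = 1.0) -> float:
    """Biased estimate of MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


# Toy trees: a path of four nodes and a root with three children.
chain = [-1, 0, 1, 2]
star = [-1, 0, 0, 0]

if __name__ == "__main__":
    generated = [degree_histogram(chain)]
    held_out = [degree_histogram(star)]
    print(mmd_squared(generated, held_out))  # positive: the degree distributions differ
    print(mmd_squared(held_out, held_out))   # 0.0: identical samples
```

Under this reading, a "90-92% reduction" would mean the generator's MMD to the test set is roughly a tenth of the competing models' MMD, with lower values indicating closer agreement with real repository trees.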

ScholarSpace at University of Hawai'i at Manoa
