4 research outputs found

    Creating Network Attack Priority Lists by Analyzing Email Traffic Using Predefined Profiles

    Networks can be vast and complicated entities consisting of both servers and workstations that contain information sought by attackers. Searching for specific data in a large network can be a time-consuming process. Vast amounts of data either pass through or are stored on various servers on the network; however, intermediate work products are often kept solely on workstations. Potential high-value targets can be passively identified by comparing user email traffic against predefined profiles. This method offers a potentially smaller footprint on target systems, less human interaction, and increased attacker efficiency. Collecting user email traffic and comparing each word in an email to a predefined profile, a list of keywords of interest to the attacker, can provide a prioritized list of the systems containing the most relevant information. This research uses two experiments. The functionality experiment uses randomly generated emails and profiles to demonstrate the ability of MAPS (Merritt's Adaptive Profiling System) to accurately identify matches. The utility experiment uses an email corpus and meaningful profiles, further demonstrating MAPS' ability to accurately identify matches with non-random input. A meaningful profile is a list of words bearing a semantic relationship to a topic of interest to the attacker. Results for the functionality experiment show that MAPS can parse randomly generated emails and identify matches with an accuracy of 99 percent or above. The utility experiment, using an email corpus with meaningful profiles, shows slightly lower accuracies of 95 percent or above. Based upon the match results, network attack priority lists are generated. A network attack priority list is an ordered list of systems in which the potentially highest-value systems exhibit the greatest fit to the profile. An attacker then uses the list when searching for target information on the network to prioritize the systems most likely to contain useful data.
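
    The matching and prioritization steps described in the abstract can be illustrated with a minimal sketch. This is not the MAPS implementation itself: the per-word keyword count, the host-to-email mapping, and the example profile below are all illustrative assumptions.

```python
from collections import defaultdict

def score_email(email_text, profile_words):
    """Count how many words of the email appear in the profile keyword set."""
    return sum(1 for word in email_text.lower().split() if word in profile_words)

def build_priority_list(emails_by_host, profile):
    """Rank hosts by how well their observed email traffic fits the profile."""
    profile_words = {w.lower() for w in profile}
    scores = defaultdict(int)
    for host, emails in emails_by_host.items():
        for email in emails:
            scores[host] += score_email(email, profile_words)
    # The highest-scoring (best-fit) hosts come first in the attack priority list.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical profile of financial keywords and captured traffic per workstation.
profile = ["invoice", "payroll", "budget", "forecast"]
emails_by_host = {
    "ws-finance-01": ["Q3 budget forecast attached", "payroll run complete"],
    "ws-hr-02": ["team lunch on friday"],
}
print(build_priority_list(emails_by_host, profile))
# [('ws-finance-01', 3), ('ws-hr-02', 0)]
```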

    Multiview semi-supervised learning with consensus

    Ministry of Education, Singapore under its Academic Research Funding Tier 1; Tier

    Advancing Urban Mobility with Algorithm Engineering


    Transfer learning for information retrieval

    The lack of relevance labels is increasingly challenging and presents a bottleneck in the training of reliable learning-to-rank (L2R) models. Obtaining relevance labels using human judgment is expensive and even impossible in some scenarios. Previous research has studied different approaches to solving the problem, including generating relevance labels by crowdsourcing and active learning. Recent studies have started to find ways to reuse knowledge from a related collection to help the ranking in a new collection. However, the effectiveness of a ranking function trained on one collection may be degraded when it is used on another collection due to the generalization issues of machine learning. Transfer learning involves a set of algorithms that are used to train or adapt a model for a target collection without sufficient training labels by transferring knowledge from a related source collection with abundant labels. Transfer learning can also be applied to L2R to help train ranking functions for a new task by reusing data from a related collection while minimizing the generalization gap. Some attempts have been made to apply transfer learning techniques to L2R tasks. This thesis investigates different approaches to transfer learning methods for L2R, which are called transfer ranking. However, most of the existing studies on transfer ranking have focused on the scenario in which there is a small but insufficient number of relevance labels. The field of transfer ranking with no target collection labels is still relatively undeveloped. Moreover, the main reason why a transfer ranking solution is needed is that a ranking function trained on the source collection cannot generalize to the target collection, due to the differences in the data distributions of the two collections. However, the effect of these data distribution differences on ranking model generalization has not been examined in detail. The focus of this study is the scenario in which there are no relevance labels from the new collection (the target collection), but where a related collection (the source collection) has an abundant amount of training data and labels. In this thesis, we first demonstrate the generalization gap of different L2R algorithms when the distributions of the source and target collections differ in multiple ways, and we then develop alternative solutions to tackle the problem, which include instance weighting algorithms and self-labeling methods. Instance weighting algorithms estimate a weight for each training query in the source collection according to the target query distribution and use the weighted objective function to optimize a ranking function for the target collection. The results on different test collections suggest that instance weighting methods, including existing approaches, are not reliable. The self-labeling methods use other approaches to generate imputed relevance labels for queries in the target collection, seeking to transfer ranking knowledge to the target collection by transferring label knowledge. The algorithms were tested on various transfer scenarios and showed significant effectiveness and consistency. We also demonstrate that the performance of self-labeling methods can be further improved with a minimal number of calibration labels from the target collection. The algorithms and knowledge developed in this thesis can help solve generic ranking knowledge transfer problems under different scenarios.
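
    As an illustration of the instance weighting idea described in the abstract, the sketch below approximates per-query importance weights with a domain classifier (a standard density-ratio trick, not necessarily the estimator studied in the thesis) and then trains a pointwise ranker with those weights. The query-level feature vectors, the query_index mapping, and the gradient-boosted pointwise surrogate are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingRegressor

def estimate_query_weights(source_query_feats, target_query_feats):
    """Approximate p_target(q) / p_source(q) for each source query with a
    domain classifier (logistic regression over query-level features)."""
    X = np.vstack([source_query_feats, target_query_feats])
    domain = np.concatenate([np.zeros(len(source_query_feats)),
                             np.ones(len(target_query_feats))])
    clf = LogisticRegression(max_iter=1000).fit(X, domain)
    p_target = clf.predict_proba(source_query_feats)[:, 1]  # P(domain = target | q)
    # Importance weight for each source query via the density-ratio approximation.
    return p_target / np.clip(1.0 - p_target, 1e-6, None)

def train_weighted_ranker(X_docs, y_rel, query_index, query_weights):
    """Pointwise L2R surrogate: every document inherits the importance weight
    of the source query it belongs to (query_index maps each document row to
    a row of query_weights)."""
    doc_weights = query_weights[np.asarray(query_index)]
    ranker = GradientBoostingRegressor()
    ranker.fit(X_docs, y_rel, sample_weight=doc_weights)
    return ranker
```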