205,335 research outputs found

    A Test Driven Approach to Develop Web-Based Machine Learning Applications

    Full text link
    The purpose of this thesis is to propose the design and architecture of a testable, scalable, and ef-cient web-based application that models and implements machine learning applications in cancer prediction. There are various components that form the architecture of our web-based application including server, database, programming language, web framework, and front-end design. There are also other factors associated with our application such as testability, scalability, performance, and design pattern. Our main focus in this thesis is on the testability of the system while consid- ering the importance of other factors as well. The data set for our application is a subset of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The application is implemented with Python as the back-end programming language, Django as the web framework, Sqlite as the database, and the built-in server of the Django framework. The front layer of the application is built using HTML, CSS and various JavaScript libraries. Our Implementation and Installation is augmented with testing phase that include unit and functional testing. There are other layers such as deploying, caching, security, and scaling that will be briefly discussed

    Analyzing Android Browser Apps for file:// Vulnerabilities

    Full text link
    Securing browsers in mobile devices is very challenging, because these browser apps usually provide browsing services to other apps in the same device. A malicious app installed in a device can potentially obtain sensitive information through a browser app. In this paper, we identify four types of attacks in Android, collectively known as FileCross, that exploits the vulnerable file:// to obtain users' private files, such as cookies, bookmarks, and browsing histories. We design an automated system to dynamically test 115 browser apps collected from Google Play and find that 64 of them are vulnerable to the attacks. Among them are the popular Firefox, Baidu and Maxthon browsers, and the more application-specific ones, including UC Browser HD for tablet users, Wikipedia Browser, and Kids Safe Browser. A detailed analysis of these browsers further shows that 26 browsers (23%) expose their browsing interfaces unintentionally. In response to our reports, the developers concerned promptly patched their browsers by forbidding file:// access to private file zones, disabling JavaScript execution in file:// URLs, or even blocking external file:// URLs. We employ the same system to validate the ten patches received from the developers and find one still failing to block the vulnerability.Comment: The paper has been accepted by ISC'14 as a regular paper (see https://daoyuan14.github.io/). This is a Technical Report version for referenc

    Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data

    Get PDF
    Recent years have seen the rise of more sophisticated attacks including advanced persistent threats (APTs) which pose severe risks to organizations and governments by targeting confidential proprietary information. Additionally, new malware strains are appearing at a higher rate than ever before. Since many of these malware are designed to evade existing security products, traditional defenses deployed by most enterprises today, e.g., anti-virus, firewalls, intrusion detection systems, often fail at detecting infections at an early stage. We address the problem of detecting early-stage infection in an enterprise setting by proposing a new framework based on belief propagation inspired from graph theory. Belief propagation can be used either with "seeds" of compromised hosts or malicious domains (provided by the enterprise security operation center -- SOC) or without any seeds. In the latter case we develop a detector of C&C communication particularly tailored to enterprises which can detect a stealthy compromise of only a single host communicating with the C&C server. We demonstrate that our techniques perform well on detecting enterprise infections. We achieve high accuracy with low false detection and false negative rates on two months of anonymized DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38TB of real-world web proxy logs collected at the border of a large enterprise. Through careful manual investigation in collaboration with the enterprise SOC, we show that our techniques identified hundreds of malicious domains overlooked by state-of-the-art security products

    Towards a relation extraction framework for cyber-security concepts

    Full text link
    In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drifting from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining precision of .82.Comment: 4 pages in Cyber & Information Security Research Conference 2015, AC
    corecore