
    Large Scale Malware Analysis, Detection and Signature Generation.

    As the primary vehicle for most organized cybercrime, malicious software (or malware) has become one of the most serious threats to computer systems and the Internet. With the recent advent of automated malware development toolkits, it has become relatively easy, even for marginally skilled adversaries, to create and mutate malware that bypasses Anti-Virus (AV) detection. This has led to a surge in the number of new malware threats and has created several major challenges for the AV industry. AV companies typically receive tens of thousands of suspicious samples daily. However, the overwhelming number of new malware easily overtaxes the available human resources at AV companies, making them less responsive to emerging threats and leading to poor detection rates. To address these issues, this dissertation proposes several new and scalable systems to facilitate malware analysis and detection, with the focus on a central theme: "automation and scalability". This dissertation makes four primary contributions. First, it builds a large-scale malware database management system called SMIT that addresses the challenge of determining whether a suspicious sample is indeed malicious. SMIT exploits the insight that most new malicious samples are simple syntactic variations of existing malware. Thus, one way to ascertain the maliciousness of an unknown sample is to check whether it is sufficiently similar to any existing malware. SMIT is designed to make such decisions efficiently using malware's function call graph---a high-level structural representation that is less susceptible to the low-level obfuscation employed by malware writers to evade detection. Second, the dissertation develops an automatic malware clustering system called MutantX. By quickly grouping similar samples into clusters, MutantX allows malware analysts to focus on representative samples and to automatically generate labels based on samples' association with existing groups.
    Third, this dissertation introduces a signature-generation system, called Hancock, that automatically creates high-quality string signatures with extremely low false-positive rates. Finally, observing that the two widely used malware analysis approaches---static and dynamic analysis---have their respective pros and cons, this dissertation proposes a novel system that optimally integrates static-feature-based and dynamic-behavior-based malware clusterings, mitigating their respective shortcomings without losing their merits.
    Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89760/1/huxin_1.pd
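    The nearest-neighbor check at the heart of SMIT can be illustrated with a toy sketch. Here, Jaccard overlap of call-graph edge sets stands in for the graph-similarity metric the dissertation actually uses, and all family names and call edges are hypothetical:

    ```python
    # Toy illustration, not SMIT's actual algorithm: decide whether an unknown
    # sample resembles known malware by comparing function call graphs,
    # represented as sets of (caller, callee) edges.

    def jaccard(edges_a, edges_b):
        """Overlap of two edge sets, in [0, 1]."""
        if not edges_a and not edges_b:
            return 1.0
        return len(edges_a & edges_b) / len(edges_a | edges_b)

    def nearest_family(sample_edges, malware_db, threshold=0.5):
        """Return the best-matching known family, or None below the threshold."""
        best_family, best_score = None, 0.0
        for family, edges in malware_db.items():
            score = jaccard(sample_edges, edges)
            if score > best_score:
                best_family, best_score = family, score
        return (best_family if best_score >= threshold else None), best_score

    db = {"FamilyA": {("main", "decrypt"), ("decrypt", "inject"), ("main", "beacon")}}
    sample = {("main", "decrypt"), ("decrypt", "inject"), ("main", "beacon"),
              ("main", "persist")}  # a syntactic variant with one extra call
    family, score = nearest_family(sample, db)
    print(family, round(score, 2))  # FamilyA 0.75
    ```

    A real system must scale this lookup to millions of graphs, which is exactly the indexing problem SMIT addresses; the sketch only shows the decision rule.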

    A Novel Malware Target Recognition Architecture for Enhanced Cyberspace Situation Awareness

    The rapid transition of critical business processes to computer networks potentially exposes organizations to digital theft or corruption by advanced competitors. One tool used for these tasks is malware, because it circumvents legitimate authentication mechanisms. Malware is an epidemic problem for organizations of all types. This research proposes and evaluates a novel Malware Target Recognition (MaTR) architecture for malware detection and for identification of propagation methods and payloads, to enhance situation awareness in tactical scenarios using non-instruction-based, static heuristic features. MaTR achieves a 99.92% detection accuracy on known malware with false positive and false negative rates of 8.73e-4 and 8.03e-4, respectively. MaTR outperforms leading static heuristic methods with a statistically significant 1% improvement in detection accuracy and 85% and 94% reductions in false positive and false negative rates, respectively. Against a set of publicly unknown malware, MaTR's detection accuracy is 98.56%, a 65% performance improvement over the combined effectiveness of three commercial antivirus products.
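    The metrics reported above are the standard confusion-matrix quantities. A minimal sketch, using illustrative counts rather than MaTR's actual evaluation data:

    ```python
    # Standard detection metrics from a confusion matrix:
    #   tp = malware correctly flagged,   tn = benign correctly passed,
    #   fp = benign wrongly flagged,      fn = malware missed.

    def detection_metrics(tp, tn, fp, fn):
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        fpr = fp / (fp + tn)   # fraction of benign files flagged as malware
        fnr = fn / (fn + tp)   # fraction of malware that slipped through
        return accuracy, fpr, fnr

    # Made-up counts chosen to land near the rates quoted in the abstract.
    acc, fpr, fnr = detection_metrics(tp=9992, tn=9991, fp=9, fn=8)
    print(f"accuracy={acc:.4f} fpr={fpr:.2e} fnr={fnr:.2e}")
    ```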

    Is there anything new to say about SIFT matching?

    SIFT is a classical hand-crafted, histogram-based descriptor that has deeply influenced research on image matching for more than a decade. In this paper, a critical review of the aspects that affect SIFT matching performance is carried out, and novel descriptor design strategies are introduced and individually evaluated. These encompass quantization, binarization and hierarchical cascade filtering as means to reduce data storage and increase matching efficiency, with no significant loss of accuracy. An original contextual matching strategy, based on a symmetrical variant of the usual nearest-neighbor ratio, is discussed as well; it can increase the discriminative power of any descriptor. The paper then undertakes a comprehensive experimental evaluation of state-of-the-art hand-crafted and data-driven descriptors, also including the most recent deep descriptors. Comparisons are carried out according to several performance parameters, including accuracy and space-time efficiency. Results are provided for both planar and non-planar scenes, the latter being evaluated with a new benchmark based on the concept of approximated patch overlap. Experimental evidence shows that, despite their age, SIFT and other hand-crafted descriptors, once enhanced through the proposed strategies, are ready to meet future image matching challenges. We also believe that the lessons learned from this work will inspire the design of better hand-crafted and data-driven descriptors.
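    The usual nearest-neighbor ratio test, and one plausible way to make it symmetric, can be sketched as follows. The paper's actual symmetric variant may differ in detail; here "symmetric" means the ratio test must pass in both matching directions and the pair must be mutual nearest neighbors. The descriptors are made up:

    ```python
    import numpy as np

    def ratio_matches(desc_a, desc_b, ratio=0.8):
        """For each row of desc_a: index of its best match in desc_b if it
        passes Lowe's ratio test (d1 < ratio * d2), else -1."""
        d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
        order = np.argsort(d, axis=1)
        best, second = order[:, 0], order[:, 1]
        rows = np.arange(len(desc_a))
        ok = d[rows, best] < ratio * d[rows, second]
        return np.where(ok, best, -1)

    def symmetric_matches(desc_a, desc_b, ratio=0.8):
        """Keep only mutual best matches that pass the ratio test both ways."""
        ab = ratio_matches(desc_a, desc_b, ratio)
        ba = ratio_matches(desc_b, desc_a, ratio)
        return [(i, j) for i, j in enumerate(ab) if j >= 0 and ba[j] == i]

    a = np.array([[0.0, 0.0], [10.0, 10.0]])
    b = np.array([[0.1, 0.0], [10.0, 10.1], [50.0, 50.0]])
    print(symmetric_matches(a, b))  # [(0, 0), (1, 1)]
    ```

    The symmetric filter discards one-sided matches (here, b's third descriptor), which is the sense in which it raises discriminative power at the cost of recall.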

    Known and unknown requirements in healthcare

    We report experience in requirements elicitation of domain knowledge from experts in clinical and cognitive neurosciences. The elicitation target was a causal model for early signs of dementia indicated by changes in user behaviour and errors apparent in logs of computer activity. A Delphi-style process, consisting of workshops with experts followed by a questionnaire, was adopted. The paper describes how the elicitation process had to be adapted to deal with problems encountered in terminology and limited consensus among the experts. In spite of the difficulties encountered, a partial causal model of user behavioural pathologies and errors was elicited. This informed requirements for configuring data- and text-mining tools to search for specific data patterns. Lessons learned for elicitation from experts are presented, and the implications for requirements are discussed as “unknown unknowns”, as well as configuration requirements for directing data-/text-mining tools towards refining awareness requirements in healthcare applications.

    Running deep learning applications on resource constrained devices

    The high accuracy of Deep Neural Networks (DNNs) comes at the expense of high computational cost and memory requirements. During inference, the data is often collected on edge devices, which are resource-constrained. The existing solutions for edge deployment include i) executing the entire DNN on the edge (EDGE-ONLY), ii) sending the input from the edge to the cloud, where the DNN is processed (CLOUD-ONLY), and iii) splitting the DNN to execute partially on the edge and partially on the cloud (SPLIT). The choice among EDGE-ONLY, CLOUD-ONLY and SPLIT is determined by several operating constraints, such as device resources and network speed, and application constraints, such as latency and accuracy. The EDGE-ONLY approach requires a compact DNN with low compute and memory requirements. Thus, an emerging class of DNNs employs low-rank convolutions (LRCONVs), which reduce one or more dimensions compared to spatial convolutions (CONVs). Prior research on hardware accelerators has largely focused on CONVs. LRCONVs, such as depthwise and pointwise convolutions, exhibit lower arithmetic intensity and lower data reuse, and thus result in low hardware utilization and high latency. In our first work, we systematically explore the design space of cross-layer dataflows to exploit data reuse across layers for emerging DNNs in EDGE-ONLY scenarios. We develop novel fine-grain cross-layer dataflows for LRCONVs that support partial loop dimension completion. Our tool, X-Layer, decouples the nested loops in a pipeline and combines them to create a common outer dataflow and several inner dataflows. The CLOUD-ONLY approach can suffer from high latency due to the cost of transmitting large input data from the edge to the cloud, which is a problem especially for latency-critical applications. The SPLIT approach reduces latency compared to the CLOUD-ONLY approach. However, existing solutions only split the DNN in floating-point precision.
    Executing in floating-point precision on the edge device can occupy large amounts of memory and reduce the potential options for SPLIT solutions. In our second work, we expand and explore the search space of SPLIT solutions by jointly applying mixed-precision post-training quantization and DNN graph splitting. Our work, Auto-Split, finds a balance in the trade-off among model accuracy, edge device capacity, transmission cost, and overall latency.
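    The deployment choice described above can be illustrated with a toy latency model. All numbers and the cost model are hypothetical, and Auto-Split's actual search additionally accounts for quantization and accuracy, which this sketch ignores:

    ```python
    # Toy latency model for choosing a deployment: layers [0, s) run on the
    # edge, layers [s, n) in the cloud, and the activation cut at s must
    # cross the network. s = 0 is CLOUD-ONLY, s = n is EDGE-ONLY.

    def best_deployment(edge_ms, cloud_ms, act_bytes, net_bytes_per_ms):
        """edge_ms / cloud_ms: per-layer latency on each device (n entries).
        act_bytes[s]: bytes crossing the network when cutting at s
        (act_bytes[0] is the raw input size)."""
        n = len(edge_ms)
        latency = {}
        for s in range(n + 1):
            t = sum(edge_ms[:s]) + sum(cloud_ms[s:])
            if s < n:  # result returned from the cloud is assumed negligible
                t += act_bytes[s] / net_bytes_per_ms
            latency[s] = t
        s_best = min(latency, key=latency.get)
        label = {0: "CLOUD-ONLY", n: "EDGE-ONLY"}.get(s_best, f"SPLIT@{s_best}")
        return label, latency[s_best]

    # Edge is 5x slower per layer, but early layers emit large activations,
    # so cutting after layer 2 wins.
    print(best_deployment(edge_ms=[5, 5, 5], cloud_ms=[1, 1, 1],
                          act_bytes=[1000, 100, 10, 1], net_bytes_per_ms=10))
    ```

    Shrinking `act_bytes` at the cut point, e.g. by quantizing the edge-side layers, is precisely what enlarges the set of viable SPLIT solutions.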

    Detection of Linear Structures in SDSS Images

    The Sloan Digital Sky Survey (SDSS), covering just over 35% of the full sky, is the largest sky survey conducted to date. Many of its images contain linear features that can only be explained by objects traveling at angular velocities different from that of the background sky. A certain number of these linear features can be attributed to meteors, which makes their extraction from the images scientifically interesting. In this work I present and discuss the full technical details of a software tool capable of searching the entire SDSS imaging database for such linear features.
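    The classic tool for finding linear features in an image is the Hough transform, sketched below on a synthetic streak. This is an illustration of the general technique only; the thesis tool's actual detection pipeline is not reproduced here:

    ```python
    import numpy as np

    def hough_lines(binary, n_theta=180, rho_res=1.0):
        """Accumulate Hough votes: each bright pixel (x, y) votes for every
        line rho = x*cos(theta) + y*sin(theta) passing through it; peaks in
        the accumulator correspond to linear features."""
        ys, xs = np.nonzero(binary)
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        diag = int(np.ceil(np.hypot(*binary.shape)))
        rhos = np.arange(-diag, diag + rho_res, rho_res)
        acc = np.zeros((len(rhos), n_theta), dtype=np.int32)
        cos_t, sin_t = np.cos(thetas), np.sin(thetas)
        for x, y in zip(xs, ys):
            idx = np.round((x * cos_t + y * sin_t + diag) / rho_res).astype(int)
            acc[idx, np.arange(n_theta)] += 1
        return acc, rhos, thetas

    img = np.zeros((50, 50), dtype=bool)
    img[np.arange(50), np.arange(50)] = True   # synthetic 45-degree streak
    acc, rhos, thetas = hough_lines(img)
    r_i, t_i = np.unravel_index(acc.argmax(), acc.shape)
    print(rhos[r_i], np.degrees(thetas[t_i]))  # peak at rho ~ 0, theta ~ 135 deg
    ```

    On real survey frames the binary mask would come from thresholding after background subtraction, and peak votes would be compared against the expected streak length.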

    Help-Seeking Decisions and Child Welfare. An exploration of situated decision making

    Family support services aim to support parents and carers with the task of bringing up children; these services consistently report problems, however, in attracting help-seekers. Despite recent developments within child welfare towards the provision of family-friendly services, self-referral rates remain low, constituting at best 30% of all referrals. Agencies also report that families are reluctant to take up services following third-party (frequently professional) referral. Despite these consistent findings, the extant literature on help-seeking offers few insights into how social actors, in the face of family problems, make choices between the available sources of help. Studies consistently report that families prefer 'informal' support, but few insights are offered about how such decisions are made and how preference is organised in relation to diverse sources of support. In this thesis, focusing on talk about 'help-seeking' in focus group and interview settings, analysis centres on exploring the accountable properties of situated decision-making. From analysis of situated talk, the study offers insights and raises questions for further research that may assist family support agencies to tailor their services more appropriately to the needs of service users. The present study is much inspired by the work of Harvey Sacks, in particular his development of Membership Categorisation Analysis (MCA). In making use of Hester and Eglin's occasioned model of MCA (1997), it has been possible to explore practical reasoning in and through the local, sequential and categorical organisation of talk. Analysis of situated decision-making, in relation to the topic of 'help-seeking', finds decision-making to be a highly organised practical activity, such that any social actor can make an 'educated' guess about who another would suggest as a first category for help.
    Research participants, in deciding who should hypothetically be approached first for help, constituted a socially sanctioned order to help-seeking characterised by first-position and last-position category pairs. Use of, or reference to, prior knowledge of help-seeking encounters was also identified as a key decision-making resource. This thesis concludes with a policy discussion and raises a number of speculative comments arising from the study that are relevant to the development of child welfare services. A number of avenues are suggested for further research; in particular, questions are asked about the continued practice and emphasis within child welfare services on professional social diagnosis, with the attendant neglect of help-seeking as a socially organised activity. The study suggests that future research might centre on further analysis of how 'family support' is organised within the family prior to professional intervention. It is also suggested that further research examine the possibilities of responding to requests for help as a better starting point for service delivery, rather than professional detection of 'problems'.