
    Security Toolbox for Detecting Novel and Sophisticated Android Malware

    This paper presents a demo of our Security Toolbox to detect novel malware in Android apps. This Toolbox is developed through our recent research project funded by the DARPA Automated Program Analysis for Cybersecurity (APAC) project. The adversarial challenge ("Red") teams in the DARPA APAC program are tasked with designing sophisticated malware to test the bounds of malware detection technology being developed by the research and development ("Blue") teams. Our research group, a Blue team in the DARPA APAC program, proposed a "human-in-the-loop program analysis" approach to detect malware given the source or Java bytecode for an Android app. Our malware detection apparatus consists of two components: a general-purpose program analysis platform called Atlas, and a Security Toolbox built on the Atlas platform. This paper describes the major design goals, the Toolbox components to achieve the goals, and the workflow for auditing Android apps. The accompanying video (http://youtu.be/WhcoAX3HiNU) illustrates features of the Toolbox through a live audit. Comment: 4 pages, 1 listing, 2 figures
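
    The abstract only names the Toolbox's components, so the sketch below gives a rough, hypothetical flavour of the kind of static check a human-in-the-loop audit might start from: it flags sensitive Android API usages in decompiled Java sources for manual review. The API list and directory layout are assumptions, not part of the paper's Security Toolbox or the Atlas platform.

        # Toy static check (illustrative only, not the paper's Security Toolbox):
        # flag lines in decompiled Java sources that call APIs an auditor may want
        # to inspect first. The API list and source directory are assumptions.
        import re
        from pathlib import Path

        SENSITIVE_APIS = [
            r"getDeviceId\(",        # device identifier access
            r"sendTextMessage\(",    # SMS sending
            r"HttpURLConnection",    # possible network exfiltration channel
            r"DexClassLoader",       # dynamic code loading
        ]

        def flag_sensitive_calls(source_root):
            """Yield (file, line number, line) for lines matching a sensitive API."""
            pattern = re.compile("|".join(SENSITIVE_APIS))
            for java_file in Path(source_root).rglob("*.java"):
                text = java_file.read_text(errors="ignore")
                for line_no, line in enumerate(text.splitlines(), start=1):
                    if pattern.search(line):
                        yield java_file, line_no, line.strip()

        if __name__ == "__main__":
            for path, no, line in flag_sensitive_calls("decompiled_app_src"):
                print(f"{path}:{no}: {line}")  # candidates for manual audit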

    Qualitative Effects of Knowledge Rules in Probabilistic Data Integration

    One of the problems in data integration is data overlap: the fact that different data sources have data on the same real-world entities. Much development time in data integration projects is devoted to entity resolution. Advanced similarity measurement techniques are often used to remove semantic duplicates from the integration result or to resolve other semantic conflicts, but it proves impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database, so that the integration result can already be meaningfully used. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly by measuring the quality of answers to queries on the integrated data set in an information retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on the integration quality. The results show that our approach indeed reduces development effort, rather than merely shifting it to rule definition and threshold tuning: setting rough safe thresholds and defining only a few rules suffices to produce a ‘good enough’ integration that can be meaningfully used.
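
    A minimal sketch of the thresholding idea described above, under assumed thresholds and a generic string-similarity measure rather than the authors' actual rules: confident matches are merged, confident non-matches are kept apart, and the uncertain middle band is retained as weighted alternatives, as a probabilistic database would store them.

        # Illustrative thresholding sketch (assumed thresholds and similarity measure,
        # not the authors' system). Uncertain pairs are kept as weighted alternatives.
        from difflib import SequenceMatcher

        T_MERGE, T_DISTINCT = 0.9, 0.6   # rough "safe" thresholds (assumptions)

        def similarity(a, b):
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def integrate(rec_a, rec_b):
            s = similarity(rec_a, rec_b)
            if s >= T_MERGE:
                return [("same-entity", 1.0)]         # confident duplicate: merge
            if s <= T_DISTINCT:
                return [("distinct-entities", 1.0)]   # confident non-duplicate
            # uncertain case: store both possible worlds with probabilities
            return [("same-entity", s), ("distinct-entities", 1.0 - s)]

        print(integrate("Jon Smith, Amsterdam", "John Smith, Amsterdam"))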

    Assessing and refining mappings to RDF to improve dataset quality

    RDF dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor to integrate the assessment into the publishing workflow. Adjustments are applied manually, but rarely. Moreover, the root of the violations, which often lies in the mappings that specify how the RDF dataset is generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from (semi-)structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We (i) incorporate a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and (ii) perform semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as DBpedia, or newly generated ones, such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases.
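
    To make the test-driven idea concrete, the sketch below runs quality tests against a mapping definition rather than against the generated RDF dataset. The mapping structure and the two tests are simplified assumptions for illustration, not actual R2RML/RML syntax or the authors' test suite.

        # Assess the mapping, not the dataset: two toy tests over a simplified
        # (non-R2RML) mapping description. Both the structure and the expected
        # predicates are assumptions made for illustration.
        EXPECTED_PREDICATES = {"foaf:name", "foaf:age"}

        mapping = {
            "source": "persons.csv",
            "subject_template": "http://example.org/person/{id}",
            "predicate_object_maps": [
                {"predicate": "foaf:name", "column": "name", "datatype": "xsd:string"},
                {"predicate": "foaf:age", "column": "age"},   # datatype missing
            ],
        }

        def test_all_predicates_declared(m):
            found = {pom["predicate"] for pom in m["predicate_object_maps"]}
            return EXPECTED_PREDICATES - found          # empty set means pass

        def test_datatypes_present(m):
            return [pom["predicate"] for pom in m["predicate_object_maps"]
                    if "datatype" not in pom]           # empty list means pass

        print("missing predicates:", test_all_predicates_declared(mapping))
        print("maps without datatype:", test_datatypes_present(mapping))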

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as issues of data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that have been introduced to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
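
    For readers unfamiliar with the programming model the survey builds on, here is a minimal word-count sketch of MapReduce: the user supplies only the map and reduce functions, while partitioning, shuffling and grouping, simulated here in a single process, are handled by the framework.

        # Minimal in-process word-count illustration of the MapReduce model.
        from collections import defaultdict
        from itertools import chain

        def map_fn(document):
            # map phase: emit (word, 1) for every word in the input record
            for word in document.split():
                yield word.lower(), 1

        def reduce_fn(word, counts):
            # reduce phase: aggregate all values emitted for the same key
            return word, sum(counts)

        def run_mapreduce(documents):
            groups = defaultdict(list)                  # "shuffle": group by key
            for key, value in chain.from_iterable(map_fn(d) for d in documents):
                groups[key].append(value)
            return dict(reduce_fn(k, v) for k, v in groups.items())

        print(run_mapreduce(["the quick brown fox", "the lazy dog"]))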

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
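
    As a rough illustration of one ingredient mentioned above, the sketch below reads a blog's RSS feed to obtain structured anchors (title, link, publication date) that could later be aligned with the blog's HTML pages. The feed URL and the selected fields are assumptions, not the BlogForever implementation.

        # Read RSS 2.0 items with the standard library only (illustrative sketch).
        import urllib.request
        import xml.etree.ElementTree as ET

        def extract_rss_items(feed_url):
            with urllib.request.urlopen(feed_url) as resp:
                root = ET.fromstring(resp.read())
            items = []
            for item in root.iter("item"):              # RSS 2.0 <item> elements
                items.append({
                    "title": item.findtext("title", default=""),
                    "link": item.findtext("link", default=""),
                    "pubDate": item.findtext("pubDate", default=""),
                })
            return items

        # Example with a hypothetical feed URL:
        # for entry in extract_rss_items("https://example.org/blog/rss.xml"):
        #     print(entry["pubDate"], entry["title"], entry["link"])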

    Case study in Six Sigma methodology: manufacturing quality improvement and guidance for managers

    This article discusses the successful implementation of Six Sigma methodology in a high-precision and critical process in the manufacture of automotive products. The Six Sigma define–measure–analyse–improve–control approach resulted in a reduction of tolerance-related problems and improved the first pass yield from 85% to 99.4%. Data were collected on all possible causes, and regression analysis, hypothesis testing, Taguchi methods, classification and regression trees, etc. were used to analyse the data and draw conclusions. Implementation of Six Sigma methodology had a significant financial impact on the profitability of the company. An approximate saving of US$70,000 per annum was reported, which is in addition to the customer-facing benefits of improved quality on returns and sales. The project also allowed the company to learn useful lessons that will guide future Six Sigma activities.
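
    As a worked example of how the reported yields translate into conventional Six Sigma metrics, the snippet below computes defects per million opportunities and an approximate short-term sigma level using the customary 1.5-sigma shift. The conversion is standard Six Sigma practice applied here for illustration; only the two yield figures come from the article.

        # Convert first pass yield to DPMO and an approximate sigma level
        # (standard conversion with the conventional 1.5-sigma shift).
        from statistics import NormalDist

        def sigma_level(first_pass_yield, shift=1.5):
            return NormalDist().inv_cdf(first_pass_yield) + shift

        def dpmo(first_pass_yield):
            return (1.0 - first_pass_yield) * 1_000_000

        for fpy in (0.85, 0.994):
            print(f"yield={fpy:.1%}  DPMO={dpmo(fpy):,.0f}  sigma={sigma_level(fpy):.2f}")
        # yield=85.0%  DPMO=150,000  sigma=2.54
        # yield=99.4%  DPMO=6,000  sigma=4.01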

    Methods for Semantic Interoperability in AutomationML-based Engineering

    Industrial engineering is an interdisciplinary activity that involves human experts from various technical backgrounds working with different engineering tools. In the era of digitization, the engineering process generates a vast amount of data. To store and exchange such data, dedicated international standards are developed, including the XML-based data format AutomationML (AML). While AML provides a harmonized syntax among engineering tools, the semantics of engineering data remains highly heterogeneous. More specifically, the AML models of the same domain or entity can vary dramatically among different tools, which gives rise to the so-called semantic interoperability problem. In practice, manual implementation is often required for correct data interpretation, and such implementations are usually of limited reusability. Efforts have been made to tackle the semantic interoperability problem. One mainstream research direction has focused on the semantic lifting of engineering data using Semantic Web technologies. However, current results in this field lack support for building complex domain knowledge, which requires a profound understanding of the domain and sufficient skills in ontology building. This thesis contributes to this research field in two respects. First, machine learning algorithms are developed for deriving complex ontological concepts from engineering data. The induced concepts encode the relations between primitive ones and bridge the semantic gap between engineering tools. Second, to involve domain experts more tightly in the process of ontology building, this thesis proposes the AML concept model (ACM) for representing ontological concepts in a native AML syntax, i.e., providing an AML front-end for the formal ontological semantics. ACM supports a bidirectional information flow between the user and the learner, based on which the interactive machine learning framework AMLLEARNER is developed. Another rapidly growing research field is devoted to developing methods and systems for facilitating data access and exchange based on database theories and techniques. In particular, the so-called Query By Example (QBE) approach allows the user to construct queries using data examples. This thesis adopts the idea of QBE in AML-based engineering by introducing the AML Query Template (AQT). The design of AQT focuses on a native AML syntax, which allows queries to be constructed with conventional AML tools. This thesis studies the theoretical foundation of AQT and presents algorithms for the automated generation of query programs. A comprehensive requirements analysis shows that the proposed approach can solve the problem of semantic interoperability in AutomationML-based engineering to a great extent.
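
    As a highly simplified illustration of the query-by-example idea (not the thesis's AML Query Template syntax or its generation algorithms), the sketch below treats a small AML/CAEX-like fragment as the example and returns every InternalElement in a document that carries the same attribute values.

        # Query by example over a simplified AML/CAEX-like structure (illustrative).
        import xml.etree.ElementTree as ET

        DOCUMENT = """
        <InstanceHierarchy Name="Plant">
          <InternalElement Name="Motor1">
            <Attribute Name="Voltage"><Value>400</Value></Attribute>
            <Attribute Name="Power"><Value>1.5</Value></Attribute>
          </InternalElement>
          <InternalElement Name="Motor2">
            <Attribute Name="Voltage"><Value>230</Value></Attribute>
          </InternalElement>
        </InstanceHierarchy>
        """

        EXAMPLE_QUERY = """
        <InternalElement>
          <Attribute Name="Voltage"><Value>400</Value></Attribute>
        </InternalElement>
        """

        def attributes(elem):
            return {a.get("Name"): a.findtext("Value") for a in elem.findall("Attribute")}

        def match(document_xml, example_xml):
            wanted = attributes(ET.fromstring(example_xml))
            root = ET.fromstring(document_xml)
            return [ie.get("Name") for ie in root.iter("InternalElement")
                    if all(attributes(ie).get(k) == v for k, v in wanted.items())]

        print(match(DOCUMENT, EXAMPLE_QUERY))   # ['Motor1']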

    Ontology based semantic-predictive model for reconfigurable automation systems

    Due to increasing product variety and complexity, the capability to support reconfiguration is a key competitiveness indicator for current automation systems within large enterprises. Reconfigurable manufacturing systems could efficiently reuse existing knowledge in order to decrease the required skills and design time to launch new products. However, most of the software tools developed to support the design of reconfigurable manufacturing systems lack integration of product, process and resource knowledge, and the design data is not transferred from domain-specific engineering tools to a collaborative and intelligent platform that captures and reuses design knowledge. The focus of this research study is to enable integrated automation systems design that supports a knowledge reuse approach to predict process and resource changes when product requirements change. The proposed methodology is based on a robust semantic-predictive model supported by ontology representations and predictive algorithms for the integration of Product, Process, Resource and Requirement (PPRR) data, so that future automation system changes can be identified at early design stages.
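
    The sketch below gives a minimal flavour of how Product, Process, Resource and Requirement (PPRR) links could be represented as an ontology-backed graph and traversed to estimate which processes and resources are affected when a product requirement changes. The namespace and property names are hypothetical, not the paper's model, and the rdflib package is assumed to be available.

        # Tiny PPRR graph in RDF (hypothetical vocabulary; requires rdflib).
        from rdflib import Graph, Namespace, RDF

        EX = Namespace("http://example.org/pprr#")
        g = Graph()

        # Product P1 must satisfy requirement R1; it is made by process Pr1 on resource Res1.
        g.add((EX.P1, RDF.type, EX.Product))
        g.add((EX.P1, EX.hasRequirement, EX.R1))
        g.add((EX.P1, EX.isMadeBy, EX.Pr1))
        g.add((EX.Pr1, RDF.type, EX.Process))
        g.add((EX.Pr1, EX.usesResource, EX.Res1))
        g.add((EX.Res1, RDF.type, EX.Resource))

        def impacted_by_requirement(requirement):
            """Products carrying the requirement, plus their processes and resources."""
            impacted = set()
            for product in g.subjects(EX.hasRequirement, requirement):
                impacted.add(product)
                for process in g.objects(product, EX.isMadeBy):
                    impacted.add(process)
                    impacted.update(g.objects(process, EX.usesResource))
            return impacted

        print(sorted(str(node) for node in impacted_by_requirement(EX.R1)))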