SAFE: Self-Attentive Function Embeddings for Binary Similarity
The binary similarity problem consists in determining whether two functions are similar by considering only their compiled form. Advanced techniques for binary similarity have recently gained momentum, as they can be applied in several fields, such as copyright disputes, malware analysis, and vulnerability detection, and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code into multi-dimensional vector representations (embeddings), and then comparing the vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, which may fail to capture important function characteristics, or may include features that are not important for the binary similarity problem. In this paper we propose SAFE, a
novel architecture for the embedding of functions based on a self-attentive
neural network. SAFE works directly on disassembled binary functions, does not
require manual feature extraction, is computationally more efficient than
existing solutions (i.e., it does not incur the computational overhead of building or manipulating control flow graphs), and is more general, as it works
on stripped binaries and on multiple architectures. We report the results of a quantitative and qualitative analysis showing that SAFE provides a noticeable performance improvement over previous solutions. Furthermore, we show that clusters of our embedding vectors are closely related to the semantics of the implemented algorithms, paving the way for further interesting applications (e.g., semantic-based binary function search).

Comment: Published in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 201
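Once embeddings are computed, the comparison stage described above reduces to a cheap geometric operation. The sketch below illustrates this with cosine similarity, a common choice for comparing embedding vectors; the self-attentive network that produces the embeddings is not reproduced here, and the random vectors are hypothetical stand-ins for real function embeddings.

```python
# Minimal sketch of comparing function embeddings geometrically.
# The embedding network itself (the self-attentive model) is not
# reproduced; the random vectors below stand in for its outputs.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
emb_f = rng.normal(size=100)   # hypothetical embedding of function f
emb_g = rng.normal(size=100)   # hypothetical embedding of function g

# Scores near 1.0 suggest similar functions; near -1.0, dissimilar ones.
print(f"similarity: {cosine_similarity(emb_f, emb_g):.3f}")
```

Because the comparison is just a dot product and two norms, searching millions of functions reduces to standard nearest-neighbour techniques over the embedding space.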
Planet Hunters: New Kepler planet candidates from analysis of quarter 2
We present new planet candidates identified in NASA Kepler quarter two public
release data by volunteers engaged in the Planet Hunters citizen science
project. The two candidates presented here survive checks for false-positives,
including examination of the pixel offset to constrain the possibility of a
background eclipsing binary. The orbital periods of the planet candidates are
97.46 days (KIC 4552729) and 284.03 days (KIC 10005758), and the modeled planet radii are 5.3 and 3.8 R_Earth, respectively. The latter star has an additional known planet candidate with a radius of 5.05 R_Earth and a period of 134.49 days, which was
detected by the Kepler pipeline. The discovery of these candidates illustrates
the value of massively distributed volunteer review of the Kepler database to
recover candidates that would otherwise remain uncatalogued.

Comment: Accepted to A
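As a rough illustration of how periodic transit candidates like these are examined, the sketch below phase-folds a light curve at a trial period. The data are synthetic, not Kepler photometry, and the function is a generic textbook step rather than the Planet Hunters or Kepler pipeline.

```python
# Toy phase-folding sketch: transits of a genuinely periodic signal
# line up at a common phase when folded at the correct period.
# Synthetic data only; not Kepler photometry.
import numpy as np

def phase_fold(times, flux, period_days):
    """Return the light curve sorted by orbital phase at a trial period."""
    phase = (times % period_days) / period_days
    order = np.argsort(phase)
    return phase[order], flux[order]

rng = np.random.default_rng(0)
times = np.linspace(0.0, 300.0, 10_000)              # days of observation
flux = 1.0 + 1e-4 * rng.standard_normal(times.size)  # flat, noisy baseline

phase, folded = phase_fold(times, flux, 97.46)  # trial period of KIC 4552729
```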
Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned
Binary code similarity analysis (BCSA) is widely used for diverse security
applications such as plagiarism detection, software license violation
detection, and vulnerability discovery. Despite the surging research interest
in BCSA, it is significantly challenging to perform new research in this field
for several reasons. First, most existing approaches focus only on the end
results, namely, increasing the success rate of BCSA, by adopting
uninterpretable machine learning. Moreover, they evaluate on their own benchmarks, sharing neither the source code nor the entire dataset. Finally, researchers
often use different terminologies or even use the same technique without citing
the previous literature properly, which makes it difficult to reproduce or
extend previous work. To address these problems, we take a step back from the
mainstream and contemplate fundamental research questions for BCSA: why does a certain technique or feature show better results than others?
Specifically, we conduct the first systematic study on the basic features used
in BCSA by leveraging interpretable feature engineering on a large-scale
benchmark. Our study reveals various useful insights into BCSA. For example, we
show that a simple interpretable model with a few basic features can achieve a
comparable result to that of recent deep learning-based approaches.
Furthermore, we show that the way we compile binaries or the correctness of
underlying binary analysis tools can significantly affect the performance of
BCSA. Lastly, we make all our source code and benchmark public and suggest
future directions in this field to help further research.

Comment: 22 pages, under revision to Transactions on Software Engineering (July 2021)
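The claim that a simple interpretable model over a few basic features can rival deep learning approaches can be sketched as follows. The feature names, data, and model choice here are illustrative stand-ins, not the paper's actual benchmark or feature set.

```python
# Illustrative interpretable baseline for binary code similarity:
# a linear model over a handful of basic, human-readable features.
# Features, data, and labels are invented for this sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["n_instructions", "n_calls", "n_branches", "n_basic_blocks"]

rng = np.random.default_rng(1)
# Each row: absolute feature differences between a pair of functions;
# label 1 means the pair came from the same source function (toy rule).
X = rng.integers(0, 50, size=(200, len(FEATURES))).astype(float)
y = (X.sum(axis=1) < 60).astype(int)

model = LogisticRegression().fit(X, y)

# Unlike a deep model, the learned weights are directly inspectable.
for name, coef in zip(FEATURES, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```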
Augmented reality device for first response scenarios
A prototype of a wearable computer system is proposed and implemented using commercial off-the-shelf components. The system is designed to allow the user to access location-specific information about an environment and to provide user-tracking capability. Areas of applicability include primarily first response scenarios, with possible applications in maintenance or construction of buildings and other structures. Necessary preparation of the target environment prior to the system's deployment is limited to noninvasive labeling using optical fiducial markers. The system relies on computational vision methods for registration of labels and user position. With the system, the user has access to on-demand information relevant to a particular real-world location. Team collaboration is assisted by user tracking and real-time visualizations of team member positions within the environment. The user interface and display methods are inspired by Augmented Reality (AR) techniques, incorporating a video-see-through Head Mounted Display (HMD) and a finger-bending sensor glove.
Note: Augmented reality (AR) is a field of computer research which deals with the combination of real-world and computer-generated data. At present, most AR research is concerned with the use of live video imagery which is digitally processed and augmented by the addition of computer-generated graphics. Advanced research includes the use of motion-tracking data, fiducial marker recognition using machine vision, and the construction of controlled environments containing any number of sensors and actuators. (Source: Wikipedia) This dissertation is a compound document (contains both a paper copy and a CD as part of the dissertation). The CD requires the following system requirements: Adobe Acrobat; Microsoft Office; Windows Media Player or RealPlayer.
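The dissertation does not name its marker-recognition library. As one possible sketch of the registration step, the code below detects ArUco fiducial markers with OpenCV (opencv-contrib-python 4.7 or later assumed); the detected corners could then be matched against a pre-built map of label positions.

```python
# Sketch of fiducial-marker registration using OpenCV's ArUco module
# (an assumption; the dissertation's actual marker system is not named).
import cv2

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

frame = cv2.imread("hallway.png")  # hypothetical video-see-through frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, _rejected = detector.detectMarkers(gray)

if ids is not None:
    # Each marker id would be looked up in an a-priori map of label
    # locations to recover the user's position in the environment.
    for marker_id, quad in zip(ids.flatten(), corners):
        print(f"marker {marker_id}: corners {quad.reshape(-1, 2).tolist()}")
```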
Getting Ready for LISA: The Data, Support and Preparation Needed to Maximize US Participation in Space-Based Gravitational Wave Science
The NASA LISA Study Team was tasked with studying how NASA might support US scientists in participating in, and maximizing the science return from, the Laser Interferometer Space Antenna (LISA) mission. LISA is a gravitational-wave observatory led by ESA, with NASA as a junior partner, and is scheduled to
launch in 2034. Among our findings: LISA science productivity is greatly
enhanced by a full-featured US science center and an open access data model. As
other major missions have demonstrated, a science center acts as both a locus
and an amplifier of research innovation, data analysis, user support, user
training and user interaction. In its most basic function, a US Science Center
could facilitate entry into LISA science by hosting a Data Processing Center
and a portal for the US community to access LISA data products. However, an
enhanced LISA Science Center could: support one of the parallel independent
processing pipelines required for data product validation; stimulate the high
level of research on data analysis that LISA demands; support users unfamiliar
with a novel observatory; facilitate astrophysics and fundamental research;
provide an interface into the subtleties of the instrument to validate
extraordinary discoveries; train new users; and expand the research community
through guest investigator, postdoc and student programs. Establishing a US
LISA Science Center well before launch can have a beneficial impact on the
participation of the broader astronomical community by providing training,
hosting topical workshops, disseminating mock catalogs, software pipelines, and
documentation. Past experience indicates that successful science centers are
established several years before launch; this early adoption model may be
especially relevant for a pioneering mission like LISA.

Comment: 93 pages, with a lovely cover page thanks to Bernard Kelly and Elizabeth Ferrar
Robust Modular Feature-Based Terrain-Aided Visual Navigation and Mapping
The visual feature-based Terrain-Aided Navigation (TAN) system presented in this thesis addresses the problem of constraining the inertial drift introduced into the location estimate of Unmanned Aerial Vehicles (UAVs) in GPS-denied environments. The presented TAN system utilises salient visual features representing semantic or human-interpretable objects (roads, forest and water boundaries) from onboard aerial imagery and associates them with a database of reference features created a priori by applying the same feature detection algorithms to satellite imagery. Correlation of the detected features with the reference features via a series of robust data association steps allows a localisation solution to be achieved with a finite absolute precision bound defined by the certainty of the reference dataset.

The feature-based Visual Navigation System (VNS) presented in this thesis was originally developed for a navigation application using simulated multi-year satellite image datasets. The extension of the system into the mapping domain, in turn, has been based on real (not simulated) flight data and imagery. The mapping study demonstrated the full potential of the system as a versatile tool for enhancing the accuracy of information derived from aerial imagery. Not only have visual features such as road networks, shorelines and water bodies been used to obtain a position 'fix'; they have also been used in reverse for accurate mapping of vehicles detected on the roads into an inertial space with improved precision. Combined correction of the geo-coding errors and improved aircraft localisation formed a robust solution to the defense mapping application. A system of the proposed design will provide a complete independent navigation solution to an autonomous UAV and additionally give it object tracking capability.
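A minimal sketch of the data-association step described above, assuming detected and reference features are already projected into a common coordinate frame: nearest-neighbour search against the a-priori database with a distance gate. Coordinates, units, and the gating threshold are invented for illustration.

```python
# Toy data association: match features detected in onboard imagery to
# an a-priori geo-referenced database via a k-d tree, rejecting
# matches beyond a gating distance. All numbers are illustrative.
import numpy as np
from scipy.spatial import cKDTree

reference_xy = np.array([[120.0, 45.0],    # road intersection (from satellite imagery)
                         [480.5, 210.3],   # shoreline corner
                         [955.2, 88.7]])   # forest boundary vertex
detected_xy = np.array([[122.1, 44.2],     # features from the onboard camera,
                        [950.0, 91.0]])    # projected into the same frame

tree = cKDTree(reference_xy)
dist, idx = tree.query(detected_xy)

GATE = 10.0  # metres; discard ambiguous or outlier associations
for d, i, obs in zip(dist, idx, detected_xy):
    if d < GATE:
        print(f"{obs} -> reference feature {i} (residual {d:.2f} m)")
```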
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
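As a hedged sketch of the first problem, optimization selection, the snippet below trains a small decision tree to predict which of two optimization settings a program will prefer from simple static features. The features, labels, and model are illustrative; the autotuners in the surveyed literature use much richer program representations.

```python
# Toy "optimization selection": predict the better of two compiler
# settings from static program features. All data are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["loop_count", "mem_ops", "branch_density"]

rng = np.random.default_rng(2)
X = rng.random((300, len(FEATURES)))
# Invented ground truth: loop-heavy programs prefer setting 1
# (say, aggressive unrolling); branch-heavy programs prefer setting 0.
y = (X[:, 0] > X[:, 2]).astype(int)

model = DecisionTreeClassifier(max_depth=3).fit(X, y)

new_program = np.array([[0.8, 0.4, 0.1]])  # features of an unseen program
print("recommended setting:", int(model.predict(new_program)[0]))
```

Phase ordering is harder: the search space is sequences of passes rather than a single binary choice, so this supervised recipe does not apply directly, which is why the survey treats the two problems separately.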
Tackling food marketing to children in a digital world: trans-disciplinary perspectives. Children’s rights, evidence of impact, methodological challenges, regulatory options and policy implications for the WHO European Region
There is unequivocal evidence that childhood obesity is influenced by the marketing of foods and non-alcoholic beverages high in saturated fat, salt and/or free sugars (HFSS), and a core recommendation of the WHO Commission on Ending Childhood Obesity is to reduce children's exposure to all such marketing. As a result, WHO has called on Member States to introduce restrictions on the marketing of HFSS foods to children, covering all media, including digital, and to close any regulatory loopholes. This publication provides up-to-date information on the marketing of foods and non-alcoholic beverages to children and the changes that have occurred in recent years, focusing in particular on the major shift to digital marketing. It examines trends in media use among children, marketing methods in the new digital media landscape, and children's engagement with such marketing. It also considers the impact on children and their ability to counter marketing, as well as the implications for children's rights and digital privacy. Finally, the report discusses the policy implications and some of the recent policy actions by WHO European Member States.
A compiler level intermediate representation based binary analysis system and its applications
Analyzing and optimizing programs from their executables has received a lot of attention recently in the research community. There has been a tremendous amount of activity in executable-level research targeting varied applications such as security vulnerability analysis, untrusted code analysis, malware analysis, program testing, and binary optimizations.
The vision of this dissertation is to advance the field of static analysis of executables and bridge the gap between source-level analysis and executable analysis. The main thesis of this work is scalable static binary rewriting and analysis using a compiler-level intermediate representation, without relying on the presence of metadata such as debug or symbolic information.
In spite of a significant overlap in the overall goals of several source-code methods and executable-level techniques, several sophisticated transformations that are well understood and implemented in source-level infrastructures have yet to become available in executable frameworks. It is well known that a standalone executable without any metadata is less amenable to analysis than the source code. Nonetheless, we believe that one of the prime reasons behind the limitations of existing executable frameworks is that they define their own intermediate representations (IRs), which are significantly more constrained than the IR used in a compiler. The intermediate representations used in existing binary frameworks lack high-level features such as an abstract stack, variables, and symbols, and are even machine-dependent in some cases. This severely limits the application of well-understood compiler transformations to executables and necessitates new research to make them applicable.
In the first part of this dissertation, we present techniques to convert binaries to the same high-level intermediate representation that compilers use. We propose methods to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable to an abstract stack. The proposed methods are practical since they do not employ symbolic, relocation, or debug information, which are usually absent in deployed executables. We have integrated our techniques with a prototype x86 binary framework called SecondWrite that uses LLVM as the IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source code, including several real-world programs.
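To make the target representation concrete, the sketch below builds a tiny LLVM-style function with llvmlite (an assumed binding; SecondWrite's actual lifter is not reproduced). The alloca plays the role of a recovered stack slot: once a physical stack location is abstracted this way, standard passes such as mem2reg can promote it to an SSA value, the kind of transformation a flat, physically addressed stack precludes.

```python
# Sketch of the kind of compiler-level IR a lifted function targets,
# built with llvmlite (an assumption; not SecondWrite's actual code).
from llvmlite import ir

i32 = ir.IntType(32)
module = ir.Module(name="recovered")
func = ir.Function(module, ir.FunctionType(i32, [i32]), name="lifted_fn")

builder = ir.IRBuilder(func.append_basic_block("entry"))
# A recovered local variable: a physical stack slot becomes an
# explicit, typed alloca that optimization passes can reason about.
slot = builder.alloca(i32, name="local0")
builder.store(func.args[0], slot)
tmp = builder.load(slot, name="tmp")
builder.ret(builder.add(tmp, ir.Constant(i32, 1), name="result"))

print(module)  # emits textual LLVM IR for the recovered function
```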
In the next part of this work, we demonstrate that several well-known source-level analysis frameworks, such as symbolic analysis, have limited effectiveness in the executable domain, since executables typically lack higher-level semantics such as program variables. The IR should have a precise memory abstraction for an analysis to effectively reason about memory operations. Our work on recovering a compiler-level representation addresses this limitation by recovering several pieces of higher-level semantic information from executables. We then propose methods to handle the scenarios in which such semantics cannot be recovered.
First, we propose a hybrid static-dynamic mechanism for recovering a precise and correct memory model in executables in the presence of executable-specific artifacts such as indirect control transfers. Next, the enhanced memory model is employed to define a novel symbolic analysis framework for executables that can perform the same types of program analysis as source-level tools. Existing frameworks fail to simultaneously maintain the properties of a correct representation and a precise memory model, and they ignore memory-allocated variables when defining symbolic analysis mechanisms. We demonstrate that our framework is robust and efficient, and that it significantly improves the performance of various traditional analyses, such as global value numbering, alias analysis, and dependence analysis, for executables.
Finally, the underlying representation and analysis framework is employed for two separate applications. First, the framework is extended to define a novel static analysis framework, DemandFlow, for identifying information flow security violations in program executables. Unlike existing static vulnerability detection methods for executables, DemandFlow analyzes memory locations in addition to symbols, thus improving the precision of the analysis. DemandFlow proposes a novel demand-driven mechanism to identify and precisely analyze only those program locations and memory accesses that are relevant to a vulnerability, thus enhancing scalability. DemandFlow uncovers six previously undiscovered format string and directory traversal vulnerabilities in popular FTP and Internet Relay Chat clients.
Next, the framework is extended to implement a platform-specific optimization for embedded processors. Several embedded systems provide the facility of locking one or more lines in the cache. We devise the first method in the literature that employs instruction cache locking as a mechanism for improving the average-case run time of general embedded applications. We demonstrate that the optimal solution for instruction cache locking can be obtained in polynomial time. Since our scheme is implemented inside a binary framework, it successfully addresses the portability concern by enabling the implementation of cache locking at deployment time, when all the details of the memory hierarchy are available.
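As a simplified illustration of the cache-locking decision (not the dissertation's polynomial-time optimal algorithm, which is not reproduced here), the sketch below greedily locks the instruction-cache lines with the largest estimated miss savings, subject to a lockable-way budget; all numbers are invented.

```python
# Simplified greedy stand-in for the instruction-cache-locking choice.
# The dissertation obtains the *optimal* locking in polynomial time;
# this greedy version only shows the shape of the problem. Data invented.
def choose_locked_lines(benefit_by_line: dict[int, float], budget: int) -> list[int]:
    """Lock up to `budget` lines, preferring the largest miss savings."""
    ranked = sorted(benefit_by_line, key=benefit_by_line.__getitem__, reverse=True)
    return ranked[:budget]

# Estimated average-case misses avoided if each line is locked.
benefit = {0x040: 120.0, 0x080: 15.0, 0x0C0: 300.0, 0x100: 75.0}

print([hex(line) for line in choose_locked_lines(benefit, budget=2)])
# -> ['0xc0', '0x40']
```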