WIYN Open Cluster Study. XXXIX. Abundances in NGC 6253 from HYDRA Spectroscopy of the Li 6708 A Region
High-dispersion spectra of 89 potential members of the old, super-metal-rich
open cluster, NGC 6253, have been obtained with the HYDRA multi-object
spectrograph. Based upon radial-velocity measurements alone, 47 stars at the
turnoff of the cluster color-magnitude diagram (CMD) and 18 giants are
identified as potential members. Five turnoff stars exhibit evidence of
binarity while proper-motion data eliminates two of the dwarfs as members. The
mean cluster radial velocity from probable single-star members is -29.4 +/- 1.3
km/sec (sd). A discussion of the current estimates for the cluster reddening,
derived independently of potential issues with the BV cluster photometry, leads
to an adopted reddening of E(B-V) = 0.22 +/- 0.04. From equivalent width
analyses of 38 probable single-star members near the CMD turnoff, the weighted
average abundances are found to be [Fe/H] = +0.43 +/- 0.01, [Ni/H] = +0.53 +/-
0.02 and [Si/H] = +0.43 (+0.03,-0.04), where the errors refer to the standard
errors of the weighted mean. Weak evidence is found for a possible decline in
metallicity with increasing luminosity among stars at the turnoff. We discuss
the possibility that our turnoff stars have been affected by microscopic
diffusion. For 15 probable single-star members among the giants, spectrum
synthesis leads to abundances of +0.46 (+0.02,-0.03) for [Fe/H]. While less
than half the age of NGC 6791, NGC 6253 is at least as metal-rich and, within
the uncertainties, exhibits the same general abundance pattern as that typified
by super-metal-rich dwarfs of the galactic bulge.
Comment: 5 tables, 9 figures, 45 pages.
A fast and scalable binary similarity method for open source libraries
Abstract. Usage of third-party open source software has become increasingly popular in recent years, due to the need for faster development cycles and the availability of good-quality libraries. Those libraries are integrated as dependencies, often in the form of binary artifacts. This is especially common in embedded software applications. Dependencies, however, can proliferate and also add new attack surfaces to an application due to vulnerabilities in the library code. Hence the need arises for binary similarity analysis methods that detect libraries compiled into applications.
Binary similarity detection methods are related to text similarity methods and build upon the research in that area. In this research we focus on fuzzy matching methods, which have been used widely and successfully in text similarity analysis. In particular, we propose using locality-sensitive hashing schemes in combination with normalised binary code features. The normalisation allows us to apply the similarity comparison across binaries produced by different compilers, using different optimization flags, and built for various machine architectures.
To improve the matching precision, we use weighted code features. Machine learning is used to optimize the feature weights to create clusters of semantically similar code blocks extracted from different binaries. The machine learning is performed in an offline process to increase scalability and performance of the matching system.
Using the above methods, we build a database of binary similarity code signatures for open source libraries. The database is used to match, by similarity, code blocks from an application against known libraries in the database. One of the goals of our system is to facilitate a fast and scalable similarity matching process, which allows the system to be integrated into continuous software development, testing, and integration pipelines.
The evaluation shows that our results are comparable to other systems proposed in related research in terms of precision, while maintaining the performance required in continuous integration systems.
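The pipeline sketched above (normalise features, hash, bucket) can be illustrated compactly. The following is a minimal sketch, not the thesis's implementation: the `normalize` rewrite rules, the token format, and the MinHash/banding parameters are all assumptions chosen for brevity, and MinHash is just one locality-sensitive scheme for set similarity.

```python
import hashlib
import re

def normalize(instr):
    # Hypothetical normalisation: collapse immediates and register names
    # so compiler- and architecture-specific details do not dominate.
    instr = re.sub(r"0x[0-9a-fA-F]+", "IMM", instr)
    instr = re.sub(r"\b(r|e)?[abcd]x\b|\br\d+\b", "REG", instr)
    return instr

def _h(token, seed):
    # One seeded 64-bit hash per "permutation".
    salt = seed.to_bytes(2, "big").rjust(16, b"\0")
    return int.from_bytes(
        hashlib.blake2b(token.encode(), digest_size=8, salt=salt).digest(),
        "big")

def minhash(tokens, num_perm=64):
    # MinHash signature: the minimum hash over the token set, per seed.
    return [min(_h(t, s) for t in tokens) for s in range(num_perm)]

def est_jaccard(sig_a, sig_b):
    # The fraction of agreeing slots estimates Jaccard set similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def lsh_buckets(sig, bands=16):
    # Split the signature into bands; binaries sharing any band bucket
    # become candidate matches without pairwise comparison.
    rows = len(sig) // bands
    return {hashlib.sha1(repr(sig[i * rows:(i + 1) * rows]).encode()).hexdigest()
            for i in range(bands)}
```

Two code blocks whose normalised instruction sets overlap heavily agree in most signature slots and therefore collide in at least one band bucket, which is what makes lookup against a signature database fast.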
The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files
In many forensic investigations, questions linger regarding the identity of the authors of a software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done on analyzing obfuscated code for attribution. In part, the reason for this gap is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features of the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis becomes difficult, time consuming, and in some cases may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software-emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input to a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and to predict who wrote it. The specimen files were also analyzed for authorship using static analysis methods, to compare prediction accuracies against those gathered from the new, dynamic-analysis-based method. Experiments indicate that the new method can provide better accuracy of author attribution for files of unknown provenance, especially when the specimen file has been obfuscated.
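As a rough, hypothetical illustration of the trace-based attribution idea (the paper's actual instrumentation, feature set, and learning algorithm are not reproduced here; the event names and authors below are invented), one can score a specimen's trace against per-author profiles built from labelled runs:

```python
from collections import Counter
from math import sqrt

def ngrams(trace, n=2):
    # Hypothetical feature: frequencies of n-grams of trace events
    # (e.g. API calls observed while the instrumented emulator runs).
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

def cosine(a, b):
    # Cosine similarity between two sparse frequency vectors.
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(trace, profiles):
    # Nearest-profile attribution: compare the specimen's trace features
    # against per-author centroids learned from labelled executions.
    feats = ngrams(trace)
    return max(profiles, key=lambda author: cosine(feats, profiles[author]))
```

A real system would replace the nearest-profile step with a trained classifier, but the shape of the pipeline (trace, features, supervised prediction) is the same.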
BinComp: A Stratified Approach to Compiler Provenance Attribution
Compiler provenance encompasses numerous pieces of information, such as the compiler family, compiler version, optimization level, and compiler-related functions. The extraction of such information is imperative for various binary analysis applications, such as function fingerprinting, clone detection, and authorship attribution. It is thus important to develop an efficient and automated approach for extracting compiler provenance. In this study, we present BinComp, a practical approach that analyzes the syntax, structure, and semantics of disassembled functions to extract compiler provenance. BinComp has a stratified architecture with three layers. The first layer applies a supervised compilation process to a set of known programs to model the default code transformation of compilers. The second layer employs an intersection process that disassembles functions across compiled binaries to extract statistical features (e.g., numerical values) from common compiler/linker-inserted functions. This layer labels the compiler-related functions. The third layer extracts semantic features from the labeled compiler-related functions to identify the compiler version and the optimization level. Our experimental results demonstrate that BinComp is efficient in terms of both computational resources and time.
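A caricature of the second and third layers, under strong simplifying assumptions (the feature tuples, helper names, and signature database below are invented and much cruder than BinComp's real features):

```python
from collections import Counter

def label_helper_functions(binary_funcs, known_helpers):
    # Layer-2 sketch: label functions whose statistical features (here
    # fake tuples such as (num_instructions, num_calls)) match the
    # profile of known compiler/linker-inserted helpers.
    return {name: feats for name, feats in binary_funcs.items()
            if feats in known_helpers.values()}

def infer_provenance(labelled, signature_db):
    # Layer-3 sketch: each labelled helper votes for the
    # (compiler, version, optimization level) whose signature set
    # contains its features; the majority candidate wins.
    votes = Counter()
    for feats in labelled.values():
        for prov, sigs in signature_db.items():
            if feats in sigs:
                votes[prov] += 1
    return votes.most_common(1)[0][0] if votes else None
```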
Program Similarity Analysis for Malware Classification and its Pitfalls
Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital in order to understand the threat they pose and how to protect against them. Recognizing whether one program shares behaviors with another is a task that requires semantic reasoning, meaning that it needs to consider what a program actually does. This is a famously uncomputable problem, due to Rice\u2019s theorem. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least, easy to retrieve), most approaches employ semantic \u201cabstractions\u201d, which are computable approximations of the semantics of the program. We consider this the first scenario for this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a tool for malware analysis, R.E.H.A. which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first successfully evaluated against a state of the art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions used to produce more fine-grained features, we can improve the accuracy of the results as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is not often available or easily retrievable. 
For this reason, we introduce a second scenario in which the classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot rely on semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This led us to develop novel deep learning models, a convolutional neural network and a long short-term memory network, to classify the samples into their respective families. To overcome the usual deep learning obstacle of lacking sufficiently large and balanced datasets, we utilize obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but that it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis for proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and show how these obfuscations actually hinder the dynamic analysis process.
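The binary-to-image step is simple to sketch. Assuming raw sample bytes and a fixed image width (both choices ours, not the thesis's), every byte becomes one greyscale pixel:

```python
import math

def binary_to_image(data, width=64):
    # Treat the raw bytes of a compiled sample as pixel intensities and
    # reshape them into a fixed-width greyscale image, zero-padding the
    # last row; image classifiers (e.g. a CNN) then operate on the grid.
    rows = math.ceil(len(data) / width)
    padded = data + b"\0" * (rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]
```

The appeal of this representation is that packing, encryption, and compiler idioms show up as visual texture, so standard image models can separate families without any disassembly.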
Binaries in the Kuiper Belt
Binaries have played a crucial role many times in the history of modern
astronomy and are doing so again in the rapidly evolving exploration of the
Kuiper Belt. The large fraction of transneptunian objects that are binary or
multiple (48 such systems are now known) has been an unanticipated windfall.
Separations and relative magnitudes measured in discovery images give important
information on the statistical properties of the binary population that can be
related to competing models of binary formation. Orbits, derived for 13
systems, provide a determination of the system mass. Masses can be used to
derive densities and albedos when an independent size measurement is available.
Angular momenta and relative sizes of the majority of binaries are consistent
with formation by dynamical capture. The small satellites of the largest
transneptunian objects, in contrast, are more likely formed from collisions.
Correlations of the fraction of binaries with different dynamical populations
or with other physical variables have the potential to constrain models of the
origin and evolution of the transneptunian population as a whole. Other means
of studying binaries have only begun to be exploited, including lightcurve,
color, and spectral data. Because of the several channels for obtaining unique
physical information, it is already clear that binaries will emerge as one of
the most useful tools for unraveling the many complexities of transneptunian
space.
Comment: Accepted for inclusion in "The Kuiper Belt", University of Arizona
Press, Space Science Series. Corrected references in Table.
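The system-mass determination mentioned in the abstract follows directly from Kepler's third law; this is standard orbital mechanics rather than a result specific to the chapter. For a binary with mutual semi-major axis $a$ and orbital period $P$, the total system mass is

```latex
M_{\mathrm{sys}} = \frac{4\pi^2 a^3}{G P^2}
```

and, given an independent size measurement for the components, a bulk density estimate follows by dividing $M_{\mathrm{sys}}$ by their total volume.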
A Tree Locality-Sensitive Hash for Secure Software Testing
Bugs in software that make it through testing can cost tens of millions of dollars each year, and in some cases can even result in the loss of human life. In order to eliminate bugs, developers may use symbolic execution to search through possible program states looking for anomalous states. Most of the computational effort to search through these states is spent solving path constraints in order to determine the feasibility of entering each state. State merging can make this search more efficient by combining program states, allowing multiple execution paths to be analyzed at the same time. However, a merge with dissimilar path constraints dramatically increases the time necessary to solve the path constraint. Currently, there are no distance measures for path constraints, and pairwise comparison of program states is not scalable. A hashing method is presented that clusters constraints in such a way that similar constraints are placed in the same cluster without requiring pairwise comparisons between queries. When combined with other state-of-the-art state merging techniques, the hashing method allows the symbolic executor to execute more instructions per second and find more terminal execution states than the other techniques alone, without decreasing the high path coverage achieved by merging many states together.
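The thesis's actual tree-sensitive hash is not reproduced here; as an illustrative stand-in, the sketch below flattens a constraint AST (encoded as nested tuples, an assumption of ours) into depth-tagged tokens and applies SimHash, so structurally similar constraints receive hashes with small Hamming distance and can be bucketed without pairwise comparison.

```python
import hashlib

def tokens(tree, depth=0):
    # Flatten a constraint AST, written as nested tuples like
    # ("and", ("<", "x", 10), (">", "y", 0)), into depth-tagged labels
    # so that tree structure influences the resulting hash.
    if not isinstance(tree, tuple):
        yield (depth, str(tree))
        return
    yield (depth, tree[0])
    for child in tree[1:]:
        yield from tokens(child, depth + 1)

def simhash(tree, bits=32):
    # SimHash: each token's hash votes on every bit position; similar
    # token multisets produce hashes that differ in only a few bits.
    weights = [0] * bits
    for tok in tokens(tree):
        h = int.from_bytes(
            hashlib.blake2b(repr(tok).encode(), digest_size=4).digest(),
            "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, w in enumerate(weights) if w > 0)
```

Constraints can then be routed to clusters keyed on the hash (or on a few of its bit-bands), avoiding the pairwise state comparisons the abstract identifies as unscalable.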
Honeypots in the age of universal attacks and the Internet of Things
Today's Internet connects billions of physical devices. These devices are often immature and insecure, and share common vulnerabilities. The predominant form of attacks relies on recent advances in Internet-wide scanning and device discovery. The speed at which (vulnerable) devices can be discovered, and the device monoculture, mean that a single exploit, potentially trivial, can affect millions of devices across brands and continents.
In an attempt to detect and profile the growing threat of autonomous and Internet-scale attacks against the Internet of Things, we revisit honeypots, resources that appear to be legitimate systems. We show that this endeavour was previously limited by a fundamentally flawed generation of honeypots and associated misconceptions.
We show with two one-year-long studies that the display of warning messages has no deterrent effect in an attacked computer system. Previous research assumed that such studies would measure individual human behaviour, but we find that the number of human attackers is orders of magnitude lower than previously assumed.
Turning to the current generation of low- and medium-interaction honeypots, we demonstrate that their architecture is fatally flawed. The use of off-the-shelf libraries to provide the transport layer means that the protocols are implemented subtly differently from the systems being impersonated. We developed a generic technique which can find any such honeypot at Internet scale with just one packet for an established TCP connection.
We then applied our technique and conducted several Internet-wide scans over a one-year period. By logging in to two SSH honeypots and sending specific commands, we not only revealed their configuration and patch status, but also found that many of them were not up to date. As we were the first to knowingly authenticate to honeypots, we provide a detailed legal analysis and an extended ethical justification for our research to show why we did not infringe computer-misuse laws.
Lastly, we present honware, a honeypot framework for rapid implementation and deployment of high-interaction honeypots. Honware automatically processes a standard firmware image and can emulate a wide range of devices without any access to the manufacturers' hardware. We believe that honware is a major contribution towards re-balancing the economics of attackers and defenders by reducing the period in which attackers can exploit vulnerabilities at Internet scale in a world of ubiquitous networked `things'.
Premium Research Studentship, Department of Computer Science and Technology, University of Cambridge
eROSITA Science Book: Mapping the Structure of the Energetic Universe
eROSITA is the primary instrument on the Russian SRG mission. In the first
four years of scientific operation after its launch, foreseen for 2014, it will
perform a deep survey of the entire X-ray sky. In the soft X-ray band (0.5-2
keV), this will be about 20 times more sensitive than the ROSAT all sky survey,
while in the hard band (2-10 keV) it will provide the first ever true imaging
survey of the sky at those energies. Such a sensitive all-sky survey will
revolutionize our view of the high-energy sky, and calls for major efforts in
synergic, multi-wavelength wide area surveys in order to fully exploit the
scientific potential of the X-ray data. The design-driving science of eROSITA
is the detection of very large samples (~10^5 objects) of galaxy clusters out
to redshifts z>1, in order to study the large-scale structure in the Universe
and to test and characterize cosmological models, including Dark Energy. eROSITA is
also expected to yield a sample of around 3 million Active Galactic Nuclei,
including both obscured and un-obscured objects, providing a unique view of the
evolution of supermassive black holes within the emerging cosmic structure. The
survey will also provide new insights into a wide range of astrophysical
phenomena, including accreting binaries, active stars and diffuse emission
within the Galaxy, as well as solar system bodies that emit X-rays via the
charge exchange process. Finally, such a deep imaging survey at high spectral
resolution, with its scanning strategy sensitive to a range of variability
timescales from tens of seconds to years, will undoubtedly open up a vast
discovery space for the study of rare, unpredicted, or unpredictable
high-energy astrophysical phenomena. In this living document we present a
comprehensive description of the main scientific goals of the mission, with
strong emphasis on the early survey phases.
Comment: 84 pages, 52 figures. Published online as MPE document. Edited by S.
Allen, G. Hasinger and K. Nandra. Few minor corrections (typos) and updated
references.