
    WIYN Open Cluster Study. XXXIX. Abundances in NGC 6253 from HYDRA Spectroscopy of the Li 6708 A Region

    High-dispersion spectra of 89 potential members of the old, super-metal-rich open cluster NGC 6253 have been obtained with the HYDRA multi-object spectrograph. Based upon radial-velocity measurements alone, 47 stars at the turnoff of the cluster color-magnitude diagram (CMD) and 18 giants are identified as potential members. Five turnoff stars exhibit evidence of binarity, while proper-motion data eliminate two of the dwarfs as members. The mean cluster radial velocity from probable single-star members is -29.4 +/- 1.3 km/sec (sd). A discussion of the current estimates for the cluster reddening, derived independently of potential issues with the BV cluster photometry, leads to an adopted reddening of E(B-V) = 0.22 +/- 0.04. From equivalent-width analyses of 38 probable single-star members near the CMD turnoff, the weighted average abundances are found to be [Fe/H] = +0.43 +/- 0.01, [Ni/H] = +0.53 +/- 0.02, and [Si/H] = +0.43 (+0.03,-0.04), where the errors refer to the standard errors of the weighted mean. Weak evidence is found for a possible decline in metallicity with increasing luminosity among stars at the turnoff. We discuss the possibility that our turnoff stars have been affected by microscopic diffusion. For 15 probable single-star members among the giants, spectrum synthesis leads to [Fe/H] = +0.46 (+0.02,-0.03). While less than half the age of NGC 6791, NGC 6253 is at least as metal-rich and, within the uncertainties, exhibits the same general abundance pattern as that typified by super-metal-rich dwarfs of the galactic bulge. Comment: 5 tables, 9 figures, 45 pages.
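    For reference, the quoted weighted means and their standard errors follow the standard definitions (a generic statement of the statistic, not a reconstruction of the paper's specific error budget): for N stars with individual abundances x_i and uncertainties sigma_i,

\[
\bar{x} \;=\; \frac{\sum_{i=1}^{N} x_i/\sigma_i^{2}}{\sum_{i=1}^{N} 1/\sigma_i^{2}},
\qquad
\sigma_{\bar{x}} \;=\; \left( \sum_{i=1}^{N} \frac{1}{\sigma_i^{2}} \right)^{-1/2}.
\]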

    A fast and scalable binary similarity method for open source libraries

    Abstract. Usage of third-party open source software has become more and more popular in recent years, due to the need for faster development cycles and the availability of good-quality libraries. Those libraries are integrated as dependencies, often in the form of binary artifacts. This is especially common in embedded software applications. Dependencies, however, can proliferate and also add new attack surfaces to an application due to vulnerabilities in the library code. Hence the need for binary similarity analysis methods to detect libraries compiled into applications. Binary similarity detection methods are related to text similarity methods and build upon the research in that area. In this research we focus on fuzzy matching methods, which have been used widely and successfully in text similarity analysis. In particular, we propose using locality-sensitive hashing schemes in combination with normalised binary code features. The normalisation allows us to apply the similarity comparison across binaries produced by different compilers, using different optimization flags, and built for various machine architectures. To improve the matching precision, we use weighted code features. Machine learning is used to optimize the feature weights to create clusters of semantically similar code blocks extracted from different binaries. The machine learning is performed in an offline process to increase the scalability and performance of the matching system. Using the above methods we build a database of binary similarity code signatures for open source libraries. The database is utilized to match, by similarity, any code blocks from an application to known libraries in the database. One of the goals of our system is to facilitate a fast and scalable similarity matching process. This allows integrating the system into continuous software development, testing, and integration pipelines. The evaluation shows that our results are comparable to other systems proposed in related research in terms of precision, while maintaining the performance required in continuous integration systems.

    A fast and scalable compiled-software similarity detection method for open source libraries. Abstract. The use of software developed by third parties has grown enormously in recent years, driven by ever faster software development and the growing supply of high-quality software libraries. These libraries are usually added to the software under development as dependencies, often even as compiled binaries. This is especially common in embedded software. Dependencies, however, may create new attack vectors because of vulnerabilities found in the libraries. These vulnerabilities in third-party libraries create the need to identify the open source libraries present in compiled binary software. Binary similarity detection methods are often based on text similarity detection methods and exploit the scientific advances in that area. This study focuses on fuzzy detection methods, which have been used widely in text similarity detection. The study makes use of locality-sensitive hashing methods and normalised binary features. Normalising the features allows the similarity of binaries to be compared regardless of the compiler, optimization levels, and processor architecture used to build the software. The precision of the method is improved with weighted binary features. Using machine learning, the feature weights are optimized so that code blocks extracted from similar binaries form clusters of similar programs. The machine learning is performed in a separate process, which improves the performance of the system. With these methods, a database of signatures for open source libraries is built. Using the database, similar binary blocks from any application can be linked to known open source libraries. The goal of the method is to offer fast and scalable similarity detection. Thanks to these properties, the system can be integrated into software development and integration processes and software testing. Comparison with other methods presented in the literature shows that the results of the presented method are comparable in terms of precision, while maintaining the performance required in continuous integration systems.
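    As a rough sketch of the core idea above (an illustration only, not the thesis's implementation: the instruction classes, hash parameters, and example blocks are invented), a locality-sensitive hashing scheme over normalised code features can be prototyped in a few lines. Basic blocks are reduced to coarse instruction-class n-grams, MinHash signatures are computed over those n-grams, and signatures are banded into buckets so that similar blocks from different builds collide.

```python
import hashlib
import random
from collections import defaultdict

# Map concrete mnemonics to coarse classes so the signature is stable across
# compilers, optimization levels, and architectures (toy table for the sketch).
MNEMONIC_CLASSES = {
    "mov": "XFER", "ldr": "XFER", "str": "XFER",
    "add": "ARITH", "sub": "ARITH", "mul": "ARITH",
    "jmp": "JUMP", "b": "JUMP", "je": "CJUMP", "bne": "CJUMP",
    "call": "CALL", "bl": "CALL", "ret": "RET",
}

def normalize_block(instructions):
    """Turn a basic block (list of mnemonics) into a set of class 3-grams."""
    classes = [MNEMONIC_CLASSES.get(m.lower(), "OTHER") for m in instructions]
    return {"-".join(classes[i:i + 3]) for i in range(len(classes) - 2)}

def minhash_signature(tokens, num_hashes=64, seed=42):
    """MinHash signature: for each salted hash, keep the minimum token hash."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    sig = []
    for salt in salts:
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{salt}:{t}".encode(), digest_size=8).digest(),
                "big")
            for t in tokens))
    return sig

def lsh_buckets(signatures, bands=16):
    """Group block ids whose signatures agree on at least one band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for block_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(block_id)
    return buckets

# Example: two similar blocks from different builds land in a shared bucket.
blocks = {
    "libfoo@0x10": ["mov", "add", "mov", "je", "call", "ret"],
    "app@0x2f40":  ["ldr", "add", "str", "bne", "bl", "ret"],
    "unrelated":   ["mul", "mul", "mul", "jmp", "jmp", "jmp"],
}
sigs = {k: minhash_signature(normalize_block(v)) for k, v in blocks.items()}
shared = {frozenset(ids) for ids in lsh_buckets(sigs).values() if len(ids) > 1}
print(shared)
```

    In a full system, the feature weights learned offline would additionally bias which n-grams contribute to the signature; the sketch above omits that weighting step.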

    The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

    In many forensic investigations, questions linger regarding the identity of the authors of a software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done on analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformations of the executable file introduced by the obfuscator modify or remove features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time-consuming, and in some cases may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software-emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input to a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare their prediction accuracies with those of this new, dynamic-analysis-based method. Experiments indicate that the new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated.
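    A minimal sketch of the dynamic pipeline described above (assumed names and toy data throughout; the paper's actual trace format, feature set, and classifier are not specified here) turns each execution trace into event n-gram counts and trains an off-the-shelf classifier on traces of known authorship:

```python
# Traces are sequences of API/syscall names recorded while the specimen runs
# inside the instrumented emulator; authorship is predicted from n-gram
# frequencies of those events. Names and data below are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

# Training data: traces of samples with known authors.
traces = [
    "CreateFileW ReadFile memcpy CloseHandle ExitProcess",
    "CreateFileW WriteFile memset CloseHandle ExitProcess",
    "socket connect send recv closesocket ExitProcess",
    "socket connect recv send closesocket ExitProcess",
]
authors = ["author_A", "author_A", "author_B", "author_B"]

# Event n-grams (1- to 3-grams) capture run-time habits that survive
# obfuscation of the static binary.
vectorizer = CountVectorizer(ngram_range=(1, 3), token_pattern=r"\S+")
X = vectorizer.fit_transform(traces)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, authors)

# Attribute a new, obfuscated specimen from its run-time trace alone.
unknown_trace = ["CreateFileW ReadFile memcpy CloseHandle ExitProcess"]
print(clf.predict(vectorizer.transform(unknown_trace)))
```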

    BinComp: A Stratified Approach to Compiler Provenance Attribution

    Compiler provenance encompasses numerous pieces of information, such as the compiler family, compiler version, optimization level, and compiler-related functions. The extraction of such information is imperative for various binary analysis applications, such as function fingerprinting, clone detection, and authorship attribution. It is thus important to develop an efficient and automated approach for extracting compiler provenance. In this study, we present BinComp, a practical approach that analyzes the syntax, structure, and semantics of disassembled functions to extract compiler provenance. BinComp has a stratified architecture with three layers. The first layer applies a supervised compilation process to a set of known programs to model the default code transformations of compilers. The second layer employs an intersection process that disassembles functions across compiled binaries to extract statistical features (e.g., numerical values) from common compiler/linker-inserted functions. This layer labels the compiler-related functions. The third layer extracts semantic features from the labeled compiler-related functions to identify the compiler version and the optimization level. Our experimental results demonstrate that BinComp is efficient in terms of both computational resources and time.
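    To make the second layer's idea concrete, the toy sketch below (not BinComp's code; the feature set and template values are invented placeholders) compares simple statistical features of a disassembled function against per-compiler templates of a known compiler/linker-inserted helper and labels the function with the nearest match:

```python
import math

# Toy statistical features for a disassembled function: instruction count,
# call count, and fraction of transfer (mov/lea) instructions.
def features(instructions):
    n = len(instructions)
    calls = sum(1 for i in instructions if i.startswith("call"))
    xfers = sum(1 for i in instructions if i.split()[0] in ("mov", "lea"))
    return (n, calls, xfers / n if n else 0.0)

# Per-compiler templates of a known compiler-inserted helper (e.g., a CRT
# startup stub) in the same feature space. Values are invented, not measured.
TEMPLATES = {
    "gcc-9":    (42, 5, 0.38),
    "clang-10": (35, 4, 0.45),
    "msvc-19":  (58, 7, 0.31),
}

def closest_compiler(instructions):
    """Label the function with the compiler whose template is nearest."""
    f = features(instructions)
    return min(TEMPLATES, key=lambda c: math.dist(f, TEMPLATES[c]))

startup_stub = ["push rbp", "mov rbp, rsp", "lea rdi, [rip+0x200]",
                "call __libc_csu_init", "call main", "mov edi, eax",
                "call exit"]
print(closest_compiler(startup_stub))
```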

    Program Similarity Analysis for Malware Classification and its Pitfalls

    Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital in order to understand the threat they pose and how to protect against them. Recognizing whether one program shares behaviors with another is a task that requires semantic reasoning, meaning that it needs to consider what a program actually does. This is a famously uncomputable problem, due to Rice's theorem. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least easy to retrieve), most approaches employ semantic "abstractions", which are computable approximations of the semantics of the program. We consider this the first scenario for this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a tool for malware analysis, R.E.H.A., which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first successfully evaluated against a state-of-the-art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions to produce more fine-grained features, we can improve the accuracy of the results, as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is not often available or easily retrievable. For this reason, we introduce a second scenario in which the classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot be assessed using semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This leads us to develop novel deep learning models, a convolutional neural network and a long short-term memory network, to classify the samples into their respective families. To overcome the usual obstacle in deep learning of lacking sufficiently large and balanced datasets, we utilize obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can be used to successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis for proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework that is able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and we show how these obfuscations actually hinder the dynamic analysis process.
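    The image-based second scenario can be illustrated with a small sketch (assuming PyTorch and NumPy; the architecture, image size, and sample below are invented stand-ins, not the models developed in the thesis): the raw bytes of a compiled binary are reshaped into a grayscale image and passed through a small convolutional classifier.

```python
import numpy as np
import torch
import torch.nn as nn

def binary_to_image(raw: bytes, side: int = 64) -> torch.Tensor:
    """Zero-pad or truncate the byte stream and reshape it to a side x side image."""
    buf = np.zeros(side * side, dtype=np.uint8)
    data = np.frombuffer(raw, dtype=np.uint8)[: side * side]
    buf[: len(data)] = data
    img = buf.reshape(side, side).astype(np.float32) / 255.0
    return torch.from_numpy(img).unsqueeze(0)        # shape: (1, side, side)

class MalwareCNN(nn.Module):
    """A deliberately small CNN used only to illustrate the pipeline."""
    def __init__(self, num_families: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, num_families),
        )

    def forward(self, x):
        return self.net(x)

# Example: classify one fake sample into one of 5 families (untrained weights).
sample = binary_to_image(b"\x7fELF" + bytes(range(256)) * 10)
model = MalwareCNN(num_families=5)
logits = model(sample.unsqueeze(0))                  # add batch dimension
print(logits.argmax(dim=1))
```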

    Binaries in the Kuiper Belt

    Binaries have played a crucial role many times in the history of modern astronomy and are doing so again in the rapidly evolving exploration of the Kuiper Belt. The large fraction of transneptunian objects that are binary or multiple (48 such systems are now known) has been an unanticipated windfall. Separations and relative magnitudes measured in discovery images give important information on the statistical properties of the binary population that can be related to competing models of binary formation. Orbits, derived for 13 systems, provide a determination of the system mass. Masses can be used to derive densities and albedos when an independent size measurement is available. Angular momenta and relative sizes of the majority of binaries are consistent with formation by dynamical capture. The small satellites of the largest transneptunian objects, in contrast, are more likely formed from collisions. Correlations of the fraction of binaries with different dynamical populations or with other physical variables have the potential to constrain models of the origin and evolution of the transneptunian population as a whole. Other means of studying binaries have only begun to be exploited, including lightcurve, color, and spectral data. Because of the several channels for obtaining unique physical information, it is already clear that binaries will emerge as one of the most useful tools for unraveling the many complexities of transneptunian space. Comment: Accepted for inclusion in "The Kuiper Belt", University of Arizona Press, Space Science Series. Corrected references in Table.
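    The route from an orbit determination to a system mass mentioned above is Kepler's third law: for a mutual orbit with semimajor axis a and period P,

\[
M_{\mathrm{sys}} \;=\; M_1 + M_2 \;=\; \frac{4\pi^{2} a^{3}}{G P^{2}},
\]

    and a bulk density follows as \( \rho = M_{\mathrm{sys}}/(V_1 + V_2) \) once independent size measurements supply the component volumes.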

    A Tree Locality-Sensitive Hash for Secure Software Testing

    Bugs in software that make it through testing can cost tens of millions of dollars each year, and in some cases can even result in the loss of human life. In order to eliminate bugs, developers may use symbolic execution to search through possible program states looking for anomalous states. Most of the computational effort to search through these states is spent solving path constraints in order to determine the feasibility of entering each state. State merging can make this search more efficient by combining program states, allowing multiple execution paths to be analyzed at the same time. However, a merge with dissimilar path constraints dramatically increases the time necessary to solve the resulting path constraint. Currently, there are no distance measures for path constraints, and pairwise comparison of program states is not scalable. A hashing method is presented that clusters constraints in such a way that similar constraints are placed in the same cluster without requiring pairwise comparisons between queries. When combined with other state-of-the-art state merging techniques, the hashing method allows the symbolic executor to execute more instructions per second and find more terminal execution states than the other techniques alone, without decreasing the high path coverage achieved by merging many states together.
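    As an illustrative sketch of clustering path constraints by structure without pairwise comparison (the tree encoding, shape tokens, and MinHash parameters below are invented for the example and are not the dissertation's actual tree hash), each constraint can be reduced to a bag of subtree shape tokens and bucketed by a small MinHash key:

```python
import hashlib
from collections import defaultdict

# Constraints are expression trees: (operator, [children]); leaves are
# ("var", []) or ("const", []). Leaf identities are deliberately ignored so
# that structurally similar constraints hash alike.
def shapes(node, out=None):
    """Collect a shape token for every subtree, ignoring leaf values."""
    if out is None:
        out = set()
    op, children = node
    out.add(op + "/" + str(len(children)))
    for child in children:
        out.add(op + ">" + child[0])          # parent->child operator edge
        shapes(child, out)
    return out

def bucket_key(constraint, num_hashes=4):
    """A tiny MinHash over shape tokens: similar trees tend to share keys."""
    toks = shapes(constraint)
    key = []
    for salt in range(num_hashes):
        key.append(min(
            hashlib.blake2b(f"{salt}:{t}".encode(), digest_size=4).hexdigest()
            for t in toks))
    return tuple(key)

# Cluster queries before deciding which program states are cheap to merge.
c1 = ("and", [("<", [("var", []), ("const", [])]),
              ("=", [("var", []), ("const", [])])])
c2 = ("and", [("<", [("var", []), ("const", [])]),
              ("=", [("const", []), ("var", [])])])
clusters = defaultdict(list)
for name, c in [("state_1", c1), ("state_2", c2)]:
    clusters[bucket_key(c)].append(name)
print(list(clusters.values()))   # the two structurally similar states cluster
```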

    eROSITA Science Book: Mapping the Structure of the Energetic Universe

    eROSITA is the primary instrument on the Russian SRG mission. In the first four years of scientific operation after its launch, foreseen for 2014, it will perform a deep survey of the entire X-ray sky. In the soft X-ray band (0.5-2 keV), this will be about 20 times more sensitive than the ROSAT all-sky survey, while in the hard band (2-10 keV) it will provide the first ever true imaging survey of the sky at those energies. Such a sensitive all-sky survey will revolutionize our view of the high-energy sky and calls for major efforts in synergistic, multi-wavelength wide-area surveys in order to fully exploit the scientific potential of the X-ray data. The design-driving science of eROSITA is the detection of very large samples (~10^5 objects) of galaxy clusters out to redshifts z>1, in order to study the large-scale structure in the Universe and to test and characterize cosmological models, including Dark Energy. eROSITA is also expected to yield a sample of around 3 million Active Galactic Nuclei, including both obscured and unobscured objects, providing a unique view of the evolution of supermassive black holes within the emerging cosmic structure. The survey will also provide new insights into a wide range of astrophysical phenomena, including accreting binaries, active stars, and diffuse emission within the Galaxy, as well as solar system bodies that emit X-rays via the charge exchange process. Finally, such a deep imaging survey at high spectral resolution, with its scanning strategy sensitive to a range of variability timescales from tens of seconds to years, will undoubtedly open up a vast discovery space for the study of rare, unpredicted, or unpredictable high-energy astrophysical phenomena. In this living document we present a comprehensive description of the main scientific goals of the mission, with strong emphasis on the early survey phases. Comment: 84 pages, 52 figures. Published online as an MPE document. Edited by S. Allen, G. Hasinger, and K. Nandra. A few minor corrections (typos) and updated references.