Search CORE

1,012 research outputs found

THE SCALABLE AND ACCOUNTABLE BINARY CODE SEARCH AND ITS APPLICATIONS

Author: Feng Qian
Publication venue: SURFACE at Syracuse University
Publication date: 30/06/2017
Field of study

The past decade has been witnessing an explosion of various applications and devices. This big-data era challenges the existing security technologies: new analysis techniques should be scalable to handle “big data” scale codebase; They should be become smart and proactive by using the data to understand what the vulnerable points are and where they locate; effective protection will be provided for dissemination and analysis of the data involving sensitive information on an unprecedented scale. In this dissertation, I argue that the code search techniques can boost existing security analysis techniques (vulnerability identification and memory analysis) in terms of scalability and accuracy. In order to demonstrate its benefits, I address two issues of code search by using the code analysis: scalability and accountability. I further demonstrate the benefit of code search by applying it for the scalable vulnerability identification [57] and the cross-version memory analysis problems [55, 56]. Firstly, I address the scalability problem of code search by learning “higher-level” semantic features from code [57]. Instead of conducting fine-grained testing on a single device or program, it becomes much more crucial to achieve the quick vulnerability scanning in devices or programs at a “big data” scale. However, discovering vulnerabilities in “big code” is like finding a needle in the haystack, even when dealing with known vulnerabilities. This new challenge demands a scalable code search approach. To this end, I leverage successful techniques from the image search in computer vision community and propose a novel code encoding method for scalable vulnerability search in binary code. The evaluation results show that this approach can achieve comparable or even better accuracy and efficiency than the baseline techniques. Secondly, I tackle the accountability issues left in the vulnerability searching problem by designing vulnerability-oriented raw features [58]. The similar code does not always represent the similar vulnerability, so it requires that the feature engineering for the code search should focus on semantic level features rather than syntactic ones. I propose to extract conditional formulas as higher-level semantic features from the raw binary code to conduct the code search. A conditional formula explicitly captures two cardinal factors of a vulnerability: 1) erroneous data dependencies and 2) missing or invalid condition checks. As a result, the binary code search on conditional formulas produces significantly higher accuracy and provides meaningful evidence for human analysts to further examine the search results. The evaluation results show that this approach can further improve the search accuracy of existing bug search techniques with very reasonable performance overhead. Finally, I demonstrate the potential of the code search technique in the memory analysis field, and apply it to address their across-version issue in the memory forensic problem [55, 56]. The memory analysis techniques for COTS software usually rely on the so-called “data structure profiles” for their binaries. Construction of such profiles requires the expert knowledge about the internal working of a specified software version. However, it is still a cumbersome manual effort most of time. I propose to leverage the code search technique to enable a notion named “cross-version memory analysis”, which can update a profile for new versions of a software by transferring the knowledge from the model that has already been trained on its old version. The evaluation results show that the code search based approach advances the existing memory analysis methods by reducing the manual efforts while maintaining the reasonable accuracy. With the help of collaborators, I further developed two plugins to the Volatility memory forensic framework [2], and show that each of the two plugins can construct a localized profile to perform specified memory forensic tasks on the same memory dump, without the need of manual effort in creating the corresponding profile

Syracuse University Research Facility and Collaborative Environment

A fast and scalable binary similarity method for open source libraries

Author: Sipola J. (Juho)
Publication venue: University of Oulu
Publication date: 16/02/2023
Field of study

Abstract. Usage of third party open source software has become more and more popular in the past years, due to the need for faster development cycles and the availability of good quality libraries. Those libraries are integrated as dependencies and often in the form of binary artifacts. This is especially common in embedded software applications. Dependencies, however, can proliferate and also add new attack surfaces to an application due to vulnerabilities in the library code. Hence, the need for binary similarity analysis methods to detect libraries compiled into applications. Binary similarity detection methods are related to text similarity methods and build upon the research in that area. In this research we focus on fuzzy matching methods, that have been used widely and successfully in text similarity analysis. In particular, we propose using locality sensitive hashing schemes in combination with normalised binary code features. The normalization allows us to apply the similarity comparison across binaries produced by different compilers using different optimization flags and being build for various machine architectures. To improve the matching precision, we use weighted code features. Machine learning is used to optimize the feature weights to create clusters of semantically similar code blocks extracted from different binaries. The machine learning is performed in an offline process to increase scalability and performance of the matching system. Using above methods we build a database of binary similarity code signatures for open source libraries. The database is utilized to match by similarity any code blocks from an application to known libraries in the database. One of the goals of our system is to facilitate a fast and scalable similarity matching process. This allows integrating the system into continuous software development, testing and integration pipelines. The evaluation shows that our results are comparable to other systems proposed in related research in terms of precision while maintaining the performance required in continuous integration systems.Nopea ja skaalautuva käännettyjen ohjelmistojen samankaltaisuuden tunnistusmenetelmä avoimen lähdekoodin kirjastoille. Tiivistelmä. Kolmansien osapuolten kehittämien ohjelmistojen käyttö on yleistynyt valtavasti viime vuosien aikana nopeutuvan ohjelmistokehityksen ja laadukkaiden ohjelmistokirjastojen tarjonnan kasvun myötä. Nämä kirjastot ovat yleensä lisätty kehitettävään ohjelmistoon riippuvuuksina ja usein jopa käännettyinä binääreinä. Tämä on yleistä varsinkin sulatetuissa ohjelmistoissa. Riippuvuudet saattavat kuitenkin luoda uusia hyökkäysvektoreita kirjastoista löytyvien haavoittuvuuksien johdosta. Nämä kolmansien osapuolten kirjastoista löytyvät haavoittuvuudet synnyttävät tarpeen tunnistaa käännetyistä binääriohjelmistoista löytyvät avoimen lähdekoodin ohjelmistokirjastot. Binäärien samankaltaisuuden tunnistusmenetelmät usein pohjautuvat tekstin samankaltaisuuden tunnistusmenetelmiin ja hyödyntävät tämän tieteellisiä saavutuksia. Tässä tutkimuksessa keskitytään sumeisiin tunnistusmenetelmiin, joita on käytetty laajasti tekstin samankaltaisuuden tunnistamisessa. Tutkimuksessa hyödynnetään sijainnille sensitiivisiä tiivistemenetelmiä ja normalisoituja binäärien ominaisuuksia. Ominaisuuksien normalisoinnin avulla binäärien samankaltaisuutta voidaan vertailla ohjelmiston kääntämisessä käytetystä kääntäjästä, optimisaatiotasoista ja prosessoriarkkitehtuurista huolimatta. Menetelmän tarkkuutta parannetaan painotettujen binääriominaisuuksien avulla. Koneoppimista hyödyntämällä binääriomisaisuuksien painotus optimoidaan siten, että samankaltaisista binääreistä puretut ohjelmistoblokit luovat samankaltaisien ohjelmistojen joukkoja. Koneoppiminen suoritetaan erillisessä prosessissa, mikä parantaa järjestelmän suorituskykyä. Näiden menetelmien avulla luodaan tietokanta avoimen lähdekoodin kirjastojen tunnisteista. Tietokannan avulla minkä tahansa ohjelmiston samankaltaiset binääriblokit voidaan yhdistää tunnettuihin avoimen lähdekoodin kirjastoihin. Menetelmän tavoitteena on tarjota nopea ja skaalautuva samankaltaisuuden tunnistus. Näiden ominaisuuksien johdosta järjestelmä voidaan liittää osaksi ohjelmistokehitys-, integraatioprosesseja ja ohjelmistotestausta. Vertailu muihin kirjallisuudessa esiteltyihin menetelmiin osoittaa, että esitellyn menetlmän tulokset on vertailtavissa muihin kirjallisuudessa esiteltyihin menetelmiin tarkkuuden osalta. Menetelmä myös ylläpitää suorituskyvyn, jota vaaditaan jatkuvan integraation järjestelmissä

University of Oulu Repository - Jultika

Network Coding-Based Next-Generation IoT for Industry 4.0

Author: Bilbao Josu
Cid-Fuentes Raul G.
Crespo Pedro M.
Peralta Goiuri
Publication venue: 'IntechOpen'
Publication date: 05/11/2018
Field of study

Industry 4.0 has become the main source of applications of the Internet of Things (IoT), which is generating new business opportunities. The use of cloud computing and artificial intelligence is also showing remarkable improvements in industrial operation, saving millions of dollars to manufacturers. The need for time-critical decision-making is evidencing a trade-off between latency and computation, urging Industrial IoT (IIoT) deployments to integrate fog nodes to perform early analytics. In this chapter, we review next-generation IIoT architectures, which aim to meet the requirements of industrial applications, such as low-latency and highly reliable communications. These architectures can be divided into IoT node, fog, and multicloud layers. We describe these three layers and compare their characteristics, providing also different use-cases of IIoT architectures. We introduce network coding (NC) as a solution to meet some of the requirements of next-generation communications. We review a variety of its approaches as well as different scenarios that improve their performance and reliability thanks to this technique. Then, we describe the communication process across the different levels of the architecture based on NC-based state-of-the-art works. Finally, we summarize the benefits and open challenges of combining IIoT architectures together with NC techniques

IntechOpen

Crossref

Evaluation and performance of reading from big data formats

Author: Xavier Lucca Sergi Berquó
Publication venue
Publication date: 01/01/2021
Field of study

The emergence of new application profiles has caused a steep surge in the volume of data generated nowadays. Data heterogeneity is a modern trend, as unstructured types of data, such as videos and images, and semi-structured types, such as JSON and XML files, are becoming increasingly widespread. Consequently, new challenges related to analyzing and extracting important insights from huge bodies of information arise. The field of big data analytics has been developed to address these issues. Performance plays a key role in analytical scenarios, as it empowers applications to generate value in a more efficient and less time-consuming way. In this context, files are used to persist large quantities of information, which can be accessed later by analytic queries. Text files have the advantage of providing an easier interaction with the end user, whereas binary files propose structures that enhance data access. Among them, Apache ORC and Apache Parquet are formats that present characteristics such as column-oriented organization and data compression, which are used to achieve a better performance in queries. The objective of this project is to assess the usage of such files by SAP Vora, a distributed database management system, in order to draw out processing techniques used in big data analytics scenarios, and apply them to improve the performance of queries executed upon CSV files in Vora. Two techniques were employed to achieve such goal: file pruning, which allows Vora’s relational engine to ignore files possessing irrelevant information for the query, and block pruning, which disregards individual file blocks that do not possess data targeted by the query when processing files. Results demonstrate that these modifications enhance the efficiency of analytical workloads executed upon CSV files in Vora, thus narrowing the performance gap of queries executed upon this format and those targeting files tailored for big data scenarios, such as Apache Parquet and Apache ORC. The project was developed during an internship at SAP, in Walldorf, Germany.A emergência de novos perfis de aplicação ocasionou um aumento abrupto no volume de dados gerado na atualidade. A heterogeneidade de tipos de dados é uma nova tendência: encontram-se tipos não-estruturados, como vídeos e imagens, e semi-estruturados, tais quais arquivos JSON e XML. Consequentemente, novos desafios relacionados à extração de valores importantes de corpos de dados surgiram. Para este propósito, criou-se o ramo de big data analytics. Nele, a performance é um fator primordial pois garante análises rápidas e uma geração de valores eficiente. Neste contexto, arquivos são utilizados para persistir grandes quantidades de informações, que podem ser utilizadas posteriormente em consultas analíticas. Arquivos de texto têm a vantagem de proporcionar uma fácil interação com o usuário final, ao passo que arquivos binários propõem estruturas que melhoram o acesso aos dados. Dentre estes, o Apache ORC e o Apache Parquet são formatos que apresentam uma organização orientada a colunas e compressão de dados, o que permite aumentar o desempenho de acesso. O objetivo deste projeto é avaliar o uso desses arquivos na plataforma SAP Vora, um sistema de gestão de base de dados distribuído, com o intuito de otimizar a performance de consultas sobre arquivos CSV, de tipo texto, em cenários de big data analytics. Duas técnicas foram empregadas para este fim: file pruning, a qual permite que arquivos possuindo informações desnecessárias para consulta sejam ignorados, e block pruning, que permite eliminar blocos individuais do arquivo que não fornecerão dados relevantes para consultas. Os resultados indicam que essas modificações melhoram o desempenho de cargas de trabalho analíticas sobre o formato CSV na plataforma Vora, diminuindo a discrepância de performance entre consultas sobre esses arquivos e aquelas feitas sobre outros formatos especializados para cenários de big data, como o Apache Parquet e o Apache ORC. Este projeto foi desenvolvido durante um estágio realizado na SAP em Walldorf, na Alemanha

Lume 5.8

Effective Anomaly Detection Using Deep Learning in IoT Systems

Author: Aversano Lerina
Bernardi Mario Luca.
Cimitile Marta
Pecori Riccardo
Veltri Luca
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2021
Field of study

Anomaly detection in network traffic is a hot and ongoing research theme especially when concerning IoT devices, which are quickly spreading throughout various situations of people's life and, at the same time, prone to be attacked through different weak points. In this paper, we tackle the emerging anomaly detection problem in IoT, by integrating five different datasets of abnormal IoT traffic and evaluating them with a deep learning approach capable of identifying both normal and malicious IoT traffic as well as different types of anomalies. The large integrated dataset is aimed at providing a realistic and still missing benchmark for IoT normal and abnormal traffic, with data coming from different IoT scenarios. Moreover, the deep learning approach has been enriched through a proper hyperparameter optimization phase, a feature reduction phase by using an autoencoder neural network, and a study of the robustness of the best considered deep neural networks in situations affected by Gaussian noise over some of the considered features. The obtained results demonstrate the effectiveness of the created IoT dataset for anomaly detection using deep learning techniques, also in a noisy scenario

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Open Access Repository

Archivio istituzionale della ricerca - eCampus Università Telematica

Symmetry-Adapted Machine Learning for Information Security

Author
Publication venue: 'MDPI AG'
Publication date: 01/05/2021
Field of study

Symmetry-adapted machine learning has shown encouraging ability to mitigate the security risks in information and communication technology (ICT) systems. It is a subset of artificial intelligence (AI) that relies on the principles of processing future events by learning past events or historical data. The autonomous nature of symmetry-adapted machine learning supports effective data processing and analysis for security detection in ICT systems without the interference of human authorities. Many industries are developing machine-learning-adapted solutions to support security for smart hardware, distributed computing, and the cloud. In our Special Issue book, we focus on the deployment of symmetry-adapted machine learning for information security in various application areas. This security approach can support effective methods to handle the dynamic nature of security attacks by extraction and analysis of data to identify hidden patterns of data. The main topics of this Issue include malware classification, an intrusion detection system, image watermarking, color image watermarking, battlefield target aggregation behavior recognition model, IP camera, Internet of Things (IoT) security, service function chain, indoor positioning system, and crypto-analysis

Directory of Open Access Books (DOAB)

Recommended from our members

Fast embedding for image classification & retrieval and its application to the hostel industry

Author: Ammatmanee Chanattra
Publication venue: Brunel University London
Publication date: 01/01/2022
Field of study

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonContent-based image classification and retrieval are the automatic processes of taking an unseen image input and extracting its features representing the input image. Then, for the classification task, this mathematically measured input is categorized according to established criteria in the server and consequently shows the output as a result. On the other hand, for the retrieval task, the extracted features of an unseen query image are sent to the server to search for the most visually similar images to a given image and retrieve these images as a result. Despite image features could be represented by classical features, artificial intelligence-based features, Convolutional Neural Networks (CNN) to be precise, have become powerful tools in the field. Nonetheless, the high dimensional CNN features have been a challenge in particular for applications on mobile or Internet of Things devices. Therefore, in this thesis, several fast embeddings are explored and proposed to overcome the constraints of low memory, bandwidth, and power. Furthermore, the first hostel image database is created with three datasets, hostel image dataset containing 13,908 interior and exterior images of hostels across the world, and Hostels-900 dataset and Hostels-2K dataset containing 972 images and 2,380 images, respectively, of 20 London hostel buildings. The results demonstrate that the proposed fast embeddings such as the application of GHM-Rand operator, GHM-Fix operator, and binary feature vectors are able to outperform or give competitive results to those state-of-the-art methods with a lot less computational resource. Additionally, the findings from a ten-year literature review of CBIR study in the tourism industry could picturize the relevant research activities in the past decade which are not only beneficial to the hostel industry or tourism sector but also to the computer science and engineering research communities for the potential real-life applications of the existing and developing technologies in the field

Brunel University Research Archive