
    The Nature of Ephemeral Secrets in Reverse Engineering Tasks

    Reverse engineering is typically carried out on static binary objects, such as files or compiled programs, yet the goal is often to extract a secret that is ephemeral and exists only while the system is running. Automation and dynamic analysis enable reverse engineers to extract ephemeral secrets from dynamic systems, obviating the need to analyze static artifacts such as executable binaries. I support this thesis through four automated reverse engineering efforts: (1) named entity extraction to track Chinese Internet censorship based on keywords; (2) dynamic information flow tracking to locate secret keys in the memory of a live program; (3) man-in-the-middle interception to emulate server behavior for extracting cryptographic secrets; and (4) large-scale measurement and data mining of TCP/IP handshake behaviors to reveal machines on the Internet vulnerable to TCP/IP hijacking and other attacks. In each of these cases, automation enables the extraction of ephemeral secrets, often in situations where there is no accessible static binary object containing the secret. Furthermore, each project depended on building an automated system that interacted with the dynamic system in order to extract the secret(s). This general approach provides a new perspective, increases the types of systems that can be reverse engineered, and offers a promising direction for the future of reverse engineering.
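
    One common heuristic behind locating secret keys in the memory of a live program is scanning a memory snapshot for high-entropy regions, since random key material stands out from surrounding code and data. The sketch below illustrates that idea only; it is not the dissertation's information-flow-tracking system, and the dump file name, window size, and threshold are assumptions.

        # Entropy-scan sketch for spotting candidate key material in a memory dump.
        # Hypothetical illustration; file name, window, and threshold are assumptions.
        import math
        from collections import Counter

        def shannon_entropy(data: bytes) -> float:
            """Bits per byte; a 32-byte window of all-distinct bytes maxes out at 5.0."""
            counts = Counter(data)
            n = len(data)
            return -sum((c / n) * math.log2(c / n) for c in counts.values())

        def key_candidates(dump: bytes, window: int = 32, threshold: float = 4.8):
            """Yield offsets of windows whose entropy suggests random key bytes."""
            for off in range(0, len(dump) - window + 1, window):
                if shannon_entropy(dump[off:off + window]) >= threshold:
                    yield off

        with open("process.dump", "rb") as f:   # hypothetical memory snapshot
            dump = f.read()
        for off in key_candidates(dump):
            print(f"possible key material at offset {off:#x}")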

    Mapping (Dis-)Information Flow about the MH17 Plane Crash

    Digital media enables not only the fast sharing of information, but also of disinformation. One prominent case of an event leading to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets, or used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.
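
    A hashtag-based baseline of the kind the paper compares against can be written in a few lines: a tweet is labeled by the partisan hashtags it contains. The sketch below is a hedged illustration; the hashtag lists and example tweet are hypothetical placeholders, not the paper's data or lexicon.

        # Hashtag-rule baseline for stance labeling; the tags here are hypothetical.
        PRO_RU_TAGS = {"#mh17truth"}       # assumed pro-Russian marker
        PRO_UA_TAGS = {"#stoprussia"}      # assumed pro-Ukrainian marker

        def hashtag_baseline(tweet: str) -> str:
            """Label a tweet by the partisan hashtags it contains."""
            tokens = {t.lower().strip(".,!?") for t in tweet.split()}
            if tokens & PRO_RU_TAGS:
                return "pro-russian"
            if tokens & PRO_UA_TAGS:
                return "pro-ukrainian"
            return "neutral/undecided"

        print(hashtag_baseline("New evidence released #MH17Truth"))  # pro-russian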

    Delegated Dictatorship: Examining the State and Market Forces behind Information Control in China

    A large body of literature devoted to analyzing information control in China concludes that we find imperfect censorship because the state has adopted a minimalist strategy for information control. In other words, the state is deliberately selective about the content that it censors. While some claim that the government limits its attention to the most categorically harmful content—content that may lead to mobilization—others suggest that the state limits the scope of censorship to allow space for criticism, which enables the state to gather information about popular grievances or badly performing local cadres. In contrast, I argue that imperfect censorship in China results from a precise and covert implementation of the government's maximalist strategy for information control. The state is intolerant of government criticisms, discussions of collective action, non-official coverage of crime, and a host of other types of information that may challenge state authority and legitimacy. This strategy produces imperfect censorship because the state prefers to implement it covertly, and thus delegates to private companies, targets repression, and engages in astroturfing to reduce the visibility and disruptiveness of information control tactics. This both insulates the state from popular backlash and increases the effectiveness of its informational interventions. I test the hypotheses generated from this theory by analyzing a custom dataset of censorship logs from a popular social media company, Sina Weibo. These logs measure the government's intent about what content should and should not be censored. A systematic analysis of content targeted for censorship demonstrates the broadness of the government's censorship agenda. These data also show that delegation to private companies softens and refines the state's informational interventions so that the government's broad agenda is maximally implemented while minimizing popular backlash that would otherwise threaten the effectiveness of its informational interventions.
    PhD, Political Science, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147514/1/blakeapm_1.pd

    Developing of Ultrasound Experimental Methods using Machine Learning Algorithms for Application of Temperature Monitoring of Nano-Bio-Composites Extrusion

    In industry, fiber degradation during the processing of biocomposites in the extruder is a problem that requires a reliable solution, to avoid the time and money wasted on producing damaged material. This thesis focuses on a practical solution that can monitor the temperature change that causes fiber degradation and material damage, and stop the process when damage occurs. Ultrasound can be used to detect the temperature change inside the material during extrusion. A monitoring approach for the extrusion process was developed using an ultrasound system and machine learning algorithms. A measurement cell was built to collect a dataset of ultrasound signals at different temperatures for analysis. Machine learning algorithms were applied to classify the dataset by temperature. The dataset was classified with 97% accuracy into two categories, representing ultrasound signals above and below the damage temperature (190 °C). This approach could be used in industry to raise an alarm or send a temperature-control signal when material damage is detected. Biocomposites are a central focus of materials research and development in the automotive industry. A melt-mixing process was used to combine the biocomposite material with multi-walled carbon nanotubes (MWCNTs) to enhance its mechanical and thermal properties. The resulting nano-bio-composite was evaluated with several types of thermal and mechanical tests to compare its performance to the base biocomposite. The developed material showed improvements in mechanical and thermal properties, indicating high potential for future applications.
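
    The classification step described above can be prototyped compactly: extract a few signal features and train a classifier to separate signals recorded above and below 190 °C. The sketch below is a hedged stand-in, not the thesis's measurement-cell pipeline; the synthetic signals, feature set, and model choice are all assumptions.

        # Toy above/below-190-degC classifier on synthetic ultrasound-like signals.
        # The data generation is an assumption standing in for the measurement cell.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)

        def features(signal: np.ndarray) -> np.ndarray:
            """Energy, peak amplitude, and dominant frequency bin of one signal."""
            spectrum = np.abs(np.fft.rfft(signal))
            return np.array([np.sum(signal ** 2), np.max(np.abs(signal)),
                             float(np.argmax(spectrum))])

        X, y = [], []
        t = np.linspace(0, 1, 512)
        for temp in rng.uniform(150, 230, size=200):
            # Assumed physics: amplitude attenuates and frequency shifts with temperature.
            signal = np.exp(-temp / 200) * np.sin(2 * np.pi * (5 + temp / 25) * t)
            signal += 0.05 * rng.standard_normal(t.size)
            X.append(features(signal))
            y.append(int(temp > 190))   # 1 = above the damage temperature

        X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), random_state=0)
        clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
        print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")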

    Fraud detection for online banking for scalable and distributed data

    Online fraud causes billions of dollars in losses for banks, so online banking fraud detection is an important field of study. However, research in fraud detection faces many challenges. One constraint is the unavailability of bank datasets for research, or the absence of the required attribute characteristics in the data that are available. Numeric data usually yields better performance for machine learning algorithms, yet most transaction data also contain categorical or nominal features. Moreover, some platforms, such as Apache Spark, only accept numeric data, so techniques such as one-hot encoding (OHE) are needed to transform categorical features into numerical ones. OHE, however, has its own challenges, including the sparseness of the transformed data and the fact that the distinct values of an attribute are not always known in advance. Efficient feature engineering can improve an algorithm's performance but usually requires detailed domain knowledge to identify the correct features. Techniques like Ripple Down Rules (RDR) are suitable for fraud detection because of their low maintenance cost and incremental learning, but achieving high classification accuracy on mixed datasets, especially at scale, is challenging. Evaluating RDR on distributed platforms is also difficult, as it is not available on them. The thesis proposes the following solutions to these challenges:
    • Highly Correlated Rule Based Uniformly Distribution (HCRUD), a technique to generate highly correlated, rule-based, uniformly distributed synthetic data.
    • One-hot Encoded Extended Compact (OHE-EC), a technique to transform categorical features into numeric features by compacting sparse data even when not all distinct values are known.
    • Feature Engineering and Compact Unified Expressions (FECUE), a technique to improve model efficiency through feature engineering where the domain of the data is not known in advance.
    • A Unified Expression RDR fraud detection technique (UE-RDR) for big data, proposed and evaluated on the Spark platform.
    Empirical tests were executed on a multi-node Hadoop cluster using well-known classifiers on bank data, synthetic bank datasets, and publicly available datasets from the UCI repository. These evaluations demonstrated substantial improvements in classification accuracy, ruleset compactness, and execution speed.
    Doctor of Philosophy
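
    The unknown-category problem that motivates OHE-EC can be seen with a standard encoder: scikit-learn's one-hot encoder must be told what to do with values it never saw at fit time. The sketch below shows that baseline behaviour; it is not the OHE-EC technique itself, and the card-type values are invented.

        # Baseline one-hot encoding with an unseen category; values are hypothetical.
        import numpy as np
        from sklearn.preprocessing import OneHotEncoder

        train = np.array([["visa"], ["mastercard"], ["visa"]])
        test = np.array([["amex"]])   # category never seen during fitting

        enc = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
        enc.fit(train)
        print(enc.get_feature_names_out())  # ['x0_mastercard' 'x0_visa']
        print(enc.transform(test))          # all-zero row: the unseen value carries no signal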

    A Comprehensive Survey on Rare Event Prediction

    Rare event prediction involves identifying and forecasting events with a low probability using machine learning and data analysis. Because of imbalanced data distributions, where the frequency of common events vastly outweighs that of rare events, it requires specialized methods at each step of the machine learning pipeline, from data processing to algorithms to evaluation protocols. Predicting the occurrence of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistics and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature, highlight the challenges of predicting rare events, and suggest potential research directions that can help guide practitioners and researchers.
    Comment: 44 pages
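
    Two of the pipeline adjustments such surveys cover, reweighting the rare class during training and evaluating with precision-recall rather than accuracy, fit in a short sketch. The 1%-positive synthetic dataset below is an assumption for illustration, not one of the paper's 73 datasets.

        # Class weighting + precision-recall evaluation on a synthetic rare-event task.
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import average_precision_score
        from sklearn.model_selection import train_test_split

        # ~1% positives: plain accuracy is misleading (always predicting "no event" scores 99%).
        X, y = make_classification(n_samples=20000, weights=[0.99], flip_y=0,
                                   random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # "balanced" reweights each class inversely to its frequency.
        clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
        scores = clf.predict_proba(X_te)[:, 1]
        print(f"average precision: {average_precision_score(y_te, scores):.3f}")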

    Explainable Predictive Maintenance

    Explainable Artificial Intelligence (XAI) fills the role of a critical interface fostering interactions between sophisticated intelligent systems and diverse individuals, including data scientists, domain experts, and end-users. It aids in deciphering the intricate internal mechanisms of "black box" Machine Learning (ML) models, rendering the reasons behind their decisions more understandable. However, current research in XAI primarily focuses on two aspects: ways to facilitate user trust, and ways to debug and refine the ML model. The majority of it falls short of recognising the diverse types of explanations needed in broader contexts, as different users and varied application areas necessitate solutions tailored to their specific needs. One such domain is Predictive Maintenance (PdM), a rapidly expanding area of research under the Industry 4.0 & 5.0 umbrella. This position paper highlights the gap between existing XAI methodologies and the specific requirements for explanations within industrial applications, particularly the Predictive Maintenance field. Despite explainability's crucial role, this subject remains a relatively under-explored area, making this paper a pioneering attempt to bring relevant challenges to the research community's attention. We provide an overview of predictive maintenance tasks and accentuate the need and varying purposes for corresponding explanations. We then list and describe the XAI techniques commonly employed in the literature, discussing their suitability for PdM tasks. Finally, to make the ideas and claims more concrete, we demonstrate XAI applied in four specific industrial use cases: commercial vehicles, metro trains, steel plants, and wind farms, spotlighting areas requiring further research.
    Comment: 51 pages, 9 figures
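
    One model-agnostic explanation technique of the kind such surveys cover is permutation feature importance: shuffle one feature at a time and measure how much held-out performance drops. The sketch below applies it to a toy failure classifier; the sensor feature names and synthetic data are assumptions, not any of the paper's four industrial use cases.

        # Permutation importance on a toy PdM-style failure classifier.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.inspection import permutation_importance
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=4, n_informative=2,
                                   random_state=0)
        names = ["vibration_rms", "bearing_temp", "oil_pressure", "ambient_temp"]  # hypothetical

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

        # Shuffling an important feature should noticeably degrade test accuracy.
        result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
        for name, mean, std in zip(names, result.importances_mean, result.importances_std):
            print(f"{name:>15}: {mean:.3f} +/- {std:.3f}")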

    Automatic Essay Scoring Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

    Deep-learning-based Automatic Essay Scoring (AES) systems are being actively used in various high-stakes applications in education and testing. However, little research has been devoted to understanding and interpreting the black-box nature of deep-learning-based scoring algorithms. While previous studies indicate that scoring models can be easily fooled, in this paper we explore the reason behind their surprising adversarial brittleness. We utilize recent advances in interpretability to find the extent to which features such as coherence, content, vocabulary, and relevance are important for automated scoring mechanisms. We use this to investigate the oversensitivity (i.e., large change in output score with a small change in input essay content) and overstability (i.e., little change in output scores with large changes in input essay content) of AES. Our results indicate that autoscoring models, despite being trained as “end-to-end” models with rich contextual embeddings such as BERT, behave like bag-of-words models. A few words determine the essay score without the requirement of any context, making the model largely overstable. This is in stark contrast to recent probing studies on pre-trained representation learning models, which show that rich linguistic features such as parts of speech and morphology are encoded by them. Further, we also find that the models have learnt dataset biases, making them oversensitive. The presence of a few words with high co-occurrence with a certain score class makes the model associate the essay sample with that score. This causes score changes in ∼95% of samples with the addition of only a few words. To deal with these issues, we propose detection-based protection models that can detect oversensitivity and samples causing overstability with high accuracy. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.
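
    The oversensitivity probe the paper describes, a large score shift from a small textual change, can be expressed as a simple test harness: append a handful of trigger words and compare scores. In the sketch below the scoring model is a stand-in function and the trigger words are invented; a real test would call a trained BERT-based autoscorer.

        # Oversensitivity probe with a stand-in scorer; the triggers are hypothetical.
        def score_essay(text: str) -> float:
            """Placeholder for a trained AES model; returns a score in [0, 10]."""
            words = text.split()
            richness = len(set(w.lower() for w in words)) / max(len(words), 1)
            return min(10.0, 0.02 * len(words) + 5.0 * richness)

        TRIGGERS = ["consequently", "furthermore", "paradigm"]  # assumed high-scoring words

        def oversensitivity_delta(essay: str) -> float:
            """Score change caused by appending a few words to the essay."""
            return score_essay(essay + " " + " ".join(TRIGGERS)) - score_essay(essay)

        essay = "An essay under test goes here."
        print(f"score shift from {len(TRIGGERS)} added words: {oversensitivity_delta(essay):+.2f}")
        # A large shift would flag the sample for the detection-based protection model.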