
    Detecting Poisoning Attacks on Hierarchical Malware Classification Systems

    Anti-virus software based on unsupervised hierarchical clustering (HC) of malware samples has been shown to be vulnerable to poisoning attacks. In this kind of attack, a malicious player degrades anti-virus performance by submitting to the database samples specifically designed to collapse the classification hierarchy utilized by the anti-virus (and constructed through HC) or otherwise deform it in a way that would render it useless. Though each poisoning attack needs to be tailored to the particular HC scheme deployed, existing research seems to indicate that no particular HC method by itself is immune. We present results on applying a new notion of entropy for combinatorial dendrograms to the problem of controlling the influx of samples into the database and deflecting poisoning attacks. In a nutshell, effective and tractable measures of change in hierarchy complexity are derived from this notion of entropy, enabling on-the-fly flagging and rejection of potentially damaging samples. The information-theoretic underpinnings of these measures ensure their indifference to which particular poisoning algorithm is being used by the attacker, rendering them particularly attractive in this setting.

    Hierarchical cluster analysis in clinical research with heterogeneous study population: highlighting its visualization with R

    Big data clinical research typically involves thousands of patients and numerous variables. Conventionally, these variables are handled by multivariable regression modeling. In this article, hierarchical cluster analysis (HCA) is introduced. This method is used to explore similarity between observations and/or clusters. The result can be visualized using heat maps and dendrograms. Sometimes it is useful to add scatter plots and smooth lines to the panels of the heat map. The built-in R heatmap function does not provide this capability. A series of scatter plots can be created using the lattice package, and the background color of each panel can then be mapped to the regression coefficient by using custom-made panel functions. This is a distinctive feature of the lattice package. Dendrograms and color keys can be added as legend elements of the lattice system. The latticeExtra package provides some useful functions for this work.

    Analysis of acid-stressed Bacillus cereus reveals a major oxidative response and inactivation-associated radical formation

    Acid stress resistance of the food-borne human pathogen Bacillus cereus may contribute to its survival in acidic environments, such as those encountered in soil, food and the human gastrointestinal tract. The acid stress responses of B. cereus strains ATCC 14579 and ATCC 10987 were analysed in aerobically grown cultures acidified to pH values ranging from pH 5.4 to pH 4.4 with HCl. Comparative phenotype and transcriptome analyses revealed three acid stress-induced responses in this pH range: growth rate reduction, growth arrest and loss of viability. These physiological responses were shown to be associated with metabolic shifts and the induction of general stress response mechanisms with a major oxidative component, including upregulation of catalases and superoxide dismutases. Flow cytometry analysis in combination with the hydroxyl (OH·) and peroxynitrite (ONOO⁻)-specific fluorescent probe 3'-(p-hydroxyphenyl) fluorescein (HPF) showed excessive radicals to be formed in both B. cereus strains in bactericidal conditions only. Our study shows that radicals can indicate acid-induced malfunctioning of cellular processes that lead to cell death.

    SUBDISTRICT CLUSTERING IN WEST JAVA PROVINCE BASED ON DISEASE INCIDENCE OF JKN PARTICIPANTS PRIMARY SERVICES

    One way to optimize health services and distribute facilities and infrastructure efficiently over a wide area is to profile and cluster the sub-districts of West Java province according to similar characteristics of disease category. The methods compared to find the best clustering are hierarchical clustering and ensemble clustering. The data used as the object of research is the BPJS Kesehatan capitation primary service sample data for the 2017-2018 period. The important variables used include: primary disease diagnosis data (ICD-10) of patients at the puskesmas, service time, type of visit, and location of service sub-district. This study uses several evaluation metrics (Silhouette coefficient, Dunn index, Davies-Bouldin index, and C-index) to determine the optimal number of clusters. In addition, descriptive analysis and visualization of the clustering results are used as considerations in selecting the optimal clustering. Based on the evaluation results, the optimal method is hierarchical clustering with complete linkage. This method produces three clusters: cluster 1 consists of 5 sub-districts that have a high/dominant mean value in almost all disease categories, cluster 2 consists of 26 sub-districts that have a medium mean value, and cluster 3 consists of 589 sub-districts that have a low mean value. Most of the members of clusters 1 and 2 are sub-districts located in the districts/cities around the national capital (DKI Jakarta) and the provincial capital (Bandung), while the members of cluster 3 are mostly sub-districts located in suburban districts/cities or far from the central government.
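
As a rough illustration of the selection procedure this abstract describes (complete-linkage hierarchical clustering, with the number of clusters chosen by an internal validity index), the following minimal sketch uses only the Python standard library. It is not the authors' code: the function names, the toy coordinates, and the use of the silhouette coefficient alone (without the Dunn, Davies-Bouldin, or C-index) are assumptions made for the example.

```python
from itertools import combinations
from math import dist


def complete_linkage(points, k):
    """Agglomerative clustering: merge until k clusters remain.

    The distance between two clusters is the largest pairwise
    distance between their members (complete linkage).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: max(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


def silhouette(points, clusters):
    """Mean silhouette coefficient (requires at least two clusters)."""
    scores = []
    for ci, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist(points[i], points[j]) for j in members if j != i)
            a /= len(members) - 1
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(dist(points[i], points[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_values):
    """Return the k (k >= 2) whose clustering has the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(points, complete_linkage(points, k)))


# Hypothetical toy data: three well-separated groups of three points each.
toy_points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]
chosen_k = best_k(toy_points, range(2, 6))
```

On real data the other indices named in the abstract would be computed alongside the silhouette, and the naive merge loop would be replaced by a library routine for anything beyond a few hundred observations.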

    A black-box adversarial attack for poisoning clustering

    Clustering algorithms play a fundamental role as tools in decision-making and sensible automation processes. Due to the widespread use of these applications, a robustness analysis of this family of algorithms against adversarial noise has become imperative. To the best of our knowledge, however, only a few works have currently addressed this problem. In an attempt to fill this gap, in this work, we propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms. We formulate the problem as a constrained minimization program, general in its structure and customizable by the attacker according to her capability constraints. We do not assume any information about the internal structure of the victim clustering algorithm, and we allow the attacker to query it as a service only. In the absence of any derivative information, we perform the optimization with a custom approach inspired by the Abstract Genetic Algorithm (AGA). In the experimental part, we demonstrate the sensitivity of different single and ensemble clustering algorithms to our crafted adversarial samples in different scenarios. Furthermore, we compare our algorithm with a state-of-the-art approach, showing that we are able to match or even outperform it. Finally, to highlight the general nature of the generated noise, we show that our attacks are transferable even against supervised algorithms such as SVMs, random forests and neural networks. © 2021 Elsevier Ltd. All rights reserved.

    AGGLOMERATIVE HIERARCHICAL CLUSTERING ANALYSIS IN PREDICTING ANTIBACTERIAL ACTIVITY OF COMPOUND BASED ON CHEMICAL STRUCTURE SIMILARITY

    Resistance to antibiotics is increasing to alarmingly high levels. As antibiotics become less effective, more infections are becoming complex and often impossible to treat. Numerous antibiotics discovered in marine organisms show that the marine environment, which accounts for over half of the world's biodiversity, is a massive source of novel antibiotics and that this resource must be explored to identify next-generation antibiotics. This research aimed to predict antibacterial activity in marine compounds using a computational approach, to reduce the cost and time of finding marine organisms and of extracting and testing the bioactivities of numerous unknown marine compounds. We used a simple unsupervised learning approach, agglomerative hierarchical clustering, to predict the biological activity of marine compounds. We combined antibiotic drug data from the DrugBank Database with chemical compound data from marine organisms reported in the literature to compile our dataset. We applied five linkage methods to our dataset and compared them using internal validation measures to select the best method. We found that Ward linkage with a squared dissimilarity matrix is the best method for the dataset, and ten of the 73 marine compounds were identified as potentially having antibacterial activity.