
    CNS-Net: Conservative Novelty Synthesizing Network for Malware Recognition in an Open-set Scenario

    We study the challenging task of recognizing malware from both known and novel unknown malware families, called malware open-set recognition (MOSR). Previous works usually assume that the malware families are known to the classifier in a closed-set scenario, i.e., the testing families are a subset of, or at most identical to, the training families. However, novel unknown malware families frequently emerge in real-world applications, which requires recognizing malware instances in an open-set scenario, i.e., one in which some unknown families are also included in the test set. This setting has been investigated rarely and not thoroughly in the cyber-security domain. One practical solution for MOSR is to jointly classify known families and detect unknown ones with a single classifier (e.g., a neural network), based on the variance of the predicted probability distribution over the known families. However, conventional well-trained classifiers tend to produce overly high recognition probabilities, especially when instance feature distributions are similar to each other (e.g., unknown vs. known malware families), which dramatically degrades recognition of novel unknown malware families. In this paper, we propose a novel model that conservatively synthesizes malware instances to mimic unknown malware families and supports more robust training of the classifier. Moreover, we build a new large-scale malware dataset, named MAL-100, to address the lack of a large open-set malware benchmark dataset. Experimental results on two widely used malware datasets and on MAL-100 demonstrate the effectiveness of our model compared with other representative methods. Comment: 16 pages, 8 figures
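
    The single-classifier baseline described above can be made concrete: the classifier accepts its top-scoring known family only when the predicted probability is confident enough, and reports "unknown" otherwise. The sketch below assumes max-softmax-probability thresholding as a simple stand-in for the abstract's variance-based criterion; the function names and threshold value are hypothetical, and this is not CNS-Net's synthesis method.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits: List[float], threshold: float) -> Optional[int]:
    """Return the index of the predicted known family, or None ("unknown")
    when the highest softmax probability is below `threshold`.

    `threshold` is a hypothetical hyperparameter, e.g. tuned on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


print(open_set_predict([4.0, 0.5, 0.2], threshold=0.9))  # confident -> 0
print(open_set_predict([1.0, 0.9, 0.8], threshold=0.9))  # flat -> None
```

    With logits such as [4.0, 0.5, 0.2] the top probability is about 0.95, so the sample is accepted as family 0 even if it actually comes from an unseen family; this overconfidence is the failure mode the abstract describes.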

    MDENet: Multi-modal Dual-embedding Networks for Malware Open-set Recognition

    Malware open-set recognition (MOSR) aims to jointly classify malware samples from known families and detect those from novel unknown families. Existing works mostly rely on a well-trained classifier that applies threshold-based detection to the predicted probability of each known family. However, our observations reveal that the feature distributions of malware samples are extremely similar to each other, even between known and unknown families. As a result, the classifier may assign overly high known-family probabilities to unknown test samples, degrading model performance. In this paper, we propose Multi-modal Dual-Embedding Networks, dubbed MDENet, which take advantage of comprehensive malware features from different modalities (i.e., malware images and malware sentences) to enhance the diversity of the malware feature space, making it more representative and discriminative for downstream recognition. In addition, to further strengthen open-set recognition, we dually embed the fused multi-modal representation into one primary space and an associated sub-space, i.e., discriminative and exclusive spaces, with contrastive sampling and rho-bounded enclosing sphere regularizations, which serve classification and detection, respectively. Moreover, we enrich our previously proposed large-scale malware dataset MAL-100 with multi-modal characteristics and contribute an improved version dubbed MAL-100+. Experimental results on the widely used malware dataset Malimg and the proposed MAL-100+ demonstrate the effectiveness of our method. Comment: 14 pages, 7 figures
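
    One way to read the rho-bounded enclosing sphere regularization is as a hypersphere of radius rho around the known-family embeddings in the exclusive sub-space: training penalizes known samples that fall outside it, and test samples outside it are flagged as unknown. The sketch below illustrates only that reading. It is an assumption in the spirit of Deep SVDD-style hypersphere objectives, not MDENet's published formulation, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Mean of the known-family embeddings, used as the sphere centre."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sphere_penalty(points: List[Vector], center: Vector, rho: float) -> float:
    """Mean hinge penalty for known embeddings lying outside radius rho.
    Minimising a term of this shape pulls known samples inside the sphere."""
    return sum(max(0.0, distance(p, center) - rho) for p in points) / len(points)


def is_unknown(z: Vector, center: Vector, rho: float) -> bool:
    """Flag a test embedding as a novel family if it falls outside the sphere."""
    return distance(z, center) > rho


known = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
c = centroid(known)  # [1.0, 1.0]
print(sphere_penalty(known, c, rho=1.5))  # 0.0: every point is within 1.5
print(is_unknown([5.0, 5.0], c, rho=1.5))  # True
```

    Here rho plays both roles: it bounds the region that known samples are trained into, and it is the decision radius for detection at test time.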

    Relational Message Passing for Fully Inductive Knowledge Graph Completion

    In knowledge graph completion (KGC), predicting triples that involve emerging entities and/or relations, which were unseen when the KG embeddings were learned, has become a critical challenge. Subgraph reasoning with message passing is a promising and popular solution. Some recent methods have achieved good performance, but they (i) usually can only predict triples involving unseen entities alone, failing to address the more realistic fully inductive situation with both unseen entities and unseen relations, and (ii) often conduct message passing over entities without fully utilizing relation patterns. In this study, we propose a new method named RMPI, which uses a novel Relational Message Passing network for fully Inductive KGC. It passes messages directly between relations to make full use of relation patterns for subgraph reasoning, with new techniques for graph transformation, graph pruning, relation-aware neighborhood attention, handling empty subgraphs, etc., and it can utilize the relation semantics defined in the KG's ontological schema. Extensive evaluation on multiple benchmarks has shown the effectiveness of the techniques in RMPI and its better performance compared with existing methods that support fully inductive KGC. RMPI also achieves very promising results, comparable to state-of-the-art partially inductive KGC methods. Our code and data are available at https://github.com/zjukg/RMPI. Comment: under review
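
    Passing messages "directly between relations" can be illustrated by transforming an entity-level subgraph into a relation view, in which each triple becomes a node and two nodes are linked when their triples share an entity. The following is a simplified, hypothetical sketch under that assumption (plain mean aggregation, no pruning or attention), not RMPI's actual implementation. Edge labels such as "T-H" record which positions share the entity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def relation_view(triples: List[Triple]) -> Dict[int, List[Tuple[int, str]]]:
    """Build a graph whose nodes are triples (relation instances).

    Two nodes are linked when their triples share an entity; the label records
    the shared positions, e.g. "T-H" = tail of one triple is head of the other.
    """
    by_entity: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for idx, (head, _, tail) in enumerate(triples):
        by_entity[head].append((idx, "H"))
        by_entity[tail].append((idx, "T"))
    neighbors: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for occurrences in by_entity.values():
        for i, pos_i in occurrences:
            for j, pos_j in occurrences:
                if i != j:
                    neighbors[i].append((j, f"{pos_i}-{pos_j}"))
    return dict(neighbors)


def pass_messages(
    triples: List[Triple], rel_emb: Dict[str, List[float]]
) -> List[List[float]]:
    """One round of mean aggregation over the relation view: each node averages
    its own relation embedding with those of its neighbouring relations."""
    nbrs = relation_view(triples)
    updated = []
    for i, (_, rel, _) in enumerate(triples):
        vecs = [rel_emb[rel]] + [rel_emb[triples[j][1]] for j, _ in nbrs.get(i, [])]
        dim = len(rel_emb[rel])
        updated.append([sum(v[k] for v in vecs) / len(vecs) for k in range(dim)])
    return updated


triples = [("a", "born_in", "b"), ("b", "located_in", "c"), ("d", "works_for", "e")]
rel_emb = {"born_in": [1.0, 0.0], "located_in": [0.0, 1.0], "works_for": [2.0, 2.0]}
print(relation_view(triples))  # {0: [(1, 'T-H')], 1: [(0, 'H-T')]}
print(pass_messages(triples, rel_emb))
```

    In this toy example, the first two triples become neighbours through the shared entity b, while the third node has no neighbours and simply keeps its own relation embedding. Because the nodes carry relation embeddings rather than entity embeddings, nothing in the update depends on having seen the specific entities before.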

    Predict Market Share with Users’ Online Activities Data: An Initial Study on Market Share and Search Index of Mobile Phone

    Acquiring accurate and timely market share information is very important for producers when arranging production plans and designing marketing strategies. However, the high cost and long period of collecting data in survey-based methods make it difficult to obtain the latest market share data. Recently, emerging online web systems have provided users with new and convenient ways of searching for, learning about, experiencing, and buying products. The users’ activity data captured by these web systems can reflect users’ buying intentions and behaviours well, and contain valuable information for predicting market shares. In this study, the correlation between the Google search index and the market shares of mobile phones is analyzed with time series analysis techniques. The experimental results show that statistically significant relationships exist between the search index and market shares. This indicates that search index data, which are easy and cheap to obtain, can be used to forecast market shares in a timely manner. This study opens the door to applying users’ online activity data to predict market shares accurately and in a timely way, which will bring many benefits to producers and customers.
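
    The abstract does not say which time series technique was used. As an assumption, a lagged Pearson correlation is one simple way to check whether a search index leads market share: correlate the search index at period t with market share at period t + lag. The series below are made-up toy numbers, not data from the study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(
    search: Sequence[float], share: Sequence[float], lag: int
) -> float:
    """Correlate the search index at period t with market share at t + lag.
    A higher value at lag > 0 than at lag 0 hints that search leads share."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = min(len(search), len(share) - lag)
    return pearson(search[:n], share[lag:lag + n])


# Hypothetical toy series (not data from the study).
search = [1, 2, 3, 4, 5, 6]
share = [10, 10, 20, 30, 40, 50]
for lag in (0, 1):
    print(lag, round(lagged_correlation(search, share, lag), 3))
```

    For these toy series the correlation is 1.0 at lag 1 (the shifted share is exactly ten times the search index) and lower at lag 0, the pattern one would expect if search interest precedes purchases. A proper significance test on real data would also need to account for autocorrelation in both series.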

    Roles of circRNA dysregulation in esophageal squamous cell carcinoma tumor microenvironment

    Esophageal squamous cell carcinoma (ESCC) is the most prevalent histological type of esophageal cancer and is characterized by diagnosis at an advanced stage, metastasis, resistance to treatment, and frequent recurrence. In recent years, numerous human disorders, including ESCC, have been linked to abnormal expression of circular RNAs (circRNAs), suggesting that circRNAs are fundamental to the intricate system of gene regulation that governs ESCC formation. The tumor microenvironment (TME), the area surrounding the tumor cells, is composed of multiple components, including stromal cells, immune cells, the vascular system, the extracellular matrix (ECM), and numerous signaling molecules. In this review, we briefly describe the biological functions and mechanisms of aberrant circRNA expression in the TME of ESCC, including the immune microenvironment, angiogenesis, epithelial-to-mesenchymal transition, hypoxia, metabolism, and radiotherapy resistance. As in-depth research into the roles of circRNAs in the TME of ESCC continues, circRNAs are promising as therapeutic targets or delivery systems for cancer therapy and as diagnostic and prognostic indicators for ESCC.

    Genomic signatures and prognosis of advanced stage Chinese pediatric T cell lymphoblastic lymphoma by whole exome sequencing

    Objective: To investigate the genomic signatures and prognosis of advanced-stage T cell lymphoblastic lymphoma (T-LBL) and to examine the relationship between T-LBL and T cell acute lymphoblastic leukemia (T-ALL).
    Methods: Thirty-five Chinese children with stage III or IV T-LBL were recruited for this study. They were treated with combination chemotherapy and underwent whole exome sequencing. The relationships among clinical features, prognosis, and specific gene mutations were examined. Gene chip data for T-LBL and T-ALL were downloaded from a database, and differential gene expression was analyzed.
    Results: Germline causal gene mutations (CARS or MAP2K2) were detected in 2 patients; 3.06 ± 2.21 somatic causal gene mutations were identified in the 35 patients, and somatic mutations were observed in the NOTCH1, FBXW7, PHF6, and JAK3 genes. NOTCH1 mutations were significantly associated with FBXW7 mutations, and patients with NOTCH1-FBXW7 mutations were younger at diagnosis than patients without such mutations (P < 0.05). Thirty-two patients achieved complete remission (CR), and 14 and 18 patients were classified into the intermediate-risk (IR) group and the high-risk (HR) group, respectively. During a median follow-up of 44 months, 3 patients relapsed. Three-year prospective event-free survival (pEFS) was 82.286%, and no significant differences in pEFS were found across sexes, ages, or NOTCH1-FBXW7 mutation statuses (P > 0.05); however, the mean survival time of the IR group was longer than that of the HR group (P < 0.05). Differential gene expression in the T-LBL and/or T-ALL datasets was analyzed using the R package limma, and one-third of the differentially expressed genes were found in both the T-ALL and T-LBL datasets. High expression of PI3K-Akt signaling pathway genes and the USP34 gene was found in the T-LBL dataset.
    Conclusion: Although T-ALL and T-LBL both originate from precursor T cells and are considered different manifestations of the same disease, and the outcome of T-LBL is favorable with T-ALL-based chemotherapy, there are differences in gene distribution between T-LBL and T-ALL. The PI3K-Akt signaling pathway and the USP34 gene appear to play important roles in T-LBL, but medicines targeting the USP34 gene or the PI3K-Akt pathway may be invalid.
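
    The abstract reports a three-year event-free survival rate without naming the estimator. Assuming the conventional Kaplan-Meier product-limit estimator, S(t) is the product of (1 - d_i / n_i) over event times t_i <= t, where d_i is the number of events at t_i and n_i is the number of patients still at risk just before t_i. The sketch below computes it; the six-patient cohort in the example is hypothetical, not the study's data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier product-limit estimate of event-free survival.

    times:  follow-up time per patient (e.g. months).
    events: True if an event (e.g. relapse) occurred at that time,
            False if the patient was censored.
    Returns (time, S(time)) pairs at each time where an event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all patients sharing this time; censored ones still count
        # as at risk for events occurring at the same time.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            steps.append((t, survival))
        at_risk -= n_leaving
    return steps


def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: S(t) from the last event time <= t."""
    value = 1.0
    for time, s in steps:
        if time > t:
            break
        value = s
    return value


# Hypothetical cohort of six patients (months of follow-up, event flag).
steps = kaplan_meier([5, 10, 10, 20, 30, 40], [True, False, True, False, True, False])
print(steps)
print(survival_at(steps, 36))
```

    In this toy cohort the estimate drops to 5/6, then 2/3, then 1/3 at the event times 5, 10, and 30 months; censored patients (at 10, 20, and 40 months) leave the risk set without lowering S(t).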

    MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

    As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) that have relevant images attached. We noticed that current MMEA algorithms all adopt KG-level modality fusion strategies globally for multi-modal entity representation but ignore variations in the modality preferences of individual entities, which hurts robustness to potential noise in the modalities (e.g., blurry images and relations). In this paper, we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, which dynamically predicts the mutual correlation coefficients among modalities for entity-level feature aggregation. A modal-aware hard entity replay strategy is further proposed to address vague entity details. Experimental results show that our model not only achieves SOTA performance in multiple training scenarios, including supervised, unsupervised, iterative, and low-resource settings, but also has a comparable number of parameters, favorable speed, and good interpretability. Our code and data are available at https://github.com/zjukg/MEAformer. Comment: Repository: https://github.com/zjukg/MEAformer
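
    Entity-level modality weighting, as opposed to one fixed KG-level fusion, can be sketched as follows: each entity scores its own modalities, converts the scores into weights, and aggregates a weighted sum, so a noisy modality (e.g., a blurry image) can be down-weighted for that entity alone. This is a simplified, hypothetical stand-in that uses scaled dot-product agreement between modalities as the correlation score; it is not MEAformer's transformer architecture, and the modality names are illustrative.

```python
import math
from typing import Dict, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_entity(
    modalities: Dict[str, List[float]],
) -> Tuple[Dict[str, float], List[float]]:
    """Fuse one entity's modality embeddings with entity-specific weights.

    Each modality is scored by its mean dot product with the entity's other
    modalities (scaled by sqrt(dim)); a softmax turns scores into weights.
    Returns (weights per modality, fused embedding).
    """
    names = list(modalities)
    dim = len(modalities[names[0]])
    scale = math.sqrt(dim)
    scores = []
    for a in names:
        others = [b for b in names if b != a]
        total = sum(
            sum(x * y for x, y in zip(modalities[a], modalities[b])) for b in others
        )
        scores.append(total / max(len(others), 1) / scale)
    weights = softmax(scores)
    fused = [
        sum(w * modalities[n][k] for w, n in zip(weights, names)) for k in range(dim)
    ]
    return dict(zip(names, weights)), fused


weights, fused = fuse_entity(
    {"graph": [1.0, 0.0], "attribute": [1.0, 0.0], "image": [-1.0, 0.0]}
)
print(weights)
print(fused)
```

    If an entity's graph and attribute embeddings agree while its image embedding points the other way, the image receives the smallest weight for that entity; another entity with a clean image would compute its own, different weights.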