121 research outputs found

    Unsupervised anomaly detection for network traffic using artificial immune network

    No full text
    In existing knowledge-based approaches to anomaly detection for network traffic, the a priori knowledge labelled by human experts has to be continually updated to identify new anomalies. Because anomalies usually show patterns different from the majority of network activities, it is hard to detect them based on a priori knowledge alone. Unsupervised anomaly detection using autonomous techniques without any a priori knowledge is an effective strategy to overcome this drawback. In this paper, we propose a novel Unsupervised Anomaly Detection model based on an Artificial Immune Network (UADAIN) that consists of unsupervised clustering, cluster partition, and anomaly detection. Our model uses the aiNet-based unsupervised clustering approach to generate cluster centroids from network traffic; the Cluster Centroids based Partition algorithm (CCP) then coarsely partitions the cluster centroids in the training phase into a self set (normal rules) and an antibody set (anomalous rules). In the test phase, to keep the selves and antibodies continually evolving, we introduce the Immune Network based Anomaly Detection model (INAD) to automatically learn and evolve the self set and antibody set. To evaluate the effectiveness of UADAIN, we conduct simulation experiments on the ISCX 2012 IDS dataset and the NSL-KDD dataset. Compared with two popular anomaly detection approaches based on K-means clustering and aiNet-HC clustering, respectively, the experimental results demonstrate that UADAIN achieves better performance in detecting anomalies in network traffic.
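    The train-then-test structure described above can be illustrated with a minimal sketch. The threshold rule and nearest-centroid test below are assumptions for illustration; the paper's actual CCP and INAD algorithms are not reproduced here.

```python
# Hypothetical sketch of a centroid-partition step and a nearest-rule
# anomaly test (not the paper's CCP/INAD algorithms). Centroids whose
# clusters cover only a small fraction of the traffic become antibody
# (anomalous) rules; the rest form the self set (normal rules).

def partition_centroids(centroids, cluster_sizes, anomaly_fraction=0.1):
    """Split centroids into (self_set, antibody_set) by cluster size."""
    total = sum(cluster_sizes)
    self_set, antibody_set = [], []
    for c, size in zip(centroids, cluster_sizes):
        if size / total < anomaly_fraction:
            antibody_set.append(c)   # sparse cluster -> anomalous rule
        else:
            self_set.append(c)       # dense cluster -> normal rule
    return self_set, antibody_set

def detect(sample, self_set, antibody_set, dist):
    """Label a sample by whichever rule set holds its nearest centroid."""
    d_self = min(dist(sample, c) for c in self_set)
    d_anom = min(dist(sample, c) for c in antibody_set)
    return "anomaly" if d_anom < d_self else "normal"
```

    In the full model, both rule sets would continue to evolve during the test phase rather than stay fixed as in this sketch.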

    CSS: Handling imbalanced data by improved clustering with stratified sampling

    No full text
    The traditional support vector machine (SVM) technique has drawbacks in dealing with imbalanced data. To address this issue, in this paper we propose an improved clustering with stratified sampling (CSS) algorithm to improve the classification performance of SVMs on imbalanced datasets. Instead of applying a single sampling method as in the literature, our algorithm treats different types of classes with different sampling methods. For minority classes, the algorithm oversamples by adding normally distributed noise around every support vector to generate new samples. For majority classes, samples are first divided into clusters by applying improved clustering by fast search and find of density peaks (CFSFDP) to obtain the latent structure information in each majority class, and then stratified sampling is applied to extract samples from each sub-cluster of the majority class. Moreover, we further extend this method into an ensemble classifier that uses multiple base SVM classifiers for prediction. Experimental results on several imbalanced classification datasets show that CSS is more effective than state-of-the-art sampling methods.
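    The two sampling steps above can be sketched as follows. This is illustrative only: the cluster assignments would come from CFSFDP and the support vectors from a trained SVM, and both are taken as given inputs here.

```python
import random

# Illustrative sketch of CSS's two sampling steps (cluster assignments
# and support vectors are assumed to be supplied by CFSFDP and a
# trained SVM, respectively).

def oversample_minority(support_vectors, n_new, sigma=0.1):
    """Generate synthetic minority samples by adding Gaussian noise
    around randomly chosen support vectors."""
    synthetic = []
    for _ in range(n_new):
        sv = random.choice(support_vectors)
        synthetic.append([x + random.gauss(0.0, sigma) for x in sv])
    return synthetic

def stratified_undersample(clusters, n_total):
    """Draw samples from each majority-class sub-cluster in proportion
    to its size (stratified sampling over the CFSFDP sub-clusters)."""
    total = sum(len(c) for c in clusters)
    sampled = []
    for c in clusters:
        k = max(1, round(n_total * len(c) / total))
        sampled.extend(random.sample(c, min(k, len(c))))
    return sampled
```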

    Adaptively periodic I/O scheduling for concurrent HPC applications

    No full text
    With the convergence of big data and HPC (high-performance computing), various machine learning applications and traditional large-scale simulations with stochastically iterative I/O periodicities run concurrently on HPC platforms, which poses greater challenges for the scarce shared I/O resources due to the ever-growing demand for data transfer. The existing heuristic online and periodic offline I/O scheduling methods, designed for traditional HPC applications with a fixed I/O periodicity, are not suitable for applications with stochastically iterative I/O periodicities, where concurrent I/Os from different applications must be scheduled under I/O congestion. In this work, we propose an adaptively periodic I/O scheduling (APIO) method that optimizes system efficiency and application dilation by taking the stochastically iterative I/O periodicity of the applications into account. We first build a periodic offline scheduling method within a specified duration to capture the iterative nature. APIO then adjusts the bandwidth allocation to resist stochasticity based on the actual length of the computing phase. When the specified duration does not satisfy the actual running requirements, the period length is extended to adapt to the actual duration. Theoretical analysis and extensive simulations demonstrate the efficiency of our proposed I/O scheduling method over the existing online approach.
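    The period-extension rule can be sketched minimally as follows; the actual APIO bandwidth-allocation policy is more involved, and this tiny class is only an assumed illustration of the adaptive step.

```python
# Illustrative sketch of the adaptive-period rule (not the full APIO
# policy). Each application keeps a specified period; when an observed
# compute phase overruns it, the period is extended to match the
# actual duration.

class AdaptivePeriod:
    def __init__(self, specified_period):
        self.period = specified_period

    def observe_compute_phase(self, actual_length):
        # Extend the period when the specified duration no longer
        # covers the application's actual running requirement.
        if actual_length > self.period:
            self.period = actual_length
        return self.period
```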

    Deep reinforcement learning enhanced greedy optimization for online scheduling of batched tasks in cloud HPC systems

    No full text
    In a large cloud data center HPC system, a critical problem is how to allocate submitted tasks to heterogeneous servers so as to maximize the system's gain, defined as the value of completed tasks minus the system operation costs. We consider this problem in the online setting where tasks arrive in batches, and propose a novel deep reinforcement learning (DRL) enhanced greedy optimization algorithm with two-stage scheduling that interleaves task sequencing and task allocation. For task sequencing, we deploy a DRL module to predict the best allocation sequence for each arriving batch of tasks based on the knowledge (allocation strategies) learnt from previous batches. For task allocation, we propose a greedy strategy that allocates tasks to servers one by one online, following the allocation sequence, to maximize the total gain increase. We show that our greedy strategy has a performance guarantee with competitive ratio 1/(1+Îș) relative to the optimal offline solution, which improves the existing result for the same problem, where Îș is upper bounded by the maximum cost-to-gain ratio of each task. While our DRL module enhances the greedy algorithm by providing the likely-optimal allocation sequence for each batch of arriving tasks, our greedy strategy bounds the DRL module's prediction error within a proven worst-case performance guarantee for any allocation sequence. This enables better solution quality than is obtainable from either DRL or greedy optimization alone. Extensive experimental results in both simulation and real application environments demonstrate the effectiveness and efficiency of our proposed algorithm. Compared with the state-of-the-art baselines, our algorithm increases the system gain by about 10% to 30%.
    Our algorithm provides an interesting example of combining machine learning (ML) and greedy optimization techniques to improve ML-based solutions with a worst-case performance guarantee for solving hard optimization problems.
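    The greedy allocation stage can be sketched as below. The task/server representation is hypothetical, and the DRL-predicted sequence is taken as given input; the sketch only shows the "largest gain increase, skip if unprofitable" rule.

```python
# Illustrative sketch of the greedy allocation stage (task values,
# costs, and server capacities are hypothetical inputs; the DRL
# module's sequencing is assumed to be the order of `tasks`).

def greedy_allocate(tasks, servers):
    """tasks: list of dicts with 'value', 'size', and per-server 'cost';
    servers: dict server -> remaining capacity. Returns {task_idx: server}."""
    allocation = {}
    for i, task in enumerate(tasks):           # follow the given sequence
        best, best_gain = None, 0.0
        for s, cap in servers.items():
            if cap < task["size"]:
                continue                       # server cannot host the task
            gain = task["value"] - task["cost"][s]
            if gain > best_gain:               # keep the largest gain increase
                best, best_gain = s, gain
        if best is not None:                   # skip unprofitable tasks
            allocation[i] = best
            servers[best] -= task["size"]
    return allocation
```

    Skipping tasks whose best gain is non-positive is what keeps the total gain monotone, which is the intuition behind the 1/(1+Îș) competitive-ratio style of guarantee.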

    Fairness-aware scheduling of dynamic cross-job coflows in shared datacenters based on meta learning

    No full text
    In today's shared datacenters, communications among tasks across jobs (applications) typically generate large numbers of coflows. Coflow scheduling oriented toward both system efficiency and job fairness is critical for improving system performance and user satisfaction at the application level. Since the metrics of fairness and efficiency usually conflict with each other, scheduling coflows at the desired tradeoff between them is a challenging problem. Due to the great variety of jobs and of tasks within each job, communications among them have different characteristics, so their coflows present different patterns. Existing coflow schedulers optimize either only one metric or a tradeoff of both for static patterns of coflows (within the same job), because they cannot capture the inherent patterns of coflows that change dynamically, especially for cross-job coflows. This paper proposes a novel fairness-aware coflow scheduling algorithm that combines a link-embedded multi-layer neural network with a meta-learning framework for unsupervised learning of dynamic coflows across different jobs, and hence adaptive scheduling of the coflows to achieve the desired fairness–efficiency tradeoff. In our algorithm, while the neural network mines the relationships among coflow allocations, the meta-learning framework effectively captures the dynamic patterns of a large number of sample coflows and trains the neural network to achieve the desired scheduling performance. Extensive experimental results demonstrate that our algorithm outperforms the state-of-the-art coflow schedulers in achieving the desired fairness–efficiency tradeoff.

    Electrochemical Arsine Generators for Arsenic Determination

    No full text
    Arsine generation is the gateway for several sensitive and selective methods of As determination. An electrochemical arsine generator (EAG) is especially green: we report here the use of two electrode materials, aluminum and highly oriented (ordered) pyrolytic graphite (HOPG), never before used for this purpose. The first is operated in a novel constant-voltage mode: current flows only when the sample, deliberately made highly conductive with acid, is injected. As a result, the cathode, despite being a highly active metal that will self-corrode in acid, lasts a long time. This EAG can be made to respond to As(III) and As(V) in an equivalent fashion and is fabricated from two readily available chromatographic T-fittings. It permits the use of a wire roll as the cathode, allowing rapid renewal of the electrode. The HOPG-based EAG is easily constructed from ion chromatography suppressor shells and can convert As(III) to AsH<sub>3</sub> quantitatively but has a significantly lower response to As(V); this difference can be exploited for speciation. The success of Al, an active metal, also dispels the maxim that metals with high hydrogen overpotential are best for electrochemical hydride generation. We report construction, operation, and performance details of these EAGs. Using gas-phase chemiluminescence (GPCL) with ozone as a complementary green analytical technique, we demonstrate attractive limits of detection (LODs) (S/N = 3) of 1.9 and 1.0 ÎŒg/L for As(V) and As(III) with the HOPG-based EAG and 1.4 ÎŒg/L for As(V) or As(III) with the Al-based EAG. Precision at the ∌20 ÎŒg/L As(V) level was 2.4% and 2.1% relative standard deviation (RSD) for the HOPG- and Al-based EAGs, respectively. Both EAGs permitted a sample throughput of 12/h. For groundwater samples from West Texas and West Bengal, India, very comparable results were obtained with parallel measurements by inductively coupled plasma mass spectrometry.
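    The S/N = 3 detection limits and percent RSD quoted above follow the standard analytical definitions, which can be computed as below; all numbers in the test are hypothetical, not the paper's calibration data.

```python
# Sketch of the standard S/N = 3 limit-of-detection and percent-RSD
# calculations behind figures like those quoted above (inputs here are
# hypothetical, not the paper's calibration data).

def limit_of_detection(blank_noise_sd, calibration_slope, k=3.0):
    """LOD = k * sigma_blank / slope, with k = 3 for S/N = 3."""
    return k * blank_noise_sd / calibration_slope

def relative_std_dev(values):
    """Percent RSD: 100 * sample standard deviation / mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return 100.0 * var ** 0.5 / mean
```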

    Structural–temporal embedding of large-scale dynamic networks with parallel implementation

    No full text
    Due to the widespread availability of network data in the real world, network analysis has attracted increasing attention in recent years. In complex systems such as social networks, entities and their mutual relations can be represented by the nodes and edges of a network. Because the occurrences of entities and relations in these systems are often dynamic over time, such networks are called temporal networks, describing the process of dynamic connection of nodes. Dynamic network embedding aims to embed the nodes of a temporal network into a low-dimensional semantic space such that the network structures and evolution patterns are preserved as much as possible in the latent space. Most existing methods capture the structural similarities (relations) of strongly connected nodes based on their historical neighborhood information, but they ignore the structural similarities of weakly connected nodes, which may also represent relations, and include no explicit temporal information in node embeddings for capturing the periodic dependency of events. To address these issues, we propose a novel temporal network embedding model that extends structural similarity to cover both strong and weak connections among nodes and includes temporal information in the node embeddings. To improve training efficiency, we present a parallel training strategy to quickly acquire node embeddings. Extensive experiments on several real-world temporal networks demonstrate that our model significantly outperforms the state-of-the-art methods on traditional tasks, including link prediction and node classification.
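    One common way to include explicit temporal information in a node embedding is to concatenate a periodic time encoding; the abstract does not specify the model's actual encoding, so the sinusoidal form below is purely an assumed example of how periodic dependency of events can be made visible to the embedding.

```python
import math

# Assumed illustration of adding explicit temporal information to a
# node embedding (the paper's exact encoding is not specified in the
# abstract). A sinusoidal time encoding, as popularised by Transformer
# positional encodings, captures periodic structure in event times.

def time_encoding(t, dims=4, max_period=10000.0):
    """One (sin, cos) pair per frequency for event time t."""
    enc = []
    for i in range(dims // 2):
        freq = 1.0 / (max_period ** (2 * i / dims))
        enc.append(math.sin(t * freq))
        enc.append(math.cos(t * freq))
    return enc

def temporal_node_embedding(structural_vec, t, dims=4):
    """Concatenate a structural embedding with the time encoding."""
    return list(structural_vec) + time_encoding(t, dims)
```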

    Efficient and fair: Information-agnostic online coflow scheduling by combining limited multiplexing with DRL

    No full text
    In shared data center networks, communications among users can be modeled as coflows, each comprising a group of parallel data transmission flows. Efficient and fair scheduling of coflows is critical for improving both system performance and user satisfaction at the application level. Existing coflow scheduling methods that maximize efficiency (coflow completion time, CCT) and fairness (service isolation) simultaneously require prior knowledge of coflow (flow) sizes, which in reality are not known before a coflow completes, limiting their applicability. Among information-agnostic schedulers, known results focus on either efficiency or fairness alone, but not both, owing to the difficulty of achieving the desired compromise between them. In this paper, we first present an information-aware non-preemptive coflow scheduling algorithm and show its provable long-term isolation guarantee under reasonable assumptions. We then adapt this algorithm to information-agnostic online coflow scheduling by combining limited multiplexing with a Deep Reinforcement Learning (DRL) framework to achieve a long-term isolation guarantee toward fair network sharing and a lower average weighted CCT simultaneously. The simulation results show that our algorithm outperforms the state-of-the-art fairness-optimal scheduler (NC-DRF) by 4.92 in terms of average weighted CCT and the performance-optimal scheduler (Aalo) in the metric of maximum normalized CCT. This fully demonstrates the superiority of our method in the simultaneous optimization of efficiency and fairness for information-agnostic coflow scheduling.
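    The limited-multiplexing idea can be sketched for a single link as follows. The residual-bandwidth rule below (favoring the coflow with the fewest bytes sent, as an information-agnostic proxy for short coflows) is an assumed stand-in; the paper's DRL-driven policy is not reproduced.

```python
# Illustrative single-link sketch of limited multiplexing (the residual
# rule is an assumption, not the paper's DRL-driven policy). Every
# active coflow keeps an isolation share of the link; the bounded
# residual is multiplexed to the coflow that has sent the fewest bytes
# so far, an information-agnostic proxy for a short coflow.

def allocate_bandwidth(link_capacity, bytes_sent, isolation_fraction=0.8):
    """bytes_sent: dict coflow -> bytes already transmitted.
    Returns dict coflow -> allocated bandwidth on this link."""
    n = len(bytes_sent)
    iso_share = isolation_fraction * link_capacity / n
    alloc = {c: iso_share for c in bytes_sent}   # isolation guarantee
    residual = link_capacity * (1 - isolation_fraction)
    shortest = min(bytes_sent, key=bytes_sent.get)
    alloc[shortest] += residual                  # limited multiplexing
    return alloc
```

    Keeping `isolation_fraction` high bounds how much any coflow can lose to multiplexing, which is the intuition behind a long-term isolation guarantee coexisting with lower average CCT.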

    Improve the quality of charging services for rechargeable wireless sensor networks by deploying a mobile vehicle with multiple removable chargers

    No full text
    The increasing demand for real-time applications of Wireless Sensor Networks (WSNs) makes Quality of Service (QoS)-based charging scheduling models an interesting and active research topic. Satisfying QoS requirements (e.g., data collection integrity, charging response delay) for the different applications of WSNs raises significant challenges. More precisely, an effective scheduling strategy must not only improve the charging efficiency of the charging vehicles but also reduce the charging response delay of the requests to be served, all while preserving the integrity of data collection. For such applications, existing studies on the charging problem often deploy one or more mobile vehicles, which have deficiencies in practice. On one hand, a single vehicle is usually insufficient to charge many sensors in a large-scale application scenario, owing to the vehicle's limited battery capacity or the energy depletion of some sensors before the vehicle arrives. On the other hand, while collaboration between multiple vehicles in large-scale WSNs can significantly increase charging capacity, the cost is too high in terms of the initial investment and the maintenance of these vehicles. To overcome these deficits, in this work we propose a novel QoS-based on-demand charging scheduling (QOCS) model in which one charging vehicle carries multiple removable battery-powered chargers. In this model, we study the charging scheduling problem for requesting nodes so as to guarantee the integrity of network data collection and maximize the satisfaction of charging services. The QOCS model jointly considers coverage contribution and energy urgency to sort the charging requests of sensors, and introduces a hybrid power supply mechanism based on supply and demand to improve energy utilization.
    We evaluate the performance of the proposed model through extensive simulations, and the experimental results show that our model achieves better performance than existing methods.
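    The request-sorting step can be sketched as follows. The linear score and the weighting `alpha` are assumptions for illustration; the QOCS model's exact criterion is not reproduced here.

```python
# Hypothetical sketch of sorting charging requests by a joint score of
# coverage contribution and energy urgency (the score form and the
# weight alpha are assumptions, not the QOCS model's exact criterion).

def urgency(residual_energy, drain_rate):
    """Less time until depletion -> higher urgency."""
    return drain_rate / residual_energy

def rank_requests(requests, alpha=0.5):
    """requests: list of dicts with 'coverage', 'residual', 'drain'.
    Requests with higher combined scores are served first."""
    def score(r):
        return alpha * r["coverage"] + (1 - alpha) * urgency(
            r["residual"], r["drain"])
    return sorted(requests, key=score, reverse=True)
```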

    Transient Nutlin-3 treatment of diploid HCT116 clones induces the appearance of cells with >4N DNA content.

    No full text
    <p><b>A</b>) Diploid HCT116 clones D3 and D8 were untreated (NT) or exposed to Nutlin (NUT, 10 ”M) for 24 hrs, followed by Nutlin removal. The cells were harvested at the indicated times after Nutlin removal. Fixed cells were stained with propidium iodide (25 ”g/ml) and subjected to flow cytometry analysis. <b>B</b>) Cells were untreated (NT) or exposed to Nutlin (NUT, 10 ”M) for 24 hrs, followed by Nutlin removal. Cell lysates were collected at the indicated time points and analyzed by immunoblotting with the indicated antibodies. Actin served as a loading control. <i>P-Cdc2</i>, phospho-Cdc2 (Tyr-15).</p>
    • 

    corecore