A scalable application server on Beowulf clusters : a thesis presented in partial fulfilment of the requirement for the degree of Master of Information Science at Albany, Auckland, Massey University, New Zealand
Performance and scalability of a large distributed multi-tiered application are core requirements for most of today's critical business applications. I have investigated the scalability of a J2EE application server using the standard ECperf benchmark application on the Massey Beowulf clusters, namely the Sisters and the Helix. My testing environment consists of open source software: the integrated JBoss-Tomcat as the application server and web server, along with PostgreSQL as the database. My testing programs were run on the clustered application server, which provides replication of the Enterprise Java Bean (EJB) objects. I have completed various centralized and distributed tests using the JBoss cluster. I concluded that clustering the application server and web server effectively increases the performance of the applications running on them, given sufficient system resources. Application performance scales up to the point where a bottleneck occurs in the testing system; the bottleneck could be any resource in the testing environment: the hardware, software, network, or the application that is running. Performance tuning for a large-scale J2EE application is a complicated issue that depends on the resources available. However, by carefully identifying the performance bottleneck in the system's hardware, software, network, operating system, and application configuration, I can improve the performance of J2EE applications running in a Beowulf cluster. Software bottlenecks can be solved by changing the default settings; hardware bottlenecks, on the other hand, are harder to solve unless more investment is made to purchase higher-speed and higher-capacity hardware.
Discriminative models for multi-instance problems with tree-structure
Modeling network traffic is gaining importance in order to counter modern
threats of ever increasing sophistication. It is, however, surprisingly difficult
and costly to construct reliable classifiers on top of telemetry data due to
the variety and complexity of signals that no human can manage to interpret in
full. Obtaining training data with a sufficiently large and variable body of
labels can thus be seen as a prohibitive problem. The goal of this work is to
detect infected computers by observing their HTTP(S) traffic collected from
network sensors, which are typically proxy servers or network firewalls, while
relying on only minimal human input in the model training phase. We propose a
discriminative model that makes decisions based on all of a computer's traffic
observed during a predefined time window (5 minutes in our case). The model is
trained on traffic samples collected over equally sized time windows for a large
number of computers, where the only labels needed are human verdicts about the
computer as a whole (presumed infected vs. presumed clean). As part of training,
the model itself recognizes discriminative patterns in traffic targeted to
individual servers and constructs the final high-level classifier on top of
them. We show the classifier to perform with very high precision, while the
learned traffic patterns can be interpreted as Indicators of Compromise. In the
following we implement the discriminative model as a neural network with
special structure reflecting two stacked multi-instance problems. The main
advantages of the proposed configuration include not only improved accuracy and
ability to learn from gross labels, but also automatic learning of server types
(together with their detectors) which are typically visited by infected
computers
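The "two stacked multi-instance problems" in the abstract above can be pictured as nested bags: a computer is a bag of servers, and each server is a bag of traffic feature vectors. The following is a minimal sketch under stated assumptions, not the authors' network: the mean pooling, the single ReLU layer per level, and the names `relu_linear`, `mean_pool`, and `embed_computer` are all hypothetical choices made for illustration.

```python
def relu_linear(weights, bias, x):
    """One dense layer with ReLU: max(0, W x + b), with W given as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def mean_pool(vectors):
    """Average a non-empty bag of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def embed_computer(computer, flow_layer, server_layer):
    """Embed one computer given as a list of servers, each a list of flow vectors.

    flow_layer and server_layer are (weights, bias) pairs; in a real model they
    would be learned, here they are supplied by the caller (assumption).
    """
    server_vecs = []
    for flows in computer:
        flow_embs = [relu_linear(*flow_layer, f) for f in flows]
        server_vecs.append(mean_pool(flow_embs))  # inner bag: flows -> server
    server_embs = [relu_linear(*server_layer, s) for s in server_vecs]
    return mean_pool(server_embs)  # outer bag: servers -> computer
```

Because each level ends in a pooling step, the result does not depend on the order of flows within a server or of servers within a computer, which is why only a label for the computer as a whole is needed. A final classifier layer on top of the returned vector would produce the infected vs. clean verdict.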
Converting Your Thoughts to Texts: Enabling Brain Typing via Deep Feature Learning of EEG Signals
An electroencephalography (EEG) based Brain Computer Interface (BCI) enables
people to communicate with the outside world by interpreting the EEG signals of
their brains to interact with devices such as wheelchairs and intelligent
robots. More specifically, motor imagery EEG (MI-EEG), which reflects a
subject's active intent, is attracting increasing attention for a variety of BCI
applications. Accurate classification of MI-EEG signals, while essential for
effective operation of BCI systems, is challenging due to the significant noise
inherent in the signals and the lack of informative correlation between the
signals and brain activities. In this paper, we propose a novel deep neural
network based learning framework that affords perceptive insights into the
relationship between the MI-EEG data and brain activities. We design a joint
convolutional recurrent neural network that simultaneously learns robust
high-level feature representations through low-dimensional dense embeddings from
raw MI-EEG signals. We also employ an Autoencoder layer to eliminate various
artifacts such as background activities. The proposed approach has been
evaluated extensively on a large-scale public MI-EEG dataset and a limited but
easy-to-deploy dataset collected in our lab. The results show that our approach
outperforms a series of baselines and the competitive state-of-the-art
methods, yielding a classification accuracy of 95.53%. The applicability of our
proposed approach is further demonstrated with a practical BCI system for
typing.
Comment: 10 page
Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission
By deploying machine-learning algorithms at the network edge, edge learning
can leverage the enormous real-time data generated by billions of mobile
devices to train AI models, which enable intelligent mobile applications. In
this emerging research area, one key direction is to efficiently utilize radio
resources for wireless data acquisition to minimize the latency of executing a
learning task at an edge server. Along this direction, we consider the specific
problem of retransmission decision in each communication round to ensure both
reliability and quantity of those training data for accelerating model
convergence. To solve the problem, a new retransmission protocol called
data-importance aware automatic-repeat-request (importance ARQ) is proposed.
Unlike the classic ARQ focusing merely on reliability, importance ARQ
selectively retransmits a data sample based on its uncertainty, which helps
learning and can be measured using the model under training. Underpinning the
proposed protocol is a derived elegant communication-learning relation between
two corresponding metrics, i.e., signal-to-noise ratio (SNR) and data
uncertainty. This relation facilitates the design of a simple threshold-based
policy for importance ARQ. The policy is first derived based on the classic
classifier model of support vector machine (SVM), where the uncertainty of a
data sample is measured by its distance to the decision boundary. The policy is
then extended to the more complex model of convolutional neural networks (CNN)
where data uncertainty is measured by entropy. Extensive experiments have been
conducted for both the SVM and CNN using real datasets with balanced and
imbalanced distributions. Experimental results demonstrate that importance ARQ
effectively copes with channel fading and noise in wireless data acquisition to
achieve faster model convergence than the conventional channel-aware ARQ.
Comment: This is an updated version: 1) extension to general classifiers; 2)
consideration of imbalanced classification in the experiments. Submitted to
IEEE Journal for possible publicatio
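The two uncertainty measures named in the abstract above (distance to the SVM decision boundary, and entropy of a CNN's predicted class distribution) are standard quantities and can be sketched in a few lines. The retransmission rule at the end is only an illustrative assumption: the abstract says the policy is threshold based and links SNR to uncertainty, but it does not give the formula, so `should_retransmit`, `base_threshold`, and `slope` are hypothetical.

```python
import math


def svm_margin_distance(w, b, x):
    """Distance from sample x to the hyperplane w.x + b = 0 of a linear SVM."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_retransmit(received_snr, uncertainty, base_threshold=1.0, slope=2.0):
    """Hypothetical rule: the more uncertain a sample, the higher the SNR it must
    reach before it is accepted; below that threshold it is retransmitted."""
    return received_snr < base_threshold + slope * uncertainty
```

For the SVM case a smaller distance means a more uncertain sample, so the distance would first have to be mapped to an uncertainty score (for example a decreasing function of the distance) before being passed to such a rule. For the CNN case the entropy can be used directly, since it is already largest for the most ambiguous predictions.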
Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
We introduce a new loss function for the weakly-supervised training of
semantic image segmentation models based on three guiding principles: to seed
with weak localization cues, to expand objects based on the information about
which classes can occur in an image, and to constrain the segmentations to
coincide with object boundaries. We show experimentally that training a deep
convolutional neural network using the proposed loss function leads to
substantially better segmentations than previous state-of-the-art methods on
the challenging PASCAL VOC 2012 dataset. We furthermore give insight into the
working mechanism of our method by a detailed experimental study that
illustrates how the segmentation quality is affected by each term of the
proposed loss function as well as their combinations.
Comment: ECCV 201