15 research outputs found
Performance Comparison of Ensemble Learning and Supervised Algorithms in Classifying Multi-label Network Traffic Flow
A research article published in Engineering, Technology & Applied Science Research (ETASR), Volume 12, Issue 3, June 2022.
Network traffic classification is of significant importance. It helps identify network anomalies and assists in taking measures to avoid them. However, classifying network traffic correctly is a challenging task. This study aims to compare ensemble learning methods with standard supervised classification to come up with improved classification methods. Three types of network traffic were classified (Benign, Malicious, and Outliers). The data were collected experimentally using Paessler Router Traffic Grapher software and online, and were analyzed with R software. The datasets were used to train five supervised models (k-nearest neighbors, mixture discriminant analysis, Naïve Bayes, C5.0 classification model, and regularized discriminant analysis). The models were trained on 70% of the samples, and the remaining 30% were used for validation. The same samples were used separately in predicting individual accuracy. The results were compared to the ensemble learning models, which were built using the same datasets. Among the five supervised classifiers, k-nearest neighbors and C5.0 classification scored the highest accuracies, 0.868 and 0.761 respectively. The ensemble learning classifiers Bagging (Random Forest) and Boosting (eXtreme Gradient Boosting) had accuracies of 0.904 and 0.902 respectively. The results show that the ensemble learning method has higher accuracy than the standard supervised classifiers. Therefore, it can be used to detect malicious activities in network traffic, as well as anomalies, with improved accuracy
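The 70/30 holdout evaluation described in this abstract can be sketched as follows. This is a minimal illustration in Python rather than the R workflow the study used; the two-feature toy flows, the cluster centres, and all function names are assumptions made for the example, and only the 70/30 split and the three class names come from the abstract.

```python
import random
from collections import Counter

def split_70_30(rows, seed=0):
    """Shuffle rows and return (train, test) using a 70/30 split."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of held-out rows whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

def make_toy_flows(n_per_class=50, seed=1):
    """Hypothetical data: three well-separated clusters, one per traffic class."""
    rng = random.Random(seed)
    centers = {"Benign": (0.0, 0.0), "Malicious": (10.0, 10.0), "Outlier": (0.0, 20.0)}
    rows = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            point = (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            rows.append((point, label))
    return rows

rows = make_toy_flows()
train, test = split_70_30(rows)
acc = accuracy(train, test)
```

Because the toy clusters do not overlap, the classifier separates them perfectly; real traffic features would not be this clean, which is where the reported differences between classifiers arise.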
The Effect of Hyperparameter Optimization on the Estimation of Performance Metrics in Network Traffic Prediction using the Gradient Boosting Machine Model
Information and Communication Technology (ICT) has changed the way we communicate and access information, generating large volumes of heterogeneous data. The amount of network traffic generated constantly increases in velocity, veracity, and volume as we enter the era of big data. Network traffic classification and intrusion detection are very important for the early detection and identification of unnecessary network traffic. The Machine Learning (ML) approach has recently taken center stage in accurate network traffic classification. However, in most cases, it is applied without model hyperparameter optimization. In this study, gradient boosting machine prediction was used with different hyperparameter optimization configurations, such as interaction depth, tree number, learning rate, and sampling. Data were collected through an experimental setup using the Sophos firewall and Cisco router data loggers. Data analysis was conducted with R software version 4.2.0 in the RStudio Integrated Development Environment. The dataset was split into two partitions, where 70% was used for training the model and 30% for testing. At a learning rate of 0.1, interaction depth of 14, and tree number of 2500, the model estimated the highest performance metrics, with an accuracy of 0.93 and R of 0.87, compared to 0.90 and 0.85 before model optimization. The same configuration attained the minimum classification error of 0.07, compared with 0.10 before model optimization. After model tweaking, a method was developed for achieving improved accuracy, R square, mean decrease in Gini coefficients for more than 8 features, and lower classification error, root mean square error, logarithmic loss, and mean square error in the model
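A search over hyperparameter configurations of the kind described here can be sketched as an exhaustive grid search. The sketch below is in Python, not the R code used in the study. The grid values other than 0.1, 14, and 2500 are hypothetical, and `toy_score` is a made-up stand-in for training and validating a model: it does not reproduce the study's results. The helper `classification_error` shows only the arithmetic linking the reported accuracies (0.93, 0.90) to the reported errors (0.07, 0.10).

```python
from itertools import product

def classification_error(accuracy):
    """Classification error is the complement of accuracy."""
    return round(1.0 - accuracy, 10)

# Candidate values; only 0.1, 14 and 2500 appear in the abstract, the rest are hypothetical.
grid = {
    "learning_rate": [0.01, 0.1],
    "interaction_depth": [6, 14],
    "n_trees": [500, 2500],
}

def grid_search(grid, evaluate):
    """Evaluate every combination in grid and return (best_config, best_score)."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_score(config):
    """Hypothetical scorer: counts how many settings match one fixed target.

    A real evaluate() would fit a gradient boosting model on the 70% training
    partition and return its accuracy on the 30% test partition.
    """
    target = {"learning_rate": 0.1, "interaction_depth": 14, "n_trees": 2500}
    return sum(config[name] == target[name] for name in target)

best_config, best_score = grid_search(grid, toy_score)
```

Exhaustive search is the simplest choice for a grid this small; with more parameters or finer steps, random or sequential search is usually cheaper.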
Informing aerial total counts with demographic models: population growth of Serengeti elephants not explained purely by demography
Conservation management is strongly shaped by the interpretation of population trends. In the Serengeti ecosystem, Tanzania, aerial total counts indicate a striking increase in elephant abundance compared to all previous censuses. We developed a simple age-structured population model to guide interpretation of this reported increase, focusing on three possible causes: (1) in situ population growth, (2) immigration from Kenya, and (3) differences in counting methodologies over time. No single cause, nor the combination of two causes, adequately explained the observed population growth. Under the assumptions of maximum in situ growth and detection bias of 12.7% in previous censuses, conservative estimates of immigration from Kenya were between 250 and 1,450 individuals. Our results highlight the value of considering demography when drawing conclusions about the causes of population trends. The issues we illustrate apply to other species that have undergone dramatic changes in abundance, as well as many elephant populations
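The reasoning in this abstract (project the earlier population forward, correct earlier counts for detection bias, and attribute the remainder to immigration) can be sketched as below. This is not the authors' model: the age classes, the survival and fecundity inputs, and the reading of "detection bias" as the fraction of animals missed in earlier censuses are all assumptions for illustration. Only the 12.7% figure comes from the abstract.

```python
def project(pop, survival, fecundity, years):
    """Project an age-structured population forward in yearly steps.

    pop[i]       individuals in age class i
    survival[i]  probability of surviving the year in class i
    fecundity[i] offspring produced per individual in class i per year
    Survivors move up one class; the oldest class accumulates its own survivors.
    """
    pop = list(pop)
    for _ in range(years):
        births = sum(n * f for n, f in zip(pop, fecundity))
        nxt = [births] + [pop[i] * survival[i] for i in range(len(pop) - 1)]
        nxt[-1] += pop[-1] * survival[-1]
        pop = nxt
    return pop

def correct_for_detection(count, missed_fraction):
    """Scale a count up, assuming missed_fraction of animals went undetected."""
    return count / (1.0 - missed_fraction)

def unexplained_growth(observed_now, earlier_count, missed_fraction,
                       age_shares, survival, fecundity, years):
    """Animals in the latest count not explained by in situ growth or detection bias."""
    start_total = correct_for_detection(earlier_count, missed_fraction)
    start = [start_total * share for share in age_shares]
    projected = sum(project(start, survival, fecundity, years))
    return observed_now - projected
```

A positive value from `unexplained_growth` would be the part of the increase left for immigration to explain under the chosen inputs.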
Assessing rotation-invariant feature classification for automated wildebeest population counts
Accurate and on-demand animal population counts are the holy grail for wildlife conservation organizations throughout the world because they enable fast and responsive adaptive management policies. While the collection of image data from camera traps, satellites, and manned or unmanned aircraft has advanced significantly, the detection and identification of animals within images remains a major bottleneck, since counting is primarily conducted by dedicated enumerators or citizen scientists. Recent developments in the field of computer vision suggest a potential resolution to this issue through the use of rotation-invariant object descriptors combined with machine learning algorithms. Here we implement an algorithm to detect and count wildebeest from aerial images collected in the Serengeti National Park in 2009 as part of the biennial wildebeest count. We find that the per-image error rates are greater than, but comparable to, those of two separate human counts. For the total count, the algorithm is more accurate than both manual counts, suggesting that human counters have a tendency to systematically over- or under-count images. While the accuracy of the algorithm is not yet at an acceptable level for fully automatic counts, our results show this method is a promising avenue for further research, and we highlight specific areas where future research should focus in order to develop fast and accurate enumeration of aerial count data. If combined with a bespoke image collection protocol, this approach may yield a fully automated wildebeest count in the near future
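The contrast between per-image error and total-count error can be illustrated with a small sketch. The counts below are invented for the example and are not from the 2009 survey: a counter with larger but unbiased per-image misses can still give a better total than a counter with smaller misses that all point the same way.

```python
def mean_abs_error_per_image(counts, truth):
    """Average absolute miss per image."""
    return sum(abs(c - t) for c, t in zip(counts, truth)) / len(truth)

def total_error(counts, truth):
    """Signed error of the summed count across all images."""
    return sum(counts) - sum(truth)

# Hypothetical counts for four images.
truth = [10, 20, 30, 40]
algorithm = [13, 16, 34, 37]  # larger misses, but over- and under-counts cancel
human = [12, 22, 32, 42]      # smaller misses, all over-counts

algorithm_mae = mean_abs_error_per_image(algorithm, truth)
human_mae = mean_abs_error_per_image(human, truth)
algorithm_total = total_error(algorithm, truth)
human_total = total_error(human, truth)
```

Here the toy algorithm has the higher per-image error but a total error of zero, while the toy human counter accumulates a bias of two animals per image.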
CONFLICTS OVER LAND AND WATER RESOURCES IN THE KILOMBERO VALLEY FLOODPLAIN, TANZANIA
The Kilombero Valley floodplain (KVFP) is a very large natural wetland, over 70% of which is protected. Diverse mammal, amphibian, fish and bird species populate the area. Importantly, KVFP harbours 75% of the world's Puku antelope population. The main human activities in the area include large- and small-scale farming, pastoralism and fishing. Recently, population pressure, overgrazing and associated human activities have put strain on the land and water resources in the KVFP. The situation prompted the government of Tanzania to resettle some of the pastoral families so as to achieve sustainable natural resources management. The paper provides insight into this resettlement exercise as a multilayered land use conflict and into its effects on land resources and people's livelihoods. Focus group discussions and key informant interviews, both using checklists, and a literature review were the methods used for data collection. The Sukuma agro-pastoralists and the Maasai and Barbaig pastoralists were the ethnic groups most affected by the resettlement exercise. It is envisaged that a pragmatic approach to land and water resources management, including effective land use plans, natural resource monitoring plans, sensitization programs and the rule of law, is needed to avoid future conflicts over land resource use and to ensure that a people-centered development process is achieved
Fire regulates the abundance of alien plant species around roads and settlements in the Serengeti National Park
A large portion of East African ecosystems are officially protected, yet increasing wildlife tourism and infrastructural development are exposing these areas to invasion by alien plant species. To date there has been little quantification of alien plant species in the Serengeti National Park, Tanzania. In this study, we aimed to: (1) establish a list of common alien plant species; (2) quantify the frequencies of alien species near roads and settlements (i.e. tourist lodges and a campsite), and (3) estimate the abundance (plant cover) of alien plant species in relation to source activities (i.e. gardening) and park management (i.e. fire). In total, we detected 15 alien plant species in our surveys with an 80% probability of encountering an alien species within the first 50 m from a road or settlement. Overall, we found no difference in the presence of alien species near roads or settlements, but did find a significant decline in species presence with distance from these sources. Cumulative fire frequency was the most important factor influencing the abundance of alien species with the highest alien plant cover in areas with infrequent or no fires over the last 13 years. There was no difference in alien plant cover in relation to other commonly cited source activities, which may be due to the stronger influence of fire. Although the abundance of the majority of alien plant species was negatively related to fire, some species, notably Tagetes minuta, had higher cover with more frequent fires. Our results contradict findings from other African savannahs that suggest fire promotes invasive species and this is likely due to the species-specific interactions with fire. In the Serengeti, fire will be difficult to use as a management tool due to variable species response. 
Thus, we highlight that other management approaches, such as physical removal and biological control agents, can be implemented; however, future work with these techniques should also consider the interaction of alien plant species with fire
A comparison of deep learning and citizen science techniques for counting wildlife in aerial survey images
Files contain wildebeest counts for images taken during the 2015 wildebeest survey
- expert_counts: counting performed by DJLJ
- raw_zooniverse_counts: all counts from zooniverse volunteers (usernames redacted)
- yolo_counts: counting performed by deep learning object detection algorithm
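One way the three count sources listed above might be compared is sketched below. The file layout is not described in the listing, so the input shapes (pairs of image identifier and count for volunteers, a dictionary per source otherwise), the use of the median to combine volunteer counts, and all names are assumptions for illustration.

```python
from statistics import median

def aggregate_volunteers(raw_counts):
    """Combine volunteer counts per image using the median.

    raw_counts: iterable of (image_id, count) pairs, one per volunteer classification.
    """
    by_image = {}
    for image_id, count in raw_counts:
        by_image.setdefault(image_id, []).append(count)
    return {image_id: median(counts) for image_id, counts in by_image.items()}

def total_difference(counts, reference):
    """Signed difference in summed counts over images present in both sources."""
    shared = counts.keys() & reference.keys()
    return sum(counts[i] - reference[i] for i in shared)

# Hypothetical toy data for two images.
raw_zooniverse = [("img1", 4), ("img1", 5), ("img1", 9), ("img2", 0), ("img2", 2)]
expert = {"img1": 5, "img2": 2}
yolo = {"img1": 6, "img2": 2}

volunteer = aggregate_volunteers(raw_zooniverse)
volunteer_vs_expert = total_difference(volunteer, expert)
yolo_vs_expert = total_difference(yolo, expert)
```

The median is used here because it is less sensitive than the mean to a single volunteer who badly over- or under-counts an image.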
Estimating the abundance of a group-living species using multi-latent spatial models
Statistical models use observations of animals to make inferences about the abundance and distribution of species. However, the spatial distribution of animals is a complex function of many factors, including landscape and environmental features, and intra- and interspecific interactions. Modelling approaches often have to make significant simplifying assumptions about these factors, which can result in poor model performance and inaccurate predictions.
Here, we explore the implications of complex spatial structure for modelling the abundance of the Serengeti wildebeest, a gregarious migratory species. The social behaviour of wildebeest leads to a highly aggregated distribution, and we examine the consequences of omitting this spatial complexity when modelling species abundance. To account for this distribution, we introduce a multi-latent framework that uses two random fields to capture the clustered distribution of wildebeest.
Our results show that simplifying assumptions that are often made in spatial models can dramatically impair performance. However, by allowing for mixtures of spatial models, accurate predictions can be made. Furthermore, there can be a non-monotonic relationship between model complexity and model performance; complex, flexible models that rely on unfounded assumptions can potentially make highly inaccurate predictions, whereas simpler, more traditional approaches involve fewer assumptions and are less sensitive to these issues.
We demonstrate how to develop flexible spatial models that can accommodate the complex processes driving animal distributions. Our findings highlight the importance of robust model checking protocols, and we illustrate how realistic assumptions can be incorporated into models using random fields
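The "highly aggregated distribution" that motivates this work can be quantified with a simple diagnostic, sketched below. This is not the multi-latent model from the paper; it only shows, with invented counts, how the variance-to-mean ratio separates evenly spread animals from herded ones. A ratio near 1 is what a plain Poisson model would expect, and values far above 1 indicate clustering that such a model would miss.

```python
from statistics import mean, pvariance

def dispersion_index(counts):
    """Variance-to-mean ratio of counts across survey cells."""
    return pvariance(counts) / mean(counts)

# Hypothetical counts in eight survey cells, 40 animals in total in both cases.
evenly_spread = [5, 5, 5, 5, 5, 5, 5, 5]
one_herd = [0, 0, 0, 40, 0, 0, 0, 0]

even_index = dispersion_index(evenly_spread)
herd_index = dispersion_index(one_herd)
```

Both toy layouts have the same mean density, so a model that summarises abundance by the mean alone cannot tell them apart.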