15 research outputs found

Performance Comparison of Ensemble Learning and Supervised Algorithms in Classifying Multi-label Network Traffic Flow

Author: Agbinya Johnson
Machoke Mwita
Mbelwa Jimmy
Sam Anael
Publication venue: Engineering, Technology & Applied Science Research
Publication date: 01/06/2022

A research article published in Engineering, Technology & Applied Science Research (ETASR), Volume 12, Issue 3, June 2022. Network traffic classification is of significant importance. It helps identify network anomalies and assists in taking measures to avoid them. However, classifying network traffic correctly is a challenging task. This study aims to compare ensemble learning methods with conventional supervised classification in order to arrive at improved classification methods. Three types of network traffic were classified (Benign, Malicious, and Outliers). The data were collected experimentally, using Paessler Router Traffic Grapher software, and online, and were analyzed in R. The datasets were used to train five supervised models (k-nearest neighbors, mixture discriminant analysis, Naïve Bayes, C5.0 classification model, and regularized discriminant analysis). The models were trained on 70% of the samples, and the remaining 30% were used for validation. The same samples were used separately to assess the accuracy of each individual model. The results were compared with those of ensemble learning models built from the same datasets. Among the five supervised classifiers, k-nearest neighbors and C5.0 classification scored the highest accuracies, 0.868 and 0.761 respectively. The ensemble learning classifiers Bagging (Random Forest) and Boosting (eXtreme Gradient Boosting) had accuracies of 0.904 and 0.902 respectively. The results show that the ensemble learning methods have higher accuracy than the conventional supervised classifiers. Therefore, they can be used to detect malicious activities and anomalies in network traffic with improved accuracy.
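The 70/30 hold-out evaluation described in this abstract can be sketched as follows. This is a minimal illustration in Python rather than the R workflow the authors used; the function names, the toy records, and the fixed seed are assumptions made for the example, not details from the study.

```python
import random


def train_validation_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Hypothetical toy records: (flow id, class), using the three classes named in the abstract.
labels = ["Benign", "Malicious", "Outliers", "Benign", "Benign",
          "Malicious", "Benign", "Outliers", "Benign", "Malicious"]
flows = list(enumerate(labels))

train, validation = train_validation_split(flows)
print(len(train), len(validation))  # 7 training records, 3 validation records
```

Any classifier fitted on `train` would then be scored by passing its predictions for `validation` to `accuracy`, which is the quantity the abstract reports for each model.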

NM-AIST Repository

The Effect of Hyperparameter Optimization on the Estimation of Performance Metrics in Network Traffic Prediction using the Gradient Boosting Machine Model

Author: Anael Elikana Sam
Jimmy Mbelwa
Johnson Agbinya
Machoke Mwita
Publication venue: Engineering, Technology & Applied Science Research
Publication date: 01/06/2023

Information and Communication Technology (ICT) has changed the way we communicate and access information, resulting in the generation of large amounts of heterogeneous data. The amount of network traffic generated constantly increases in velocity, veracity, and volume as we enter the era of big data. Network traffic classification and intrusion detection are very important for the early detection and identification of unnecessary network traffic. The Machine Learning (ML) approach has recently taken center stage in accurate network traffic classification. However, in most cases, it is applied without model hyperparameter optimization. In this study, gradient boosting machine prediction was used with different hyperparameter configurations, varying interaction depth, tree number, learning rate, and sampling. Data were collected through an experimental setup by using the Sophos firewall and Cisco router data loggers. Data analysis was conducted with R software version 4.2.0 in the RStudio Integrated Development Environment. The dataset was split into two partitions, where 70% was used for training the model and 30% for testing. At a learning rate of 0.1, interaction depth of 14, and tree number of 2500, the model achieved its highest performance metrics, with an accuracy of 0.93 and R of 0.87, compared to 0.90 and 0.85 before model optimization. The same configuration attained the minimum classification error of 0.07, compared to 0.10 before model optimization. After model tweaking, a method was developed for achieving improved accuracy, R square, mean decrease in Gini coefficients for more than 8 features, lower classification error, root mean square error, logarithmic loss, and mean square error in the model.
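The tuning procedure described in this abstract amounts to evaluating a model over a grid of hyperparameter combinations. The sketch below only enumerates such a grid and checks the arithmetic linking the reported accuracy and classification error; it does not train a gradient boosting model. The candidate values are hypothetical apart from the reported configuration (learning rate 0.1, interaction depth 14, 2500 trees), and the relation error = 1 - accuracy is an assumption that happens to be consistent with the figures quoted.

```python
from itertools import product


def classification_error(accuracy):
    """Misclassification rate implied by an accuracy, assuming error = 1 - accuracy."""
    return 1.0 - accuracy


# Hypothetical candidate values; only 0.1, 14 and 2500 come from the abstract.
LEARNING_RATES = [0.01, 0.1]
INTERACTION_DEPTHS = [2, 8, 14]
TREE_NUMBERS = [500, 2500]


def hyperparameter_grid():
    """Enumerate every (learning rate, interaction depth, tree number) combination."""
    return [
        {"learning_rate": lr, "interaction_depth": depth, "n_trees": trees}
        for lr, depth, trees in product(LEARNING_RATES, INTERACTION_DEPTHS, TREE_NUMBERS)
    ]


grid = hyperparameter_grid()
reported_best = {"learning_rate": 0.1, "interaction_depth": 14, "n_trees": 2500}

print(len(grid))               # 2 * 3 * 2 = 12 combinations
print(reported_best in grid)   # the reported configuration is one grid point

# Accuracies quoted in the abstract, after and before optimization.
for acc in (0.93, 0.90):
    print(acc, round(classification_error(acc), 2))
```

In a real search, each dictionary in `grid` would be used to fit and score one model, and the configuration with the best validation metric would be kept.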

Directory of Open Access Journals

NM-AIST Repository

Informing aerial total counts with demographic models: population growth of Serengeti elephants not explained purely by demography

Author: Estes Anna B.
Frederick Howard
Kija Hamza
Kohi Edward M.
Maliti Honori T.
Mduma Simon A.R.
Morrison Thomas A.
Mwita Machoke
Sinclair A.R.E.
Publication venue: Wiley
Publication date: 10/10/2017

Conservation management is strongly shaped by the interpretation of population trends. In the Serengeti ecosystem, Tanzania, aerial total counts indicate a striking increase in elephant abundance compared to all previous censuses. We developed a simple age-structured population model to guide interpretation of this reported increase, focusing on three possible causes: (1) in situ population growth, (2) immigration from Kenya, and (3) differences in counting methodologies over time. No single cause, nor the combination of two causes, adequately explained the observed population growth. Under the assumptions of maximum in situ growth and detection bias of 12.7% in previous censuses, conservative estimates of immigration from Kenya were between 250 and 1,450 individuals. Our results highlight the value of considering demography when drawing conclusions about the causes of population trends. The issues we illustrate apply to other species that have undergone dramatic changes in abundance, as well as many elephant populations.
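As a rough illustration of what an age-structured projection involves, the sketch below advances a population one time step at a time with a Leslie-style update. Every number in it (age classes, survival, fecundity, starting counts) is a hypothetical placeholder and not a parameter from the study, and the model the authors built may be structured differently.

```python
def project_one_step(counts, survival, fecundity):
    """Advance an age-structured population by one time step.

    counts[i]    -- individuals in age class i
    survival[i]  -- fraction of class i that survives into class i + 1
    fecundity[i] -- offspring produced per individual in class i
    Individuals in the last class are assumed to die after reproducing.
    """
    if len(survival) != len(counts) - 1 or len(fecundity) != len(counts):
        raise ValueError("need len(counts) - 1 survival rates and len(counts) fecundities")
    newborns = sum(f * n for f, n in zip(fecundity, counts))
    survivors = [s * n for s, n in zip(survival, counts)]
    return [newborns] + survivors


def project(counts, survival, fecundity, steps):
    """Apply project_one_step repeatedly and return the final age distribution."""
    for _ in range(steps):
        counts = project_one_step(counts, survival, fecundity)
    return counts


# Hypothetical three-class example (placeholder values, not elephant data).
start = [100, 50, 20]
survival = [0.5, 0.8]
fecundity = [0.0, 0.2, 0.1]

after_five = project(start, survival, fecundity, steps=5)
print(sum(after_five) / sum(start))  # growth factor implied by demography alone
```

Comparing a projected total of this kind with census totals is one way to judge whether in situ growth alone could account for an observed increase, or whether other explanations such as immigration or changes in counting method are needed.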

Crossref

Enlighten

Assessing rotation-invariant feature classification for automated wildebeest population counts

Author: Borner Felix
Borner Markus
Dobson Andrew P.
Fredrick Howard
Hopcraft J. Grant C.
Lloyd-Jones David J.
Maliti Honori T.
Moyer David
Mwita Machoke
Torney Colin J.
Publication venue: Public Library of Science (PLoS)
Publication date: 26/05/2016

Accurate and on-demand animal population counts are the holy grail for wildlife conservation organizations throughout the world because they enable fast and responsive adaptive management policies. While the collection of image data from camera traps, satellites, and manned or unmanned aircraft has advanced significantly, the detection and identification of animals within images remains a major bottleneck since counting is primarily conducted by dedicated enumerators or citizen scientists. Recent developments in the field of computer vision suggest a potential resolution to this issue through the use of rotation-invariant object descriptors combined with machine learning algorithms. Here we implement an algorithm to detect and count wildebeest from aerial images collected in the Serengeti National Park in 2009 as part of the biennial wildebeest count. We find that the per-image error rates are greater than, but comparable to, those of two separate human counts. For the total count, the algorithm is more accurate than both manual counts, suggesting that human counters have a tendency to systematically over- or under-count images. While the accuracy of the algorithm is not yet at an acceptable level for fully automatic counts, our results show this method is a promising avenue for further research and we highlight specific areas where future research should focus in order to develop fast and accurate enumeration of aerial count data. If combined with a bespoke image collection protocol, this approach may yield a fully automated wildebeest count in the near future.

Princeton University Open Access Repository

Crossref

Directory of Open Access Journals

PubMed Central

Enlighten

FigShare

CONFLICTS OVER LAND AND WATER RESOURCES IN THE KILOMBERO VALLEY FLOODPLAIN, TANZANIA

Author: Bakari Samwel
Kija Hamza
Machoke Mwita
Maliti Hanori
Nindi Stephen Justice
Publication venue: The Research Committee for African Area Studies, Kyoto University
Publication date: 01/10/2014

The Kilombero Valley floodplain (KVFP) is a very large natural wetland of which over 70% is protected. Diverse mammal, amphibian, fish and bird species populate the area. Importantly, KVFP harbours 75% of the world's Puku antelope population. The main human activities in the area include large- and small-scale farming, pastoralism and fishing. Recently, population pressure, overgrazing and related human activities have put strain on the land and water resources in the KVFP. The situation prompted the government of Tanzania to resettle some of the pastoral families so as to achieve sustainable natural resources management. The paper provides insight into this resettlement exercise as a multilayered land use conflict and its effects on land resources and people's livelihoods. Focus group discussions and key informant interviews, both using checklists, together with a literature review, were the methods used for data collection. The Sukuma agro-pastoralists and the Maasai and Barbaig pastoralists were the ethnic groups most affected by the resettlement exercise. It was envisaged that a pragmatic approach to land and water resources management, such as effective land use plans, natural resource monitoring plans, sensitization programs and rule of law, is needed to avoid future conflicts over land resource use and to ensure that a people-centered development process is achieved.

Kyoto University Research Information Repository

CONFLICTS OVER LAND AND WATER RESOURCES IN THE KILOMBERO VALLEY FLOODPLAIN, TANZANIA

Author: Bakari Samwel
Kija Hamza
Machoke Mwita
Maliti Hanori
Nindi Stephen Justice
Publication venue: The Research Committee for African Area Studies, Kyoto University
Publication date

Institutional Repositories DataBase (IRDB)

Fire regulates the abundance of alien plant species around roads and settlements in the Serengeti National Park

Author: Bukombe John
Kihwele Emilian
Kija Hamza
Loishooki Asheeli
Mwakalebe Grayson
Mwita Machoke
Smith Stuart
Sumay Glory
Publication venue: Regional Euro-Asian Biological Invasions Centre (REABIC)
Publication date: 01/01/2018

A large portion of East African ecosystems are officially protected, yet increasing wildlife tourism and infrastructural development are exposing these areas to invasion by alien plant species. To date there has been little quantification of alien plant species in the Serengeti National Park, Tanzania. In this study, we aimed to: (1) establish a list of common alien plant species; (2) quantify the frequencies of alien species near roads and settlements (i.e. tourist lodges and a campsite), and (3) estimate the abundance (plant cover) of alien plant species in relation to source activities (i.e. gardening) and park management (i.e. fire). In total, we detected 15 alien plant species in our surveys with an 80% probability of encountering an alien species within the first 50 m from a road or settlement. Overall, we found no difference in the presence of alien species near roads or settlements, but did find a significant decline in species presence with distance from these sources. Cumulative fire frequency was the most important factor influencing the abundance of alien species with the highest alien plant cover in areas with infrequent or no fires over the last 13 years. There was no difference in alien plant cover in relation to other commonly cited source activities, which may be due to the stronger influence of fire. Although the abundance of the majority of alien plant species was negatively related to fire, some species, notably Tagetes minuta, had higher cover with more frequent fires. Our results contradict findings from other African savannahs that suggest fire promotes invasive species and this is likely due to the species-specific interactions with fire. In the Serengeti, fire will be difficult to use as a management tool due to variable species response. Thus, we highlight that other management approaches such as physical removal and biological control agents can be implemented; however future work with these techniques should also consider the interaction of alien plant species with fire.

University of Brighton Research Portal

NORA - Norwegian Open Research Archives

Fire regulates the abundance of alien plant species around roads and settlements in the Serengeti National Park

Author: Bukombe John
Kihwele Emilian
Kija Hamza K.
Loishooki Asheeli
Mwakalebe Grayson
Mwita Machoke
Smith Stuart W.
Sumay G Mario
Publication venue
Publication date: 31/08/2018

University of Brighton Research Portal

A comparison of deep learning and citizen science techniques for counting wildlife in aerial survey images

Author: Chevallier Mark
Hopcraft Grant
Kohi Edward M.
Lloyd-Jones David J.
Maliti Honori T.
Moyer David C.
Mwita Machoke
Torney Colin
Publication venue: University of Glasgow
Publication date: 25/01/2019

Files contain wildebeest counts for images taken during the 2015 wildebeest survey:
- expert_counts: counting performed by DJLJ
- raw_zooniverse_counts: all counts from zooniverse volunteers (usernames redacted)
- yolo_counts: counting performed by deep learning object detection algorithm

Enlighten: Research Data (University of Glasgow)

Crossref

Enlighten

Estimating the abundance of a group-living species using multi-latent spatial models

Author: Frederick Howard L.
Hopcraft J. Grant C.
Kohi Edward M.
Laxton Megan
Lloyd‐Jones David J.
Moyer David C.
Mrisha Chediel
Mwita Machoke
Torney Colin J.
Publication venue: Wiley
Publication date: 03/08/2022

Statistical models use observations of animals to make inferences about the abundance and distribution of species. However, the spatial distribution of animals is a complex function of many factors, including landscape and environmental features, and intra- and interspecific interactions. Modelling approaches often have to make significant simplifying assumptions about these factors, which can result in poor model performance and inaccurate predictions. Here, we explore the implications of complex spatial structure for modelling the abundance of the Serengeti wildebeest, a gregarious migratory species. The social behaviour of wildebeest leads to a highly aggregated distribution, and we examine the consequences of omitting this spatial complexity when modelling species abundance. To account for this distribution, we introduce a multi-latent framework that uses two random fields to capture the clustered distribution of wildebeest. Our results show that simplifying assumptions that are often made in spatial models can dramatically impair performance. However, by allowing for mixtures of spatial models, accurate predictions can be made. Furthermore, there can be a non-monotonic relationship between model complexity and model performance; complex, flexible models that rely on unfounded assumptions can potentially make highly inaccurate predictions, whereas simpler, more traditional approaches involve fewer assumptions and are less sensitive to these issues. We demonstrate how to develop flexible spatial models that can accommodate the complex processes driving animal distributions. Our findings highlight the importance of robust model checking protocols, and we illustrate how realistic assumptions can be incorporated into models using random fields.

Directory of Open Access Journals

Enlighten