16 research outputs found
The Emerging Threat of AI-driven Cyber Attacks: A Review
Cyberattacks are becoming more sophisticated and ubiquitous. Cybercriminals are inevitably adopting Artificial Intelligence (AI) techniques to evade detection in cyberspace and cause greater damage without being noticed. Researchers in the cybersecurity domain have not studied the concept behind AI-powered cyberattacks enough to understand the level of sophistication this type of attack possesses. This paper aims to investigate the emerging threat of AI-powered cyberattacks and provide insights into the malicious use of AI in cyberattacks. The study was performed through a three-step process, selecting only articles that focus on AI-driven cyberattacks based on quality, exclusion, and inclusion criteria. Searches in ACM, arXiv, Blackhat, Scopus, Springer, MDPI, IEEE Xplore and other sources were executed to retrieve relevant articles. Out of the 936 papers that met our search criteria, a total of 46 articles were finally selected for this study. The results show that 56% of the AI-driven cyberattack techniques identified were demonstrated in the access and penetration phase; 12% each were demonstrated in the exploitation phase and in the command and control phase; 11% were demonstrated in the reconnaissance phase; and 9% were demonstrated in the delivery phase of the cybersecurity kill chain. The findings of this study show that existing cyber defence infrastructures will become inadequate to address the increasing speed and complex decision logic of AI-driven attacks. Hence, organizations need to invest in AI cybersecurity infrastructures to combat these emerging threats.
Machine learning approaches to genome-wide association studies
Genome-wide Association Studies (GWAS) are conducted to identify single nucleotide polymorphisms
(variants) associated with a phenotype within a specific population. These variants associated with diseases
have a complex molecular aetiology through which they cause the disease phenotype. The genotyping
data generated from study subjects are high-dimensional, which is a challenge: the datasets have a
large number of features and a relatively small sample size. Nevertheless, statistical testing
remains the standard approach for identifying the variants that influence the phenotype of
interest. The wide applicability and capabilities of Machine Learning (ML) algorithms promise a better
understanding of the effects of these variants. The aim of this work is to discuss the applications and
future trends of ML algorithms in GWAS towards understanding the effects of population genetic variants.
We found that classification, regression, ensemble, and neural network algorithms have been applied
to GWAS, and this work discusses them comprehensively, including their application areas.
ML algorithms have been applied to the identification of significant single nucleotide polymorphisms
(SNPs), disease risk assessment and prediction, detection of epistatic non-linear interactions, and
integration with other omics datasets. This comprehensive review highlights these areas of application and sheds
light on the promise of incorporating machine learning algorithms into the computational and statistical
pipeline of genome-wide association studies. This will be beneficial for a better understanding of how variants
are affected by disease biology and how the same variants can influence risk by developing a particular
phenotype for favourable natural selection.
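The abstract names statistical testing as the standard way to find variants but does not say which test is used. As an illustration only, the sketch below assumes one common choice: a per-SNP allelic chi-square test (1 degree of freedom) on case/control allele counts, with a Bonferroni-corrected threshold. The function names, SNP identifiers, and counts are hypothetical and do not come from the reviewed work.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 allele-count table.

    Returns (statistic, p_value). This is an assumed, illustrative test,
    not necessarily the one used in any particular GWAS.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the test is undefined: report no evidence.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def significant_snps(tables, alpha=0.05):
    """Return SNP ids whose p-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(tables)  # one test per SNP
    hits = []
    for snp_id, counts in tables.items():
        _, p_value = allelic_chi_square(*counts)
        if p_value < threshold:
            hits.append(snp_id)
    return hits


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref)
example_tables = {
    "snp_a": (60, 40, 30, 70),
    "snp_b": (50, 50, 50, 50),
}
print(significant_snps(example_tables))
```

Because one test is run per variant, the correction threshold shrinks as the number of SNPs grows, which is one way to see why a large feature count combined with a small sample size is difficult for this approach.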
Computational applications in secondary metabolite discovery (CAiSMD): An online workshop
We report the major conclusions of the online open-access workshop “Computational Applications in Secondary
Metabolite Discovery (CAiSMD)” that took place from 08 to 10 March 2021. Invited speakers from academia and
industry and about 200 registered participants from five continents (Africa, Asia, Europe, South America, and North
America) took part in the workshop. The workshop highlighted the potential applications of computational
methodologies in the search for secondary metabolites (SMs) or natural products (NPs) as potential drugs and drug leads.
Over 3 days, the participants of this online workshop received an overview of modern computer-based approaches
for exploring NP discovery in the “omics” age. The invited experts gave keynote lectures, trained participants in hands-on sessions, and held round table discussions. This was followed by oral presentations with much interaction between
the speakers and the audience. Selected applicants (early-career scientists) were offered the opportunity to give oral
presentations (15 min) and present posters in the form of flash presentations (5 min) upon submission of an abstract.
The final program, available on the workshop website (https://caismd.indiayouth.info/), comprised 4 keynote
lectures (KLs), 12 oral presentations (OPs), 2 round table discussions (RTDs), and 5 hands-on sessions (HSs). This meeting
report also references internet resources for computational biology in the area of secondary metabolites that are of
use outside of the workshop areas and will constitute a valuable long-term resource for the community. The workshop
concluded with an online survey form to be completed by speakers and participants with the aim of improving any
subsequent editions.
Pseudocode of our Compute_MM Sub-program for <i>MMk-means</i>.
<p>We create a covariance matrix by computing the Pearson product-moment correlation coefficient between the k centroids of the previous and current iterations, and then deduce the k eigenvalues for each of the previous and current iterations. The difference between these eigenvalues is computed for each cluster and checked to see whether it satisfies the <i>Ding-He</i> interval.</p>
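The caption above can be read as the following minimal sketch. It is not the authors' Compute_MM pseudocode. It assumes that a k×k Pearson correlation matrix is built among the centroids of each iteration, that eigenvalues are paired by rank between iterations, and that the interval bounds `lower` and `upper` are placeholders, since the caption does not state the actual Ding-He interval. All function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson product-moment correlation between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    # Constant vectors have no defined correlation; 0.0 is an assumed fallback.
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def correlation_matrix(centroids):
    """k x k matrix of pairwise Pearson correlations among the k centroids."""
    k = len(centroids)
    return [[pearson(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (descending)."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # update columns p and q
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p] = c * aip - s * aiq
                    a[i][q] = s * aip + c * aiq
                for j in range(n):  # update rows p and q
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j] = c * apj - s * aqj
                    a[q][j] = s * apj + c * aqj
    return sorted((a[i][i] for i in range(n)), reverse=True)


def eigenvalue_shifts(prev_centroids, curr_centroids):
    """Absolute eigenvalue differences between iterations, paired by rank (assumption)."""
    prev = symmetric_eigenvalues(correlation_matrix(prev_centroids))
    curr = symmetric_eigenvalues(correlation_matrix(curr_centroids))
    return [abs(c - p) for p, c in zip(prev, curr)]


def stability_flags(prev_centroids, curr_centroids, lower=0.0, upper=1e-3):
    """True where the shift falls inside [lower, upper]; the bounds are placeholders."""
    return [lower <= d <= upper for d in eigenvalue_shifts(prev_centroids, curr_centroids)]
```

With identical centroid sets for both iterations, every shift is zero and every flag is true. Once any centroid moves enough to change the correlation structure, at least one flag turns false.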
Performance comparison for all types of k-means algorithms considered for very large data sets.
<p>This constitutes a simulation of three large data sets of dimensions 10,000×50, 30,000×50 and 50,000×50. The range of K used is 10≤K≤40 for the four algorithms.</p>
Execution Time (Bozdech <i>et al.</i>, <i>P.f</i> 3D7 Microarray Dataset).
<p>The plot shows that our MMk-means has the fastest run-time for the tested numbers of clusters, 15≤k≤25. Comparatively, k = 20 took the longest run-time for all four algorithms, implying that run-time is a function of the nature of the data under consideration.</p>
Non-biological data used for testing our algorithm and the other three variants of the k-means algorithm.
<p>The Abalone dataset, described by 8 attributes, represents physical measurements of abalone (a sea organism). The Wind dataset, described by 12 attributes, represents measurements of wind from 1/1/1961 to 31/12/1978. The Letter dataset represents images of English capital letters described by 16 primitive numerical attributes (statistical moments and edge counts).</p>
Summary statistics on the three microarray experimental data sets used in testing our algorithm and the other three variants of the k-means algorithm.
<p>The second and third columns indicate the total number of genes covered in each experiment and the number of points (at equal intervals) at which the genes' transcriptional expression is measured.</p>
Pseudocode of our main program for <i>MMk-means</i>.
<p>It runs similarly to the traditional k-means, except that it is equipped with a metric-matrices-based mechanism to determine when a cluster is stable (that is, its members will not move from this cluster in subsequent iterations). This mechanism is implemented in the sub-procedure Compute_MM of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0049946#pone-0049946-g001" target="_blank"><i>Figure</i> 1</a>. We use the theory developed by Zha <i>et al. </i><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0049946#pone.0049946-Zha1" target="_blank">[20]</a>, based on the singular values of the matrix X of the input data points, to determine when it is appropriate to execute Compute_MM during the k-means iterations. This is implemented in lines 34–40.</p>
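The general idea of skipping work for stable clusters can be sketched as below. This is not the authors' MMk-means. The stability test is a simple centroid-movement tolerance standing in for Compute_MM, and the `check_from` iteration counter stands in for the singular-value criterion of Zha et al., which the caption does not spell out. The function and parameter names are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((x - c) ** 2 for x, c in zip(point, centroids[j])),
    )


def kmeans_with_freezing(points, init_centroids, max_iter=100, check_from=2, tol=1e-9):
    """Lloyd-style k-means that stops re-assigning members of stable clusters.

    `check_from` and `tol` are assumed stand-ins for the paper's trigger
    and stability test. Returns (centroids, labels, searches), where
    `searches` counts nearest-centroid searches performed.
    """
    centroids = [c[:] for c in init_centroids]
    k, dim = len(centroids), len(points[0])
    labels = [nearest(p, centroids) for p in points]
    searches = len(points)
    stable = [False] * k
    for iteration in range(1, max_iter + 1):
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(p[d] for p in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        # Stability step (placeholder): a cluster whose centroid stopped moving.
        if iteration >= check_from:
            for j in range(k):
                if math.dist(centroids[j], new_centroids[j]) <= tol:
                    stable[j] = True
        centroids = new_centroids
        if all(stable):
            break
        # Assignment step: members of stable clusters are skipped entirely.
        changed = False
        for i, p in enumerate(points):
            if stable[labels[i]]:
                continue
            searches += 1
            new_label = nearest(p, centroids)
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return centroids, labels, searches
```

The returned search count gives a rough measure of how much assignment work the freezing saves relative to a run that re-examines every point in every iteration.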