Search CORE

5 research outputs found

Not Available

Author: Dipro Sinha
Dwijesh Chandra Mishra
Md Yeasin
Sunil Archak
Tanwy Dasmandal
Publication venue: Not Available
Publication date: 01/10/2022
Field of study

Not AvailableDNA methylation (6mA) is a major epigenetic process by which alteration in gene expression took place without changing the DNA sequence. Predicting these sites in-vitro is laborious, time consuming as well as costly. This 'EpiSemble' package is an in-silico pipeline for predicting DNA sequences containing the 6mA sites. It uses an ensemble-based machine learning approach by combining Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting approach to predict the sequences with 6mA sites in it. This package has been developed by using the concept of Chen et al. (2019)Not Availabl

KRISHI Publications and Data Repository

EpiSemble: A Novel Ensemble-based Machine-learning Framework for Prediction of DNA N6-methyladenine Sites Using Hybrid Features Selection Approach for Crops

Author: Archak Sunil
Dasmandal Tanwy
Mishra Dwijesh C.
Rai Anil
Sinha Dipro
Yeasin Md
Publication venue: University of Memphis Digital Commons
Publication date: 01/01/2023
Field of study

Aim: The study aimed to develop a robust and more precise 6mA methylation prediction tool that assists researchers in studying the epigenetic behaviour of crop plants. Background: N6-methyladenine (6mA) is one of the predominant epigenetic modifications involved in a variety of biological processes in all three kingdoms of life. While in vitro approaches are more precise in detecting epigenetic alterations, they are resource-intensive and time-consuming. Artificial intel-ligence-based in silico methods have helped overcome these bottlenecks. Methods: A novel machine learning framework was developed through the incorporation of four tech-niques: ensemble machine learning, hybrid approach for feature selection, the addition of features, such as Average Mutual Information Profile (AMIP), and bootstrap samples. In this study, four different feature sets, namely di-nucleotide frequency, GC content, AMIP, and nucleotide chemical properties were chosen for the vectorization of DNA sequences. Nine machine learning models, including support vector machine, random forest, k-nearest neighbor, artificial neural network, multiple logistic regression, decision tree, naïve Bayes, AdaBoost, and gradient boosting were employed using relevant features extracted through the feature selection module. The top three best-performing models were selected and a robust ensemble model was developed to predict sequences with 6mA sites. Results: EpiSemble, a novel ensemble model was developed for the prediction of 6mA methylation sites. Using the new model, an improvement in accuracy of 7.0%, 3.74%, and 6.65% was achieved over existing models for RiceChen, RiceLv, and Arabidopsis datasets, respectively. An R package, EpiSem-ble, based on the new model was developed and made available at https://cran.r-project.org/web/packages/EpiSemble/index.html. Conclusion: The EpiSemble model added AMIP as a novel feature, integrated feature selection mod-ules, bootstrapping of samples, and ensemble technique to achieve an improved output for accurate prediction of 6mA sites in plants. To our knowledge, this is the first R package developed for predicting epigenetic sites of genomes in crop plants, which is expected to help plant researchers in their future explorations

University of Memphis Digital Commons

A review of artificial intelligence-assisted omics techniques in plant defense: current trends and future directions

Author: Dipro Sinha
Girish Kumar Jha
Himanshushekhar Chaurasia
Ritwika Das
Sneha Murmu
Soumya Sharma
Sunil Archak
Publication venue: Frontiers Media S.A.
Publication date: 01/03/2024
Field of study

Plants intricately deploy defense systems to counter diverse biotic and abiotic stresses. Omics technologies, spanning genomics, transcriptomics, proteomics, and metabolomics, have revolutionized the exploration of plant defense mechanisms, unraveling molecular intricacies in response to various stressors. However, the complexity and scale of omics data necessitate sophisticated analytical tools for meaningful insights. This review delves into the application of artificial intelligence algorithms, particularly machine learning and deep learning, as promising approaches for deciphering complex omics data in plant defense research. The overview encompasses key omics techniques and addresses the challenges and limitations inherent in current AI-assisted omics approaches. Moreover, it contemplates potential future directions in this dynamic field. In summary, AI-assisted omics techniques present a robust toolkit, enabling a profound understanding of the molecular foundations of plant defense and paving the way for more effective crop protection strategies amidst climate change and emerging diseases

Directory of Open Access Journals

Not Available

Author: Anil Rai
Budhlakoti Neeraj
Kumar Sanjeev
Maji Arpan Kumar
Mishra Dwijesh Chandra
Sharanbasappa
Sharma Anu
Sinha Dipro
Publication venue: Not Available
Publication date: 18/11/2022
Field of study

Not AvailableBackground: One major challenge in binning Metagenomics data is the limited availability of reference datasets, as only 1% of the total microbial population is yet cultured. This has given rise to the efficacy of unsupervised methods for binning in the absence of any reference datasets. Objective: To develop a deep clustering-based binning approach for Metagenomics data and to evaluate results with suitable measures. Methods: In this study, a deep learning-based approach has been taken for binning the Metagenomics data. The results are validated on different datasets by considering features such as Tetra-nucleotide frequency (TNF), Hexa-nucleotide frequency (HNF) and GC-Content. Convolutional Autoencoder is used for feature extraction and for binning; the K-means clustering method is used. Results: In most cases, it has been found that evaluation parameters such as the Silhouette index and Rand index are more than 0.5 and 0.8, respectively, which indicates that the proposed approach is giving satisfactory results. The performance of the developed approach is compared with current methods and tools using benchmarked low complexity simulated and real metagenomic datasets. It is found better for unsupervised and at par with semi-supervised methods. Conclusion: An unsupervised advanced learning-based approach for binning has been proposed, and the developed method shows promising results for various datasets. This is a novel approach for solving the lack of reference data problem of binning in metagenomics.Not Availabl

KRISHI Publications and Data Repository

Not Available

Author: Farooqi Md. Samir
K. K. Chaturvedi
Kumar Sanjeev
Lal Shashi Bhushan
Mishra Dwijesh Chandra
Rai Anil
Sharma Anu
Sinha Dipro
Publication venue: Not Available
Publication date: 29/04/2022
Field of study

Not AvailableBackground: Binning of metagenomic reads is an active area of research, and many unsupervised machine learning-based techniques have been used for taxonomic independent binning of metagenomic reads. Objective: It is important to find the optimum number of the cluster as well as develop an efficient pipeline for deciphering the complexity of the microbial genome. Methods: Applying unsupervised clustering techniques for binning requires finding the optimal number of clusters beforehand and is observed to be a difficult task. This paper describes a novel method, MetaConClust, using coverage information for grouping of contigs and automatically finding the optimal number of clusters for binning of metagenomics data using a consensus-based clustering approach. The coverage of contigs in a metagenomics sample has been observed to be directly proportional to the abundance of species in the sample and is used for grouping of data in the first phase by MetaConClust. The Partitioning Around Medoid (PAM) method is used for clustering in the second phase for generating bins with the initial number of clusters determined automatically through a consensus- based method. Results: Finally, the quality of the obtained bins is tested using silhouette index, rand Index, recall, precision, and accuracy. Performance of MetaConClust is compared with recent methods and tools using benchmarked low complexity simulated and real metagenomic datasets and is found better for unsupervised and comparable for hybrid methods.Not Availabl

KRISHI Publications and Data Repository