    A pipeline and comparative study of 12 machine learning models for text classification

    Text-based communication is a highly favoured communication method, especially in business environments. As a result, it is often abused through malicious messages, e.g., spam emails, designed to deceive users into relaying personal information, including online account credentials or banking details. For this reason, many machine learning methods for text classification have been proposed and incorporated into the services of most email providers. However, optimising text classification algorithms and finding the right trade-off in their aggressiveness is still a major research problem. We present an updated survey of 12 machine learning text classifiers applied to a public spam corpus. A new pipeline is proposed to optimise hyperparameter selection and improve the models' performance by applying specific methods (based on natural language processing) in the preprocessing stage. Our study aims to provide a new methodology for investigating and optimising the effect of different feature sizes and hyperparameters in machine learning classifiers that are widely used in text classification problems. The classifiers are tested and evaluated on different metrics, including F-score (accuracy), precision, recall, and run time. By analysing all these aspects, we show how the proposed pipeline can be used to achieve good accuracy in spam filtering on the Enron dataset, a widely used public email corpus. Statistical tests and explainability techniques are applied to provide a robust analysis of the proposed pipeline and to interpret the classification outcomes of the 12 machine learning models, also identifying the words that drive the classification results. Our analysis shows that it is possible to identify an effective machine learning model that classifies the Enron dataset with an F-score of 94%. Comment: This article has been accepted for publication in Expert Systems with Applications, April 2022. Published by Elsevier. All data, models, and code used in this work are available on GitHub at https://github.com/Angione-Lab/12-machine-learning-models-for-text-classificatio
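    The kind of pipeline described above can be outlined in a few lines. The sketch below is a minimal illustration, assuming scikit-learn and placeholder text/label arrays (none of these names come from the paper's repository): a TF-IDF pipeline whose feature size and classifier hyperparameters are tuned jointly by cross-validated grid search and scored with the F-score, mirroring the optimisation the abstract describes with a single classifier standing in for the 12 models surveyed.

```python
# Minimal sketch, not the authors' code: data loading and parameter grids are assumptions.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

def tune_spam_classifier(texts, labels):
    """Grid-search feature size and regularisation, report the held-out F-score."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=0)

    pipe = Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english", lowercase=True)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    grid = GridSearchCV(
        pipe,
        param_grid={
            "tfidf__max_features": [1000, 5000, 10000],  # feature-size sweep
            "clf__C": [0.1, 1.0, 10.0],                  # classifier hyperparameter
        },
        scoring="f1", cv=5, n_jobs=-1)
    grid.fit(X_train, y_train)
    return grid.best_params_, grid.score(X_test, y_test)
```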

    Surrogate models for seismic and pushover response prediction of steel special moment resisting frames

    For structural engineers, existing surrogate models of buildings present challenges due to inadequate datasets, the exclusion of significant input variables affecting nonlinear building response, and the failure to consider uncertainties associated with input parameters. Moreover, there are no surrogate models that predict both pushover and nonlinear time history analysis (NLTHA) outputs. To overcome these challenges, the present study proposes a novel framework for surrogate modelling of steel structures that considers the crucial structural factors affecting engineering demand parameters (EDPs). The first phase develops a process by which 30,000 random steel special moment resisting frames (SMRFs) for low- to high-rise buildings are generated, accounting for the material and geometrical uncertainties embedded in the design of structures. In the second phase, a surrogate model is developed to predict the seismic EDPs of SMRFs exposed to various earthquake levels, leveraging the results obtained from phase one. Moreover, separate surrogate models are developed to predict the SMRFs' essential pushover parameters. Various machine learning (ML) methods are examined, and the outcomes are presented as user-friendly GUI tools. The findings highlight the substantial influence of pushover parameters, as well as the plastic hinge properties of beams and columns, on the prediction of NLTHA, factors that have been overlooked in prior studies. Moreover, CatBoost was identified as the best-performing ML technique for predicting both pushover and NLTHA parameters across all buildings. This framework enables engineers to estimate building responses without conducting NLTHA, pushover, or even modal analysis, which are computationally intensive.
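    As a rough illustration of the second-phase surrogate, the sketch below assumes CatBoost and a pre-assembled feature matrix (design variables plus pushover parameters) with an EDP target obtained from NLTHA; the feature layout and hyperparameters are assumptions, not the study's published configuration.

```python
# Minimal sketch, assuming CatBoost: fit a surrogate mapping frame features to a
# seismic EDP (e.g. peak inter-storey drift). Data shapes and settings are illustrative.
from catboost import CatBoostRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def train_edp_surrogate(X, y):
    """X: design + pushover features per frame; y: EDP obtained from NLTHA."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = CatBoostRegressor(iterations=2000, depth=8, learning_rate=0.05,
                              loss_function="RMSE", verbose=False)
    model.fit(X_train, y_train, eval_set=(X_test, y_test), early_stopping_rounds=100)
    print("held-out R^2:", r2_score(y_test, model.predict(X_test)))
    return model
```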

    A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling

    Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. Machine learning is a useful tool for accommodating the volume and heterogeneity of such diverse data types, and for aiding their interpretation when they are combined with a multi-scale predictive model, as it can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, using GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. Firstly, we focus on the merits of adopting an integrative, systems-biology-led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for combining machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM
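    The constraint-based and regression steps of such a tutorial can be outlined briefly. The sketch below assumes COBRApy and scikit-learn; the SBML file, reaction bounds, and feature matrices are placeholders rather than the tutorial's actual inputs. It constrains a GSMM with expression-derived bounds, extracts flux features via flux balance analysis, and concatenates them with omic features for a standard regressor.

```python
# Minimal sketch, not the tutorial's code: FBA-derived flux features combined with
# omic features in a multimodal regression. All inputs are hypothetical placeholders.
import numpy as np
import cobra
from sklearn.ensemble import RandomForestRegressor

def flux_features(sbml_path, expression_bounds):
    """Apply expression-derived bounds to a GSMM, run FBA, return the flux vector."""
    model = cobra.io.read_sbml_model(sbml_path)
    for rxn_id, (lb, ub) in expression_bounds.items():
        rxn = model.reactions.get_by_id(rxn_id)
        rxn.lower_bound, rxn.upper_bound = lb, ub
    return model.optimize().fluxes.values

def fit_multimodal_regressor(flux_matrix, omics_matrix, phenotype):
    """Concatenate flux and omic features (one row per sample) and fit a regressor."""
    X = np.hstack([flux_matrix, omics_matrix])
    return RandomForestRegressor(n_estimators=500, random_state=0).fit(X, phenotype)
```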

    Cancer Markers Selection Using Network-Based Cox Regression: A Methodological and Computational Practice.

    International initiatives such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) are collecting multiple datasets at different genome scales with the aim of identifying novel cancer biomarkers and predicting patient survival. To analyze such data, several statistical methods have been applied, among them Cox regression models. Although these models provide a good statistical framework for analyzing omic data, there is still a lack of studies illustrating the advantages and drawbacks of integrating biological information and selecting groups of biomarkers. In fact, classical Cox regression algorithms focus on the selection of single biomarkers, without taking into account the strong correlation between genes. Even though network-based Cox regression algorithms overcome such drawbacks, these approaches are less widely used within the life science community. In this article, we aim to provide a clear methodological framework on the use of such approaches in order to turn cancer research results into clinical applications. Therefore, we first discuss the rationale and practical usage of three recently proposed network-based Cox regression algorithms (i.e., Net-Cox, AdaLnet, and fastcox). Then, we show how to combine existing biological knowledge and available data with such algorithms to identify networks of cancer biomarkers and to estimate the survival of patients. Finally, we describe in detail a new permutation-based approach to better validate the significance of the selection in terms of cancer gene signatures and pathway/network identification. We illustrate the proposed methodology by means of both simulations and real case studies. Overall, the aim of our work is twofold: firstly, to show how network-based Cox regression models can be used to integrate biological knowledge (e.g., multi-omics data) for the analysis of survival data; and secondly, to provide a clear methodological and computational approach for investigating cancer regulatory networks.
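    The permutation-based validation step can be sketched generically. The example below assumes lifelines and a data frame with "time" and "status" columns plus gene-expression covariates (all names are placeholders); it fits a penalised Cox model and compares its partial log-likelihood against models fitted to permuted survival labels, in the spirit of the permutation approach described above. It is not an implementation of Net-Cox, AdaLnet, or fastcox.

```python
# Minimal sketch, not the article's implementation: a penalised Cox fit plus a
# permutation null for the fitted signature. Column names and settings are assumptions.
import numpy as np
from lifelines import CoxPHFitter

def permutation_signature_test(df, duration_col="time", event_col="status",
                               n_perm=200, penalizer=0.1, seed=0):
    """Return the observed partial log-likelihood and a permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = CoxPHFitter(penalizer=penalizer).fit(
        df, duration_col=duration_col, event_col=event_col).log_likelihood_

    null_scores = []
    for _ in range(n_perm):
        permuted = df.copy()
        idx = rng.permutation(len(df))
        # Break the gene-survival association while keeping the covariate structure.
        permuted[[duration_col, event_col]] = df[[duration_col, event_col]].values[idx]
        null_fit = CoxPHFitter(penalizer=penalizer).fit(
            permuted, duration_col=duration_col, event_col=event_col)
        null_scores.append(null_fit.log_likelihood_)

    p_value = (np.sum(np.array(null_scores) >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```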