5,601 research outputs found

Augmented nomogram with dependent feature pairs

Author: FU QIANG
Publication date: 12/04/2012

Master's (MASTER OF SCIENCE)

ScholarBank@NUS

How complex is the microarray dataset? A novel data complexity metric for biological high-dimensional microarray data

Author: Chen Yuanzhu
Hu Ting
Jiang Zijun
Sha Zhendong
Zhu Li
Publication date: 11/08/2023

Data complexity analysis quantifies the hardness of constructing a predictive model on a given dataset. However, the effectiveness of existing data complexity measures can be challenged by the existence of irrelevant features and feature interactions in biological microarray data. We propose a novel data complexity measure, depth, that leverages an evolutionary-inspired feature selection algorithm to quantify the complexity of microarray data. By examining feature subsets of varying sizes, the approach offers a novel perspective on data complexity analysis. Unlike traditional metrics, depth is robust to irrelevant features and effectively captures complexity stemming from feature interactions. On synthetic microarray data, depth outperforms existing methods in robustness to irrelevant features and in identifying complexity from feature interactions. Applied to case-control genotype and gene-expression microarray datasets, the results reveal that a single feature of gene-expression data can account for over 90% of the performance of a multi-feature model, confirming the adequacy of the commonly used differentially expressed gene (DEG) feature selection method for gene-expression data. Our study also demonstrates that constructing predictive models is harder for genotype data than for gene-expression data. The results in this paper provide evidence for the use of interpretable machine learning algorithms on microarray data.

arXiv.org e-Print Archive

A Random Forest model for predicting allosteric and functional sites on proteins

Author: Ballester
Bento
Boehr
Bondi
Breiman
Breiman
Cheng
Chiu
Cuff
Del Sol
Demerdash
Erman
Greener
Gunasekaran
Hardy
Huang
Huang
Kaya
Kirtay
Kumar
Kuntz
Laskowski
Laskowski
Lockless
Long
Lu
Malmendal
Novinec
Panjkovich
Panjkovich
Raileanu
Richards
Sancar
Stanger
Steinbeck
Svetnik
Svetnik
Tsai
Tsai
van Westen
Volkman
Wang
Xu
Publication venue: 'Wiley'
Publication date: 05/04/2016

We thank the Scottish Universities Life Sciences Alliance (SULSA) for funding to JBOM and for PB’s PhD studentship under NJW’s supervision. We created a computational method to identify allosteric sites using a machine learning method trained and tested on protein structures containing bound ligand molecules. The Random Forest machine learning approach was adopted to build our three-way predictive model. Based on descriptors collated for each ligand and binding site, the classification model allows us to assign protein cavities as allosteric, regular or orthosteric, and hence to identify allosteric sites. 43 structural descriptors per complex were derived and used to characterize individual protein-ligand binding sites belonging to the three classes: allosteric, regular and orthosteric. We carried out a separate validation on a further unseen set of protein structures containing the ligand 2-(N-cyclohexylamino) ethane sulfonic acid (CHES). Postprint. Peer reviewed.

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Visualizing Google Analytics’, Digital Marketing Campaigns’ and ERP system’s data using BI tools

Author: Kagiampaslidis Michail - Christos
Publication date: 09/06/2020

International Hellenic University: IHU Open Access Repository

Using machine learning to predict smartphone usage

Author: Kankaanranta J. (Jyrki)
Publication venue: University of Oulu
Publication date: 10/01/2023

This thesis shows the process of creating and analyzing a machine-learning model. It goes over prevalent classification algorithms and their advantages and disadvantages. Furthermore, techniques and metrics used to evaluate the performance of the model are introduced. In the latter part of the thesis, a Random Forest model is implemented. The objective was to predict the participants’ smartphone usage, more specifically the category of an application they had opened. The implementation starts with a pre-processing phase, where relevant information is extracted from the raw data. Multiple variations of the model were built, and the best-performing model achieved 63.37% accuracy. Additionally, the features are scored to provide more insight into the model. The thesis ends with a brief discussion section, which considers the reasons behind the results, some of the model’s deficiencies and how it could be improved.

University of Oulu Repository - Jultika

Reaction Prediction: The Case of Tweets from Luxury Fashion Brands

Author: Calviello Crusella Chiara
Publication venue: Universidad Torcuato Di Tella
Publication date: 01/01/2023

Social media platforms represent an essential tool for both consumers and marketers. Meanwhile, luxury fashion brands play a key role in fashion, one of the most important industries of the world economy. Despite assumptions to the contrary, social media platforms and luxury fashion brands do mix, especially in recent times. Consequently, it is worth asking whether it is possible to predict the reaction a post will generate in the audience of luxury fashion brands. This new question is the one this thesis intends to answer. To do so, the concept of reaction is defined through a novel composite index, named the Tweet reaction overall score (TROS), which is one of the contributions this thesis makes. Then, several predictive models are implemented, based on a wide range of learning algorithms. The results show that it is indeed possible to predict the TROS that a post on Twitter will obtain in the audience of luxury fashion brands on the day it is posted.

Repositorio Digital Universidad Torcuato Di Tella

Analysis of a short on-line course through logged data recording by a self-developed logging module

Author: Esztelecki Péter
Kőrösi Gábor
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

SZTE Publicatio Repozitórium - SZTE - Repository of Publications