12 research outputs found

Customer profile classification using transactional data

Author: Apeh Edward Tersoo
Gabrys Bogdan
Schierz Amanda C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2011

Customer profiles are by definition made up of factual and transactional data. Often, for reasons such as the high cost of data acquisition and/or protection, only the transactional data are available for data mining operations. Transactional data, however, tend to be highly sparse and skewed because a large proportion of customers engage in very few transactions. This can bias the prediction accuracy of classifiers built on such data towards the larger proportion of customers with fewer transactions. This paper investigates an approach for accurately and confidently grouping and classifying customers in bins on the basis of the number of their transactions. The experiments we conducted on a highly sparse and skewed real-world transactional dataset show that our proposed approach can be used to identify a critical point at which customer profiles can be more confidently distinguished.
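The binning step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `bin_customers`, the input format (one customer id per transaction) and the bin boundaries are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter


def bin_customers(transactions, edges):
    """Group customer ids into bins by their number of transactions.

    transactions: iterable of customer ids, one entry per transaction
                  (hypothetical input format).
    edges: sorted list of boundaries (hypothetical values); bin i holds
           counts n with edges[i-1] <= n < edges[i], bin 0 holds n < edges[0],
           and the last bin holds n >= edges[-1].
    Returns a dict mapping bin index -> sorted list of customer ids.
    """
    counts = Counter(transactions)  # transactions per customer
    bins = {}
    for customer, n in counts.items():
        bins.setdefault(bisect_right(edges, n), []).append(customer)
    return {index: sorted(ids) for index, ids in bins.items()}
```

A classifier could then be trained and evaluated per bin, which is one way to see whether accuracy changes with transaction count.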

Crossref

Bournemouth University Research Online

Innovative Hybridisation of Genetic Algorithms and Neural Networks in Detecting Marker Genes for Leukaemia Cancer

Author: Mintram Robert
Phalp Keith T.
Schierz Amanda C.
Tong Dong L.
Publication venue: 'CSRI Elektropribor'
Publication date: 01/09/2009

Methods for extracting marker genes that trigger the growth of cancerous cells from highly complex microarrays are of much interest to the computing community. Through the identified genes, the pathology of cancerous cells can be revealed and early precautions can be taken to prevent further proliferation of cancerous cells. In this paper, we propose an innovative hybridised gene identification framework based on genetic algorithms and neural networks to identify marker genes for leukaemia. Our approach confirms that high classification accuracy does not ensure that the optimal set of genes has been identified, and our model delivers a more promising set of genes even with a lower classification accuracy.

Bournemouth University Research Online

Winners’ notes. Using Multi-Resolution Clustering for Music Genre Identification

Author: Apeh E.
Budka Marcin
Schierz Amanda C.
Publication date: 01/03/2011

Article describing a less technical version of our winning entry in the ISMIS 2011 Music Genre competition.

Bournemouth University Research Online

Virtual Screening of Bioassay Data

Author: Amanda C Schierz
AR Leach
B Chen
C Drummond
C Elkan
CA Lipinski
D Bradley
EE Bolton
HL Lo
IH Witten
J Hollmen
JA DiMasi
K Liu
P Domingos
T Eitrich
TM Ehrman
VS Sheng
YW Seo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: There are three main problems associated with the virtual screening of bioassay data. The first is access to freely available curated data; the second is the number of false positives that occur in the physical primary screening process; and the third is that the data is highly imbalanced, with a low ratio of Active compounds to Inactive compounds. This paper first discusses these three problems and then applies a selection of Weka cost-sensitive classifiers (Naive Bayes, SVM, C4.5 and Random Forest) to a variety of bioassay datasets. Results: Pharmaceutical bioassay data is not readily available to the academic community. The data held at PubChem is not curated and there is a lack of detailed cross-referencing between Primary and Confirmatory screening assays. With regard to the number of false positives that occur in the primary screening process, the analysis carried out has been shallow due to the lack of cross-referencing mentioned above. In the six cases found, the average percentage of false positives from the High-Throughput Primary screen is quite high at 64%. For the cost-sensitive classification, Weka's implementations of the Support Vector Machine and C4.5 decision tree learner have performed relatively well. It was also found that the setting of the Weka cost matrix is dependent on the base classifier used and not solely on the ratio of class imbalance. Conclusions: Understandably, pharmaceutical data is hard to obtain. However, it would be beneficial to both the pharmaceutical industry and to academics for curated primary screening and corresponding confirmatory data to be provided. Two benefits could be gained by applying virtual screening techniques to bioassay data: first, the search space of compounds to be screened could be reduced; second, analysing the false positives that occur in the primary screening process may help improve the technology.
The number of false positives arising from primary screening leads to the issue of whether this type of data should be used for virtual screening. Care is needed when using Weka's cost-sensitive classifiers: across-the-board misclassification costs based on class ratios should not be used when comparing differing classifiers on the same dataset.
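To make the role of a cost matrix concrete, here is a minimal sketch of the generic minimum-expected-cost decision rule for a two-class Active/Inactive problem. It is plain Python, not Weka's API, and the function name, parameter names and cost values are assumptions for illustration only.

```python
def min_expected_cost_label(p_active, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    p_active: a classifier's estimated probability that a compound is Active.
    cost_fn:  assumed cost of calling a truly Active compound Inactive.
    cost_fp:  assumed cost of calling a truly Inactive compound Active.
    Correct predictions are assumed to cost nothing.
    """
    expected_cost_if_inactive = p_active * cost_fn
    expected_cost_if_active = (1.0 - p_active) * cost_fp
    if expected_cost_if_active < expected_cost_if_inactive:
        return "Active"
    return "Inactive"
```

Because the rule depends on the probabilities a given base classifier produces, the same cost values can behave very differently across classifiers, which is consistent with the abstract's caution against setting costs from the class ratio alone.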

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bournemouth University Research Online

Therapeutic opportunities within the DNA damage response

Author: A Chatr-Aryamontri
A Franceschini
A Franchitto
A Gaulton
AA Larrea
AC Antoniou
AE Tomkinson
AJ Bardin
AK Rustgi
AL Chambers
AL Marston
Amanda C. Schierz
AP Eker
AW Oliver
B Vogelstein
Bissan Al-Lazikani
C Greenman
C McWhirter
CA Lipinski
CJ Lord
CJ Richardson
D Croft
D Dorjsuren
DM van Pel
DP Cahill
DW Parsons
E Kuhn
E Pastwa
EB Fauman
EC Friedberg
EK Bancroft
Frances M. G. Pearl
FS Collins
GH Nguyen
GM Kupfer
GP Pfeifer
H Panda
J Bartkova
J Zhang
JL Riffell
JM Daley
JM Murray
JM Paul
JN Weinstein
JP Overington
JW Harper
K Chang
KC Bulusu
KH Khoo
KH Lim
KK Hoe
KL Kanchi
L Galluzzi
L Weiss
Laurence H. Pearl
LB Alexandrov
LJ Barber
LV Liu
M Aggarwal
M Ashburner
M Bastian
M D'Antonio
M Kanehisa
M Tischkowitz
ME Smoot
MN Patel
MS Lindstrom
NC Turner
NG Howlett
NG Jaspers
NJ Curtin
O Espinosa
P Bouwman
P Workman
PA Futreal
PA Janne
PV Hornbeck
Q Liang
R Brough
R Buisson
R Mehra
R Nishi
RA Burrell
RE Verdun
S Prakash
SA Forbes
SA Martin
SI Nikolaev
Simon E. Ward
SL Allinson
SP Jackson
T Helleday
TD Halazonetis
TR Mereniuk
V Landre
WG Kaelin Jr.
X Jacq
Y Nakazawa
YH Fan
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/02/2015

The DNA damage response (DDR) is essential for maintaining the genomic integrity of the cell, and its disruption is one of the hallmarks of cancer. Classically, defects in the DDR have been exploited therapeutically in the treatment of cancer with radiation therapies or genotoxic chemotherapies. More recently, protein components of the DDR systems have been identified as promising avenues for targeted cancer therapeutics. Here, we present an in-depth analysis of the function, role in cancer and therapeutic potential of 450 expert-curated human DDR genes. We discuss the DDR drugs that have been approved by the US Food and Drug Administration (FDA) or that are under clinical investigation. We examine large-scale genomic and expression data for 15 cancers to identify deregulated components of the DDR, and we apply systematic computational analysis to identify DDR proteins that are amenable to modulation by small molecules, highlighting potential novel therapeutic targets.

Crossref

Online Research @ Cardiff

Institute of Cancer Research Repository

Sussex Research Online

Distinctive Behaviors of Druggable Proteins in Cellular Networks

Author: Amanda C. Schierz (839935)
Bissan Al-Lazikani (839936)
Costas Mitsopoulos (179968)
Paul Workman (54973)
Publication date: 01/12/2015

The interaction environment of a protein in a cellular network is important in defining the role that the protein plays in the system as a whole, and thus its potential suitability as a drug target. Despite the importance of the network environment, it is neglected during target selection for drug discovery. Here, we present the first systematic, comprehensive computational analysis of topological, community and graphical network parameters of the human interactome and identify discriminatory network patterns that strongly distinguish drug targets from the interactome as a whole. Importantly, we identify striking differences in the network behavior of targets of cancer drugs versus targets from other therapeutic areas and explore how they may relate to successful drug combinations to overcome acquired resistance to cancer drugs. We develop, computationally validate and provide the first public domain predictive algorithm for identifying druggable neighborhoods based on network parameters. We also make available full predictions for 13,345 proteins to aid target selection for drug discovery. All target predictions are available through canSAR.icr.ac.uk (http://canSAR.icr.ac.uk). Underlying data and tools are available at https://cansar.icr.ac.uk/cansar/publications/druggable_network_neighbourhoods/.

Directory of Open Access Journals

PubMed Central

Institute of Cancer Research Repository

FigShare

Enrichment and depletion of key parameters in drug targets over what can be expected at random from the interactome.

Author: Amanda C. Schierz (839935)
Bissan Al-Lazikani (839936)
Costas Mitsopoulos (179968)
Paul Workman (54973)

A) Graphlets and their constituent isomorphism orbits. The graph shows the graphlets and orbits, ordered by descending size and complexity, most enriched in cancer-drug targets (light blue bars). These same graphlets and orbits are either slightly depleted or not differentiated from random in targets of non-cancer drugs (dark blue). The gray line represents graphlet size and complexity (high-to-low). B) The distribution of detected community sizes and the enrichment or depletion of cancer drug targets (light blue) versus targets of drugs used to treat other diseases (dark blue). C) Box plots showing the distinction in degree and Google PageRank, as well as the vertex modularity, which distinguishes inter- versus intra-community communication of nodes. Further parameters are shown in the Supporting Information.

FigShare

Cancer-drug targets are enriched for highly connected Graphlets.

Author: Amanda C. Schierz (839935)
Bissan Al-Lazikani (839936)
Costas Mitsopoulos (179968)
Paul Workman (54973)

A) Interaction network highlighting the distribution of targets of approved cancer drugs (pink); targets of approved drugs from non-cancer therapeutic areas (blue); and targets predicted to be druggable by different druggability prediction methodologies (light and dark green). Druggable proteins are spread widely across the network while targets of current approved drugs tend to cluster into few areas. B) Cumulative fraction of all drug targets covered by communities. As indicated, a small number of communities cover the majority of drug targets. C) The network communities most enriched in drug targets are listed against the fold enrichment of the number of targets found in them (compared to what can be expected at random).
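The cumulative fraction described in panel B can be sketched as follows. This is only an illustration under stated assumptions: the function name `cumulative_target_coverage`, the input format (a protein-to-community mapping) and the ordering of communities by descending target count are hypothetical and are not taken from the authors' analysis.

```python
def cumulative_target_coverage(community_of, targets):
    """Cumulative fraction of drug targets covered by communities.

    community_of: dict mapping protein -> community id (hypothetical format).
    targets: iterable of proteins that are drug targets.
    Communities are visited from most to fewest targets (an assumed ordering).
    Returns a list whose k-th entry is the fraction of targets covered by
    the first k+1 communities.
    """
    counts = {}
    for protein in targets:
        if protein in community_of:  # ignore targets with no community
            community = community_of[protein]
            counts[community] = counts.get(community, 0) + 1

    total = sum(counts.values())
    covered = 0
    fractions = []
    for n in sorted(counts.values(), reverse=True):
        covered += n
        fractions.append(covered / total)
    return fractions
```

A curve that rises steeply over the first few entries would correspond to the caption's observation that a small number of communities cover most drug targets.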

FigShare

Network profiles and interactions between targets of drug combinations.

Author: Amanda C. Schierz (839935)
Bissan Al-Lazikani (839936)
Costas Mitsopoulos (179968)
Paul Workman (54973)

A) Radar plots showing representative network property profiles of targets of drug combinations. MEK and BRAF network property profiles are more similar to one another than the network profiles of CDKs and HMGCR. This may be related to the long-term effectiveness of the combinations of drugs targeting these proteins. B) Interactions between proteins targeted by drug combinations, showing a high level of connectivity between targets such as EGFR, BRAF and MEK. The dotted edge indicates that no direct interaction takes place between HMGCR and the other proteins in the network.

FigShare