Artificial Neural Network Pruning to Extract Knowledge
Artificial Neural Networks (NN) are widely used for solving complex problems
from medical diagnostics to face recognition. Despite notable successes, the
main disadvantages of NN are also well known: the risk of overfitting, lack of
explainability (inability to extract algorithms from trained NN), and high
consumption of computing resources. Determining the appropriate NN
structure for each problem can help overcome these difficulties: an NN that is
too poor cannot be trained successfully, while one that is too rich gives
unexplainable results and is prone to overfitting. Reducing the precision of NN parameters
simplifies the implementation of these NN, saves computing resources, and makes
the NN skills more transparent. This paper lists the basic NN simplification
problems and controlled pruning procedures to solve these problems. All the
described pruning procedures can be implemented in one framework. The developed
procedures, in particular, find the optimal structure of NN for each task,
measure the influence of each input signal and NN parameter, and provide a
detailed verbal description of the algorithms and skills of NN. The described
methods are illustrated by a simple example: the generation of explicit
algorithms for predicting the results of the US presidential election.
Comment: IJCNN 202
Fractional norms and quasinorms do not help to overcome the curse of dimensionality
The curse of dimensionality causes the well-known and widely discussed
problems for machine learning methods. There is a hypothesis that using the
Manhattan distance and even fractional quasinorms lp (for p less than 1) can
help to overcome the curse of dimensionality in classification problems. In
this study, we systematically test this hypothesis. We confirm that fractional
quasinorms have a greater relative contrast or coefficient of variation than
the Euclidean norm l2, but we also demonstrate that the distance concentration
shows qualitatively the same behaviour for all tested norms and quasinorms and
the difference between them decays as dimension tends to infinity. Estimation
of classification quality for kNN based on different norms and quasinorms shows
that a greater relative contrast does not mean better classifier performance
and the worst performance for different databases was shown by different norms
(quasinorms). A systematic comparison shows that the difference in the
performance of kNN based on lp for p=2, 1, and 0.5 is statistically
insignificant.
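The distance-concentration effect described in this abstract can be illustrated with a small experiment. The sketch below is not the authors' code: it assumes uniformly distributed synthetic points and takes relative contrast to be (D_max - D_min) / D_min over the distances from a query point, which is one common definition. The function names and the sample sizes are illustrative choices.

```python
import random


def lp_distance(x, y, p):
    """l_p dissimilarity between two points; for p < 1 this is a quasinorm."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def relative_contrast(points, query, p):
    """(D_max - D_min) / D_min over the distances from query to all points."""
    dists = [lp_distance(query, pt, p) for pt in points]
    return (max(dists) - min(dists)) / min(dists)


def mean_contrast(dim, p, n_points=200, n_trials=10, seed=0):
    """Average relative contrast over several random uniform samples (assumed data)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
        query = [rng.random() for _ in range(dim)]
        total += relative_contrast(points, query, p)
    return total / n_trials


if __name__ == "__main__":
    # Contrast shrinks with dimension for every p, while smaller p keeps it larger.
    for dim in (2, 10, 100):
        print(dim, {p: round(mean_contrast(dim, p), 3) for p in (0.5, 1, 2)})
```

A run of this sketch only shows how relative contrast behaves on synthetic data; it says nothing about classifier quality, which the abstract evaluates separately with kNN.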
Long and short range multi-locus QTL interactions in a complex trait of yeast
We analyse interactions of Quantitative Trait Loci (QTL) in heat selected
yeast by comparing them to an unselected pool of random individuals. Here we
re-examine data on individual F12 progeny selected for heat tolerance, which
have been genotyped at 25 locations identified by sequencing a selected pool
[Parts, L., Cubillos, F. A., Warringer, J., Jain, K., Salinas, F., Bumpstead,
S. J., Molin, M., Zia, A., Simpson, J. T., Quail, M. A., Moses, A., Louis, E.
J., Durbin, R., and Liti, G. (2011). Genome research, 21(7), 1131-1138]. 960
individuals were genotyped at these locations and multi-locus genotype
frequencies were compared to 172 sequenced individuals from the original
unselected pool (a control group). Various non-random associations were found
across the genome, both within chromosomes and between chromosomes. Some of the
non-random associations are likely due to retention of linkage disequilibrium
in the F12 population; however, many, including the inter-chromosomal
interactions, must be due to genetic interactions in heat tolerance. One region
of particular interest involves 3 linked loci on chromosome IV where the
central variant responsible for heat tolerance is antagonistic, coming from the
heat sensitive parent and the flanking ones are from the more heat tolerant
parent. The 3-locus haplotypes in the selected individuals represent a highly
biased sample of the population haplotypes with rare double recombinants in
high frequency. These were missed in the original analysis and would never be
seen without the multigenerational approach. We show that a statistical
analysis of entropy and information gain in genotypes of a selected population
can reveal more interactions than previously seen. Importantly, this must be
done in comparison to the unselected population's genotypes to account for
inherent biases in the original population.
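The idea of comparing genotype frequencies in a selected pool against an unselected control can be sketched with Shannon entropy and relative entropy. This is a minimal illustration, not the statistic used in the paper: the use of Kullback-Leibler divergence as "information gain", the pseudocount smoothing, and the toy two-locus genotype labels are all assumptions made here.

```python
import math
from collections import Counter


def frequencies(genotypes):
    """Empirical frequency of each multi-locus genotype label."""
    counts = Counter(genotypes)
    n = len(genotypes)
    return {g: c / n for g, c in counts.items()}


def entropy(freqs):
    """Shannon entropy in bits of a frequency table."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def information_gain(selected, control, pseudocount=0.5):
    """KL divergence D(selected || control) in bits.

    A pseudocount (an assumption of this sketch) keeps genotypes that are
    absent from one pool from producing a division by zero.
    """
    states = set(selected) | set(control)
    sel_counts, ctl_counts = Counter(selected), Counter(control)
    k = len(states)
    gain = 0.0
    for g in states:
        p = (sel_counts[g] + pseudocount) / (len(selected) + pseudocount * k)
        q = (ctl_counts[g] + pseudocount) / (len(control) + pseudocount * k)
        gain += p * math.log2(p / q)
    return gain


if __name__ == "__main__":
    # Hypothetical two-locus genotypes; each letter is the parental allele at a locus.
    control = ["AA", "AB", "BA", "BB"] * 10
    selected = ["AB"] * 25 + ["AA"] * 5 + ["BA"] * 5 + ["BB"] * 5
    print(entropy(frequencies(control)), entropy(frequencies(selected)))
    print(information_gain(selected, control))
```

Measuring the selected pool against the control, rather than against a uniform expectation, is what lets pre-existing biases in the original population cancel out.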
Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation
Finding the best architectures of learning machines, such as deep neural
networks, is a well-known technical and theoretical challenge. Recent work by
Mellor et al. (2021) showed that there may exist correlations between the
accuracies of trained networks and the values of some easily computable
measures defined on randomly initialised networks, which may make it possible to search
tens of thousands of neural architectures without training. Mellor et al. used
the Hamming distance evaluated over all ReLU neurons as such a measure.
Motivated by these findings, in our work, we ask the question of the existence
of other and perhaps more principled measures which could be used as
determinants of success of a given neural architecture. In particular, we
examine whether the dimensionality and quasi-orthogonality of a neural network's
feature space could be correlated with the network's performance after
training. We show, using the same setup as Mellor et al., that dimensionality
and quasi-orthogonality may jointly serve as discriminants of network
performance. In addition to offering new opportunities to accelerate neural
architecture search, our findings suggest important relationships between the
networks' final performance and properties of their randomly initialised
feature spaces: data dimension and quasi-orthogonality.
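Quasi-orthogonality refers to the fact that independent random vectors in a high-dimensional space are nearly orthogonal to one another. The sketch below illustrates only that geometric effect using Gaussian random vectors. It is not the measure defined in the paper, which is computed on network feature spaces; the function names and sample sizes are assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(vectors):
    """Average |cos| over all pairs; values near 0 indicate quasi-orthogonality."""
    total, pairs = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            total += abs(cosine(vectors[i], vectors[j]))
            pairs += 1
    return total / pairs


def random_vectors(n, dim, seed=0):
    """n vectors with independent standard Gaussian coordinates (assumed model)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    # The mean |cos| shrinks as the dimension grows.
    for dim in (3, 30, 1000):
        print(dim, round(mean_abs_cosine(random_vectors(30, dim)), 3))
```

In a setting like the one the abstract describes, the random vectors would presumably be replaced by feature vectors of a randomly initialised network, but that step is outside what this sketch covers.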
Robust and Scalable Learning of Complex Dataset Topologies via ElPiGraph
Large datasets represented by multidimensional data point clouds often
possess non-trivial distributions with branching trajectories and excluded
regions, with recent single-cell transcriptomic studies of developing
embryos being notable examples. Reducing the complexity and producing compact
and interpretable representations of such data remains a challenging task. Most
of the existing computational methods are based on exploring the local data
point neighbourhood relations, a step that can perform poorly in the case of
multidimensional and noisy data. Here we present ElPiGraph, a scalable and
robust method for approximation of datasets with complex structures which does
not require computing the complete data distance matrix or the data point
neighbourhood graph. This method is able to withstand high levels of noise and
is capable of approximating complex topologies via principal graph ensembles
that can be combined into a consensus principal graph. ElPiGraph deals
efficiently with large and complex datasets in various fields from biology,
where it can be used to infer gene dynamics from single-cell RNA-Seq, to
astronomy, where it can be used to explore complex structures in the
distribution of galaxies.
Comment: 32 pages, 14 figures
Personality Traits and Drug Consumption. A Story Told by Data
This is a preprint version of the first book from the series: "Stories told
by data". In this book a story is told about the psychological traits
associated with drug consumption. The book includes:
- A review of published works on the psychological profiles of drug users.
- Analysis of a new original database with information on 1885 respondents
and usage of 18 drugs. (Database is available online.)
- An introductory description of the data mining and machine learning methods
used for the analysis of this dataset.
- The demonstration that the personality traits (five factor model,
impulsivity, and sensation seeking), together with simple demographic data,
give the possibility of predicting the risk of consumption of individual drugs
with sensitivity and specificity above 70% for most drugs.
- The analysis of correlations of use of different substances and the
description of the groups of drugs with correlated use (correlation pleiades).
- Proof of significant differences of personality profiles for users of
different drugs. This is explicitly proved for benzodiazepines, ecstasy, and
heroin.
- Tables of personality profiles for users and non-users of 18 substances.
The book is aimed at advanced undergraduates or first-year PhD students, as
well as researchers and practitioners. No previous knowledge of machine
learning, advanced data mining concepts or modern psychology of personality is
assumed. For a more detailed introduction to statistical methods we recommend
several undergraduate textbooks. Familiarity with basic statistics and some
experience in the use of probabilities would be helpful as well as some basic
technical understanding of psychology.
Comment: A preprint version prepared by the authors before the Springer
editorial work. 124 pages, 27 figures, 63 tables, bibl. 24
Metallome of cerebrovascular endothelial cells infected with Toxoplasma gondii using μ-XRF imaging and inductively coupled plasma mass spectrometry
In this study, we measured the levels of elements in human brain microvascular endothelial cells (ECs) infected with T. gondii. ECs were infected with tachyzoites of the RH strain, and at 6, 24, and 48 hours post infection (hpi), the intracellular concentrations of elements were determined using a synchrotron–microfocus X-ray fluorescence microscopy (μ-XRF) system. This method enabled the quantification of the concentrations of Zn and Ca in infected and uninfected (control) ECs at sub-micron spatial resolution. T. gondii-hosting ECs contained less Zn than uninfected cells only at 48 hpi (p 0.05). Inductively Coupled Plasma Mass Spectrometry (ICP-MS) analysis revealed infection-specific metallome profiles characterized by significant increases in the intracellular levels of Zn, Fe, Mn and Cu at 48 hpi (p < 0.01), and significant reductions in the extracellular concentrations of Co, Cu, Mo, V, and Ag at 24 hpi (p < 0.05) compared with control cells. Zn constituted the largest part (74%) of the total metal composition (metallome) of the parasite. Gene expression analysis showed infection-specific upregulation in the expression of five genes, MT1JP, MT1M, MT1E, MT1F, and MT1X, belonging to the metallothionein gene family. These results point to a possible correlation between T. gondii infection and increased expression of MT1 isoforms and altered intracellular levels of elements, especially Zn and Fe. Taken together, a combined μ-XRF and ICP-MS approach is promising for studies of the role of elements in mediating host–parasite interaction
Fibro-inflammatory recovery and type 2 diabetes remission following a low calorie diet but not exercise training: A secondary analysis of the DIASTOLIC randomised controlled trial
Aims: To investigate the relationship between fibro-inflammatory biomarkers and cardiovascular structure/function in people with Type 2 Diabetes (T2D) compared to healthy controls and the effect of two lifestyle interventions in T2D. Methods: Data were derived from the DIASTOLIC randomised controlled trial (RCT) and include a comparison between those with T2D and the matched healthy volunteers recruited at baseline. Adults with T2D without cardiovascular disease (CVD) were randomized to a 12-week intervention of either (1) exercise training, (2) a low-energy (∼810 kcal/day) meal-replacement plan (MRP) or (3) standard care. Principal Component and Fisher's linear discriminant analysis were used to investigate the relationships between MRI-acquired cardiovascular outcomes and fibro-inflammatory biomarkers in cases versus controls and pre- and post-intervention in T2D. Results: At baseline, 83 people with T2D (mean age 50.5 ± 6.4; 58% male) and 36 healthy controls (mean age 48.6 ± 6.2; 53% male) were compared, and 76 people with T2D completed the RCT for pre-post analysis. Compared to healthy controls, subjects with T2D had adverse cardiovascular remodelling and a fibro-inflammatory profile (20 differentially expressed biomarkers). The 3D data visualisations showed almost complete separation between healthy controls and those with T2D, and a marked shift towards healthy controls following the MRP (15 biomarkers significantly changed) but not exercise training. Conclusions: Fibro-inflammatory pathways and cardiovascular structure/function are adversely altered before the onset of symptomatic CVD in middle-aged adults with T2D. The MRP improved the fibro-inflammatory profile of people with T2D towards a more healthy status. Long-term studies are required to assess whether these changes lead to continued reverse cardiac remodelling and prevent CVD.
A systematic autopsy survey of human infant bridging veins
In the first years of life, subdural haemorrhage (SDH) within the cranial cavity can occur through accidental and non-accidental mechanisms as well as from birth-related injury. This type of bleeding is the most common finding in victims of abusive head trauma (AHT). Historically, the most frequent cause of SDHs in infancy is suggested to be traumatic damage to bridging veins traversing from the brain to the dural membrane. However, several alternative hypotheses have been suggested for the cause and origin of subdural bleeding. It has also been suggested by some that bridging veins are too large to rupture through the forces associated with AHT. To date, there have been no systematic anatomical studies on infant bridging veins. During 43 neonatal, infant and young child post-mortem examinations, we have mapped the locations and numbers of bridging veins onto a 3D model of the surface of a representative infant brain. We have also recorded the in situ diameter of 79 bridging veins from two neonates, one infant and two young children at post-mortem examination. Large numbers of veins, both distant from and directly entering the dural venous sinuses, were discovered travelling between the brain and dural membrane, with the mean number of veins per brain being 54.1 and the largest number recorded as 94. The mean diameter of the bridging veins was 0.93 mm, with measurements ranging from 0.05 to 3.07 mm. These data demonstrate that some veins are extremely small and, subjectively, appear to be delicate. Characterisation of infant bridging veins will contribute to the current understanding of potential vascular sources of subdural bleeding and could also be used to further develop computational models of infant head injury.