Model Selection in Overlapping Stochastic Block Models
Networks are a commonly used mathematical model to describe the rich set of
interactions between objects of interest. Many clustering methods have been
developed in order to partition such structures, among which several rely on
underlying probabilistic models, typically mixture models. The relevant hidden
structure may however show overlapping groups in several applications. The
Overlapping Stochastic Block Model (2011) has been developed to take this
phenomenon into account. Nevertheless, the problem of choosing the number of
classes during inference remains open. To tackle this issue, we consider the
proposed model in a Bayesian framework and develop a new criterion based on a
non-asymptotic approximation of the marginal log-likelihood. We
describe how the criterion can be computed through a variational Bayes EM
algorithm, and demonstrate its efficiency by running it on both simulated and
real data.
Inferring Multiple Graphical Structures
Gaussian Graphical Models provide a convenient framework for representing
dependencies between variables. Recently, this tool has received considerable
interest for the discovery of biological networks. The literature focuses on
the case where a single network is inferred from a set of measurements, but, as
wet-lab data are typically scarce, several assays, in which the experimental
conditions affect the interactions, are usually merged to infer a single network.
In this paper, we propose two approaches for estimating multiple related
graphs, by encoding the closeness assumption either as an empirical prior or as
group penalties. We provide quantitative results demonstrating the benefits of
the proposed approaches. The methods presented in this paper are embedded in
the R package 'simone' from version 1.0-0 onward.
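As a rough illustration of the group-penalty idea (a hedged sketch: the neighbourhood-selection formulation, the proximal-gradient solver and all sizes below are assumptions for exposition, not the estimators implemented in 'simone'), one can regress each variable on the others in every condition and tie the conditions together with an l1/l2 penalty, so that an edge is selected jointly across conditions or not at all:

# Sketch: joint neighbourhood selection across conditions with a group (l1/l2)
# penalty, solved by proximal gradient (ISTA). Illustrative only; not the
# estimators implemented in the 'simone' package.
import numpy as np

def group_soft_threshold(B, thr):
    # Shrink each row (one candidate edge, across all conditions) towards zero;
    # a whole row is set to zero when its l2 norm falls below the threshold.
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - thr / np.maximum(norms, 1e-12)) * B

def joint_neighbourhood(datasets, j, lam=0.3, n_iter=500):
    # Regress variable j on all the others in every condition; the group
    # penalty couples the conditions so an edge is kept or dropped jointly.
    T, p = len(datasets), datasets[0].shape[1]
    idx = [k for k in range(p) if k != j]
    Xs = [d[:, idx] for d in datasets]
    ys = [d[:, j] for d in datasets]
    ns = [X.shape[0] for X in Xs]
    L = max(np.linalg.norm(X, 2) ** 2 / n for X, n in zip(Xs, ns))  # Lipschitz constant; step = 1/L
    B = np.zeros((p - 1, T))
    for _ in range(n_iter):
        grad = np.column_stack(
            [Xs[t].T @ (Xs[t] @ B[:, t] - ys[t]) / ns[t] for t in range(T)])
        B = group_soft_threshold(B - grad / L, lam / L)
    return B

# Toy data: three conditions sharing one dependency (variable 0 ~ variable 1).
rng = np.random.default_rng(0)
datasets = []
for _ in range(3):
    Z = rng.standard_normal((40, 6))
    Z[:, 0] = 0.8 * Z[:, 1] + 0.3 * rng.standard_normal(40)
    datasets.append(Z)

B = joint_neighbourhood(datasets, j=0)
print(np.round(B, 2))  # first row (edge 0-1) is selected in all three conditions

A row of B that survives the group soft-thresholding corresponds to an edge retained in every condition; rows shrunk to zero correspond to edges absent from all of them.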
Determining appropriate approaches for using data in feature selection
Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using only part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features, varied numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective in reducing the bias when the size of a dataset is small, but starts to lose its advantage as the dataset size increases.
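As a minimal sketch of why the two strategies differ (assuming scikit-learn; the synthetic dataset, classifier and signature size are arbitrary illustrative choices, not those used in the paper): selecting features on ALL of the data before cross-validation leaks label information into the test folds, whereas selecting inside each training fold (the PART approach) keeps the estimate honest.

# Sketch of the ALL vs PART contrast on synthetic data with many irrelevant
# features; dataset, classifier and signature size are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=60, n_features=2000, n_informative=10,
                           n_redundant=0, random_state=0)

# ALL: features chosen once on the whole dataset, then cross-validated;
# the test folds have already influenced the selection, inflating accuracy.
X_all = SelectKBest(f_classif, k=20).fit_transform(X, y)
acc_all = cross_val_score(LogisticRegression(max_iter=1000), X_all, y, cv=5).mean()

# PART: selection is refitted inside every training fold via a pipeline,
# so the test folds never influence which features are picked.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
acc_part = cross_val_score(pipe, X, y, cv=5).mean()

print(f"ALL  (selection on everything): {acc_all:.2f}")
print(f"PART (selection inside folds) : {acc_part:.2f}")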
An Important Component of National Security (Problems of Protecting Scientific and Technical Information)
The article addresses the problem of protecting information resources in the scientific and technical sphere. It substantiates the importance of scientific and technological potential for the economic and social development of Ukraine, and argues that a thorough development of the corresponding regulatory and legal framework is needed.
Infection with Toxoplasma gondii Does Not Alter TNFα and IL-6 Secretion by a Human Astrocytoma Cell Line
The secretion of tumour necrosis factor-α (TNFα),
interleukin-1α (IL-1α) and interleukin-6 (IL-6) by a
human astrocytoma cell line was studied 1 h, 3 h, 6 h and 24 h after
infection with tachyzoites from three Toxoplasma gondii
strains (virulent RH; cystogenic 76K and Prugniaud strains). The
astrocytoma cell line constitutively secreted TNFα and IL-6,
but no IL-1α. A positive control was obtained by stimulation
with phorbol esters, which induced a significant increase (p < 0.05) in TNFα and IL-6 secretion but not in IL-1α, while
lipopolysaccharide (alone and after priming), interferon gamma,
ionophore A23187 and sera positive for T. gondii did
not induce any increase in cytokine levels. None of the tachyzoites,
whatever their virulence, induced a significant increase in cytokine
production at any time in the study. Tachyzoites did not inhibit the
secretion induced by phorbol esters.
Can PSMA PET/CT help in dose-tailoring in post-prostatectomy radiotherapy?
There are few randomized trials evaluating the use of PSMA-PET in the planning of post-prostatectomy radiotherapy. There are two unresolved questions: 1) should we increase the dose to lesions visible on PSMA-PET, and 2) can we reduce the dose in the case of a negative PSMA-PET? In this review, we summarize and discuss the available evidence in the literature. We found that, in general, there seems to be an advantage for dose increase, but a large recent study from the pre-PSMA era did not show an advantage for dose escalation. Retrospective studies have shown that conventional doses to PSMA-PET-positive lesions seem sufficient. On the other hand, in the case of a negative PSMA-PET, there is no evidence that dose reduction is possible. In the future, the combination of PSMA-PET with genomic classifiers could help to better identify patients who might benefit from either dose de-escalation or dose escalation. We further need to identify intraindividual references to help identify lesions with higher aggressiveness.
The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
Motivation: Biomarker discovery from high-dimensional data is a crucial
problem with enormous applications in biology and medicine. It is also
extremely challenging from a statistical viewpoint, but surprisingly few
studies have investigated the relative strengths and weaknesses of the plethora
of existing feature selection methods. Methods: We compare 32 feature selection
methods on 4 public gene expression datasets for breast cancer prognosis, in
terms of predictive performance, stability and functional interpretability of
the signatures they produce. Results: We observe that the feature selection
method has a significant influence on the accuracy, stability and
interpretability of signatures. Simple filter methods generally outperform more
complex embedded or wrapper methods, and ensemble feature selection generally
has no positive effect. Overall, a simple Student's t-test seems to
provide the best results. Availability: Code and data are publicly available at
http://cbio.ensmp.fr/~ahaury/
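As a hedged illustration of how the stability of a signature can be quantified (the synthetic data, the t-test filter and the Jaccard overlap below are illustrative choices, not the exact protocol of the paper): rank genes with a t-test on bootstrap resamples and measure how much the resulting top-k signatures agree.

# Sketch: stability of a t-test-filter signature across bootstrap resamples,
# measured by the mean pairwise Jaccard overlap of the selected gene sets.
# The synthetic data and all sizes are illustrative only.
from itertools import combinations
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n, p, k = 80, 1000, 50                      # samples, genes, signature size
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, size=n)
X[y == 1, :20] += 1.0                       # 20 genes are truly differential

signatures = []
for _ in range(30):                         # bootstrap resamples
    idx = rng.choice(n, size=n, replace=True)
    t, _ = ttest_ind(X[idx][y[idx] == 1], X[idx][y[idx] == 0], axis=0)
    signatures.append(set(np.argsort(-np.abs(t))[:k]))

# 1.0 would mean the same signature is recovered from every resample.
overlaps = [len(a & b) / len(a | b) for a, b in combinations(signatures, 2)]
print(f"mean pairwise signature overlap: {np.mean(overlaps):.2f}")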
Identification of disease-causing genes using microarray data mining and gene ontology
Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small number of samples relative to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes.
Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results.
Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth.
Conclusions: The proposed method addresses the weaknesses of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for these cancers.
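A minimal sketch of the filter-plus-embedded combination described above, under simplifying assumptions (synthetic data, an arbitrary correlation threshold for the redundancy stage, and no Gene Ontology information): a Fisher-score filter prunes the gene pool, a correlation check removes redundant genes, and SVM-based recursive feature elimination ranks the remainder.

# Sketch: Fisher-score filter -> redundancy reduction -> SVM-RFE on synthetic
# expression-like data; thresholds and sizes are illustrative assumptions and
# the Gene Ontology stage is omitted.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=80, n_features=500, n_informative=15,
                           n_redundant=10, random_state=0)

# 1) Fisher score: between-class separation over within-class spread, per gene.
mu1, mu0 = X[y == 1].mean(0), X[y == 0].mean(0)
v1, v0 = X[y == 1].var(0), X[y == 0].var(0)
fisher = (mu1 - mu0) ** 2 / (v1 + v0 + 1e-12)
keep = np.argsort(-fisher)[:100]            # keep the 100 best-scoring genes

# 2) Redundancy reduction: drop a gene highly correlated with one already kept.
selected = []
for g in keep:
    if all(abs(np.corrcoef(X[:, g], X[:, s])[0, 1]) < 0.9 for s in selected):
        selected.append(g)

# 3) SVM-RFE on the remaining genes to obtain the final signature.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=20).fit(X[:, selected], y)
signature = np.array(selected)[rfe.support_]
print("selected genes:", signature)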
Recent evolution (1960-2021) of snow cover in the Vosges using the regional climate model MAR
The recent evolution of snow cover in the Vosges (north-eastern France) was simulated at a resolution of 4 km with the regional climate model MAR (version 3.13) forced by the ERA5 reanalyses. With a small adjustment of only 3 parameters (including a 1 °C increase of the snow/rain temperature threshold), MAR was optimised and validated over 5 and 8 winters (DJF) against daily observations (temperature, precipitation, snow depth). Over the 62 winters (DJF) of 1960-2021, MAR suggests a statistically significant decrease in mean snow depth by roughly a factor of two, due to the significant increase in temperature (~+2 °C over 62 years). Although precipitation has slightly increased (+10-20 % over 62 years) owing to a (non-significant) strengthening of the westerly circulation, it increasingly falls as rain, in particular below 1000 m. Above 1000 m it does not snow less than before, but there is more melt, which reduces the snowpack between snowfall events. Extrapolating current trends, an anomaly of +2.5 °C (resp. +3.8 °C) relative to the 1960-90 winters would be enough for there to be no snow, on average, below 750 m (resp. 1000 m).
- …