Analysis of group evolution prediction in complex networks
In a world in which acceptance by and identification with social
communities are highly desired, the ability to predict the evolution of groups
over time is a vital but very complex research problem. We therefore
propose a new, adaptable, generic and multi-stage method for Group Evolution
Prediction (GEP) in complex networks that facilitates reasoning about the
future states of recently discovered groups. The modular design of GEP
enabled us to carry out extensive and versatile empirical studies on many
real-world complex and social networks to analyze the impact of numerous setups
and parameters like time window type and size, group detection method,
evolution chain length, prediction models, etc. Additionally, many new
predictive features reflecting the group state at a given time have been
identified and tested. Other research problems, such as enriching learning
evolution chains with external data, have been analyzed as well.
Crystal-GFN: sampling crystals with desirable properties and constraints
Accelerating material discovery holds the potential to greatly help mitigate
the climate crisis. Discovering new solid-state materials such as
electrocatalysts, super-ionic conductors or photovoltaic materials can have a
crucial impact, for instance, in improving the efficiency of renewable energy
production and storage. In this paper, we introduce Crystal-GFN, a generative
model of crystal structures that sequentially samples structural properties of
crystalline materials, namely the space group, composition and lattice
parameters. This domain-inspired approach enables the flexible incorporation of
physical and structural hard constraints, as well as the use of any available
predictive model of a desired physicochemical property as an objective
function. To design stable materials, one must target the candidates with the
lowest formation energy. Here, we use as objective the formation energy per
atom of a crystal structure predicted by a new proxy machine learning model
trained on MatBench. The results demonstrate that Crystal-GFN is able to sample
highly diverse crystals with low (median -3.1 eV/atom) predicted formation
energy.
Towards equilibrium molecular conformation generation with GFlowNets
Sampling diverse, thermodynamically feasible molecular conformations plays a
crucial role in predicting properties of a molecule. In this paper we propose
to use GFlowNet for sampling conformations of small molecules from the
Boltzmann distribution, as determined by the molecule's energy. The proposed
approach can be used in combination with energy estimation methods of different
fidelity and discovers a diverse set of low-energy conformations for highly
flexible drug-like molecules. We demonstrate that GFlowNet can reproduce
molecular potential energy surfaces by sampling proportionally to the Boltzmann
distribution.
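As a minimal illustration of the target distribution mentioned in the abstract above (not the paper's GFlowNet sampler), the sketch below computes normalized Boltzmann probabilities for a handful of conformer energies. The function name `boltzmann_weights`, the energy unit (kcal/mol), the temperature and the example energies are all assumptions made for illustration.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
K_B_KCAL = 0.0019872041


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann probabilities for energies in kcal/mol."""
    beta = 1.0 / (K_B_KCAL * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    unnormalized = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(unnormalized)
    return [u / partition for u in unnormalized]


# Hypothetical relative conformer energies (kcal/mol).
energies = [0.0, 0.5, 2.0]
probs = boltzmann_weights(energies)
```

A sampler that draws conformations proportionally to this distribution visits low-energy conformers most often while still assigning non-zero probability to higher-energy ones.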
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Recently, pre-trained foundation models have enabled significant advancements
in multiple fields. In molecular machine learning, however, where datasets are
often hand-curated, and hence typically small, the lack of datasets with
labeled features, and codebases to manage those datasets, has hindered the
development of foundation models. In this work, we present seven novel datasets
categorized by size into three distinct categories: ToyMix, LargeMix and
UltraLarge. These datasets push the boundaries in both the scale and the
diversity of supervised labels for molecular learning. They cover nearly 100
million molecules and over 3000 sparsely defined tasks, totaling more than 13
billion individual labels of both quantum and biological nature. In comparison,
our datasets contain 300 times more data points than the widely used OGB-LSC
PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In
addition, to support the development of foundational models based on our
proposed datasets, we present the Graphium graph machine learning library which
simplifies the process of building and training molecular machine learning
models for multi-task and multi-level molecular datasets. Finally, we present a
range of baseline results as a starting point of multi-task and multi-level
training on these datasets. Empirically, we observe that performance on
low-resource biological datasets improves when also training on large
amounts of quantum data. This indicates that there may be potential in
multi-task and multi-level training of a foundation model and fine-tuning it to
resource-constrained downstream tasks.
CCR: A combined cleaning and resampling algorithm for imbalanced data classification
Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. Varying levels of imbalance may be observed in most real datasets, affecting the performance of classification algorithms. In particular, high levels of imbalance pose serious difficulties, often requiring the use of specially designed methods. In such cases the most important issue is often to properly detect minority examples, but at the same time the performance on the majority class cannot be neglected. In this paper we describe a novel resampling technique focused on proper detection of minority examples in a two-class imbalanced data task. The proposed method combines cleaning the decision border around minority objects with guided synthetic oversampling. Results of the conducted experimental study indicate that the proposed algorithm usually outperforms the conventional oversampling approaches, especially when the detection of minority examples is considered.
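The general idea of pairing border cleaning with synthetic oversampling can be sketched as follows. This is a deliberately simplified illustration and not the CCR algorithm itself, whose details the abstract does not give: it assumes a fixed cleaning radius, removes majority points outright, and places synthetic points uniformly in a small box around randomly chosen minority points. The function name `clean_and_oversample` and all parameters are hypothetical.

```python
import math
import random


def clean_and_oversample(minority, majority, radius, n_new, seed=0):
    """Simplified sketch: clean around minority points, then oversample.

    Assumptions (not taken from the paper): a fixed radius, removal of
    majority points inside it, and uniform sampling near minority points.
    """
    rng = random.Random(seed)

    # Cleaning step: keep only majority points farther than `radius`
    # from every minority point.
    kept_majority = [
        m for m in majority
        if all(math.dist(m, p) > radius for p in minority)
    ]

    # Oversampling step: draw synthetic points in a box inscribed in the
    # sphere of the given radius around a randomly chosen minority point.
    dim = len(minority[0])
    half_side = radius / math.sqrt(dim)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        synthetic.append(
            tuple(x + rng.uniform(-half_side, half_side) for x in base)
        )
    return kept_majority, synthetic


# Hypothetical two-dimensional toy data.
minority = [(0.0, 0.0), (1.0, 1.0)]
majority = [(0.1, 0.0), (3.0, 3.0), (1.0, 1.2)]
kept, synthetic = clean_and_oversample(minority, majority, radius=0.5, n_new=4)
```

In this toy run the two majority points lying within 0.5 of a minority point are dropped, and every synthetic point falls inside the cleaned region around a minority example.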
Using Training Curriculum with Deep Reinforcement Learning. On the Importance of Starting Small
Reinforcement learning algorithms are being used to solve problems with an ever-increasing level of complexity. As a consequence, the training process becomes harder and more computationally demanding. Using transfer learning can partially alleviate this issue by taking advantage of previously acquired knowledge. In this paper we propose a novel test environment and experimentally evaluate the impact of using a curriculum with the deep Q-learning algorithm.
Impact of Low Resolution on Image Recognition with Deep Neural Networks: An Experimental Study
Due to the advances made in recent years, methods based on deep neural networks have been able to achieve state-of-the-art performance in various computer vision problems. In some tasks, such as image recognition, neural-based approaches have even been able to surpass human performance. However, the benchmarks on which neural networks achieve these impressive results usually consist of fairly high quality data. On the other hand, in practical applications we are often faced with images of low quality, affected by factors such as low resolution, presence of noise or a small dynamic range. It is unclear how resilient deep neural networks are to the presence of such factors. In this paper we experimentally evaluate the impact of low resolution on the classification accuracy of several notable neural architectures of recent years. Furthermore, we examine the possibility of improving neural networks’ performance in the task of low resolution image recognition by applying super-resolution prior to classification. The results of our experiments indicate that contemporary neural architectures remain significantly affected by low image resolution. By applying super-resolution prior to classification we were able to alleviate this issue to a large extent as long as the resolution of the images did not decrease too severely. However, in the case of very low resolution images the classification accuracy remained considerably affected.