Some Applications of Graph-Based Machine Learning Methods on Biological Data
Machine learning has made considerable contributions to various fields, most notably by
providing methods for predictive modeling and data analysis. Usually, different kinds of
data are best modeled by specialized machine learning models, tailored to account for the
specifics of the data at hand. Graphs are an expressive data representation most suited
for representing relationships between objects. The relationships can be interactions,
hierarchies, similarities, or others. Such structures can be found in different kinds of
data, including biological ones. Luckily, the machine learning toolbox abounds with methods
suitable for handling these kinds of data, and we consider several applications of such
graph-based machine learning methods to biological data. First, we discuss tree-like
hierarchies over the target variable values and the ways to account for such hierarchies
in learning. We consider enzyme classification as a suitable application. Then we discuss
hierarchies over the target variable values corresponding to directed acyclic graphs and
graph neural networks as a suitable model for this kind of data. We consider protein function
classification as a suitable application. Finally, we discuss construction of similarity graphs
over tabular instances, based on autoencoders and graph representation learning ideas.
We consider the application of such techniques to the exploratory analysis of biological
data related to expression of schizophrenia
Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 2023
Acknowledgement: I would like to thank my coauthors and collaborators: Jovana Kovačević,
Petar Veličković, Stefan Spalević, Nevena Ćirić, Predrag Janjić, and Stefan Kapuna
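One simple way to account for a tree-like hierarchy over target values can be sketched as follows. This is only an illustration, not the method of the talk: it assumes that class labels are dot-separated paths in the hierarchy (as enzyme class numbers usually are), and the helper names `ancestors` and `hierarchical_f1` are hypothetical. A prediction is scored by how much of the true label's path to the root it shares.

```python
from __future__ import annotations


def ancestors(label: str, sep: str = ".") -> list[str]:
    """Return the label together with all of its ancestors, root first."""
    parts = label.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def hierarchical_f1(true_label: str, predicted_label: str) -> float:
    """F1 score computed over ancestor sets instead of single labels.

    Assumption: a prediction in the right branch of the tree deserves
    partial credit, proportional to the shared part of the path.
    """
    true_set = set(ancestors(true_label))
    predicted_set = set(ancestors(predicted_label))
    overlap = len(true_set & predicted_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted_set)
    recall = overlap / len(true_set)
    return 2 * precision * recall / (precision + recall)
```

Under this scoring, a prediction that agrees with the true label on the first two levels of a four-level hierarchy receives half credit, whereas a flat accuracy measure would count it as a plain error.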
Guiding search in automated theorem proving
This thesis considers the problem of guiding search in automated theorem proving.
The thesis consists of two parts whose common topic is the CDCL search system,
which is used intensively by modern SAT solvers.
In the first part of the thesis, a simple approach to guiding search is considered:
guiding by the selection of the solver, its heuristics, and their parameters, based
on the properties of the instance to be solved. The basis of the proposed methods
for algorithm selection is the syntactic similarity of formulae, which is reflected in their
graph structure. This similarity was reliably established for the first time, and analyzed,
using an original graph similarity measure (which turned out to be useful in other contexts,
too). Yet, for reasons of computational efficiency, practical approaches to measuring the
similarity of formulae are based on their numerical features. Two simple methods
for algorithm selection, based on the k-nearest-neighbors algorithm, were proposed. The first
technique, ArgoSmArT, is based on classifying an instance into one of the predefined
families for which efficient algorithms are known. The instance is solved by the
algorithm corresponding to the family into which the instance was classified. The
second technique, ArgoSmArT k-NN, is based on finding several similar instances in
the training set for which the solving times of all considered algorithms are known.
The instance is solved by the algorithm that behaves best on those instances.
The ArgoSmArT technique is better suited to selecting the configuration of a SAT solver,
and ArgoSmArT k-NN to selecting the SAT solver itself. The ArgoSmArT k-NN technique proved
to be significantly more efficient than SATzilla, the most important and very complex
system for SAT solver selection. Apart from CNF SAT solver selection, the
problem of non-CNF SAT solver selection is considered. The focus was not on
solver selection techniques, since the proposed techniques are directly applicable,
but on the attributes that can be used to describe non-CNF SAT instances, which
had not been proposed earlier. The results in this domain are positive, but still
limited. The main reason is the lack of a larger number of non-CNF SAT
solvers with diverse behaviour, which is not surprising, given that this kind
of solver is at an early stage of development.
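The k-NN selection idea described above can be sketched in a few lines. This is a minimal illustration and not the ArgoSmArT k-NN implementation: the Euclidean distance, the use of total run time as the criterion for "behaves best", and the name `select_solver` are all assumptions made for the example.

```python
import math


def select_solver(features, training_set, k=3):
    """Pick the solver with the lowest total run time on the k nearest instances.

    training_set is a list of (feature_vector, {solver_name: runtime_seconds})
    pairs. Distance and aggregation are assumptions, see the text above.
    """
    # Find the k training instances closest to the new instance.
    neighbours = sorted(
        training_set, key=lambda item: math.dist(features, item[0])
    )[:k]

    # Sum each solver's run time over those neighbours.
    totals = {}
    for _, runtimes in neighbours:
        for solver, seconds in runtimes.items():
            totals[solver] = totals.get(solver, 0.0) + seconds

    # The solver with the smallest total is selected.
    return min(totals, key=totals.get)
```

The appeal of this scheme is that nothing is trained: adding a solver or an instance only means adding rows to the table of known run times.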
Apart from the construction of an efficient SAT solver selection system, a methodology
for SAT solver comparison based on statistical hypothesis testing is proposed.
The need for such a methodology comes from the large variation in the run time of a single
solver on a single instance, which can result in different SAT solver orderings when
one tries to compare their performance or rank them, as experimentally demonstrated.
The proposed methodology gives an estimate of the statistical significance of the
performed test and an estimate of the effect size, for instance the probability of one solver being faster than another.
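The effect size mentioned above, the probability of one solver being faster than another, can be estimated from repeated runs on the same instance. The sketch below uses the plain pairwise estimator with ties counted as half. This is an assumption chosen for illustration, as is the name `prob_faster`; the thesis may use a different estimator.

```python
def prob_faster(times_a, times_b):
    """Estimate P(solver A is faster than solver B) from run-time samples.

    Every run of A is compared with every run of B. A tie counts as half
    a win (an assumption of this sketch).
    """
    wins = 0.0
    for a in times_a:
        for b in times_b:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(times_a) * len(times_b))
```

A value near 0.5 means that the ordering of the two solvers on that instance is close to a coin flip, which is exactly the situation in which a ranking based on a single run would be unreliable.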
Beyond Kalman Filters: Deep Learning-Based Filters for Improved Object Tracking
Traditional tracking-by-detection systems typically employ Kalman filters
(KF) for state estimation. However, the KF requires domain-specific design
choices and it is ill-suited to handling non-linear motion patterns. To address
these limitations, we propose two innovative data-driven filtering methods. Our
first method employs a Bayesian filter with a trainable motion model to predict
an object's future location and combines its predictions with observations
gained from an object detector to enhance bounding box prediction accuracy.
Moreover, it dispenses with most domain-specific design choices characteristic
of the KF. The second method, an end-to-end trainable filter, goes a step
further by learning to correct detector errors, further minimizing the need for
domain expertise. Additionally, we introduce a range of motion model
architectures based on Recurrent Neural Networks, Neural Ordinary Differential
Equations, and Conditional Neural Processes, that are combined with the
proposed filtering methods. Our extensive evaluation across multiple datasets
demonstrates that our proposed filters outperform the traditional KF in object
tracking, especially in the case of non-linear motion patterns -- the use case
our filters are best suited to. We also conduct noise robustness analysis of
our filters with convincing positive results. We further propose a new cost
function for associating observations with tracks. Our tracker, which
incorporates this new association cost with our proposed filters, outperforms
the conventional SORT method and other motion-based trackers in multi-object
tracking according to multiple metrics on motion-rich DanceTrack and SportsMOT
datasets.
Comment: 29 pages
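For context, the baseline that the abstract compares against can be sketched as a textbook Kalman filter with a constant-velocity motion model. This is not the paper's proposed method. For brevity the sketch tracks a single coordinate instead of a full bounding box, and the noise levels, the initial covariance, and the function names are assumptions.

```python
def kf_predict(state, cov, dt=1.0, process_noise=1e-3):
    """Propagate a (position, velocity) state with a constant-velocity model."""
    p, v = state
    (a, b), (c, d) = cov
    new_state = (p + dt * v, v)
    # cov' = F cov F^T + Q with F = [[1, dt], [0, 1]], Q = process_noise * I
    new_cov = (
        (a + dt * (b + c) + dt * dt * d + process_noise, b + dt * d),
        (c + dt * d, d + process_noise),
    )
    return new_state, new_cov


def kf_update(state, cov, measurement, measurement_noise=0.1):
    """Correct the prediction with an observed position (H = [1, 0])."""
    p, v = state
    (a, b), (c, d) = cov
    innovation = measurement - p
    s = a + measurement_noise            # innovation variance
    k_pos, k_vel = a / s, c / s          # Kalman gain
    new_state = (p + k_pos * innovation, v + k_vel * innovation)
    # cov = (I - K H) cov'
    new_cov = (
        ((1 - k_pos) * a, (1 - k_pos) * b),
        (c - k_vel * a, d - k_vel * b),
    )
    return new_state, new_cov


def track(measurements):
    """Run the filter over a sequence of position measurements."""
    state = (0.0, 0.0)                   # assumed initial guess
    cov = ((10.0, 0.0), (0.0, 10.0))     # assumed initial uncertainty
    for z in measurements:
        state, cov = kf_predict(state, cov)
        state, cov = kf_update(state, cov, z)
    return state
```

The hand-chosen motion model and noise parameters in this sketch are examples of the domain-specific design choices that the abstract says a trainable motion model largely dispenses with.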
An Analysis of Energy Efficient Data Transfer between Mobile Device and Dedicated Server
This paper discusses research results on the energy-efficient transmission of serialised data between servers and mobile devices. A test environment was created in which the authors primarily measured electricity consumption during communication between a mobile device and a server. Numerical results were used to determine how well data serialisation was performed on a dedicated server and its effects on the power consumption of a mobile device. The time spent in data serialisation and the size of the serialised file were found to significantly influence energy consumption. Based on that finding, the results were used to create a mathematical model, which was later expressed through functional forms. The main variables in those functional forms were the time of serialisation and the size of the serialised file. The data collected through this research were used for the experimental API-CBSaver, which, based on the mathematical models, chooses the most favourable manner of serialisation and compression in real time. The results collected during the tests show that the API-CBSaver approach achieves greater energy efficiency than current techniques. Furthermore, with optimal selection of the data serialisation type and compression level in real time, the considered system shows better power-saving performance. According to the results, the API-CBSaver tests indicate the direction one should take to improve energy efficiency.
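The selection step described in the abstract can be illustrated with a small sketch. The paper's actual functional forms are not given here, so the linear energy model, its coefficients, the option names, and the functions `estimated_energy` and `choose_option` are all hypothetical. Only the two variables, serialisation time and serialised size, come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A candidate serialisation/compression setting (hypothetical values)."""
    name: str
    serialisation_ms: float
    size_kb: float


def estimated_energy(option, cost_per_ms, cost_per_kb):
    """Assumed linear model: energy grows with time and with transferred size."""
    return cost_per_ms * option.serialisation_ms + cost_per_kb * option.size_kb


def choose_option(options, cost_per_ms, cost_per_kb):
    """Return the option with the lowest estimated energy cost."""
    return min(
        options,
        key=lambda o: estimated_energy(o, cost_per_ms, cost_per_kb),
    )
```

With such a model the best choice depends on the relative cost of computation and transfer: heavier compression tends to pay off when transfer is expensive, and lighter settings when it is cheap.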
Preface
The dust that a building is transformed into when it becomes a ruin holds precious traces of the past. The hands of an archaeologist will search through it patiently, and find a necklace bead of a woman that lived in it. The hands of an architect will virtually transform the dust into a mortar, brick, or stone. The first profession sees through the unbuilt. The second one builds from it. However, both perform their work by communicating with the sciences.
Throughout history, various components were chosen, measured, and mixed into one of the most complex building composites ever - mortar, whose re-creation is of invaluable importance for architectural conservation. Geologists and chemists will best tell us about its composition. However, sometimes, while excavating a ruined wall, an archaeologist finds a mortar trowel, accidentally left by the past builder. Is this a more valuable trace for revealing the creation of a wall than the binder/aggregate ratio of the mortar used? Can we pick it up and imagine the hands that combined colourful aggregate grains with the earth, gypsum, lime, or cement
- …