
    Some Applications of Graph-Based Machine Learning Methods on Biological Data

    Machine learning has made considerable contributions to various fields, most notably by providing methods for predictive modeling and data analysis. Different kinds of data are usually best modeled by specialized machine learning models tailored to the specifics of the data at hand. Graphs are an expressive data representation, best suited to representing relationships between objects; these relationships can be interactions, hierarchies, similarities, or others. Such structures appear in many kinds of data, including biological data. Fortunately, the machine learning toolbox abounds with methods suitable for handling such data, and we consider several applications of graph-based machine learning methods to biological data. First, we discuss tree-like hierarchies over the values of the target variable and ways to account for such hierarchies in learning, with enzyme classification as the application. Then we discuss hierarchies over target values that form directed acyclic graphs, with graph neural networks as a suitable model for this kind of data and protein function classification as the application. Finally, we discuss the construction of similarity graphs over tabular instances, based on autoencoders and ideas from graph representation learning, and consider applying these techniques to the exploratory analysis of biological data related to the expression of schizophrenia. Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 2023. Acknowledgement: I would like to thank my coauthors and collaborators: Jovana Kovačević, Petar Veličković, Stefan Spalević, Nevena Ćirić, Predrag Janjić, and Stefan Kapuna.
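    The abstract mentions building similarity graphs over tabular instances from autoencoder representations. Below is a minimal sketch of one common construction, a symmetric k-nearest-neighbour graph under cosine similarity. It assumes the autoencoder embeddings have already been computed; the function names, the use of cosine similarity, and the toy embeddings are illustrative assumptions, not the author's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def knn_graph(embeddings, k):
    """Build an undirected k-nearest-neighbour graph as an adjacency dict.

    Each instance i is linked to the k instances most similar to it.
    An edge is kept if either endpoint selects the other, so the
    resulting adjacency is symmetric.
    """
    n = len(embeddings)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        sims = [(cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i]
        # Highest similarity first; ties broken by the smaller index.
        sims.sort(key=lambda s: (-s[0], s[1]))
        for _, j in sims[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


# Hypothetical 2-D embeddings standing in for autoencoder codes of four instances.
codes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_graph(codes, k=1)
print(graph)
```

    With k=1 the toy data splits into two pairs of mutually nearest instances; on real data the graph would then feed a graph representation learning method or a clustering step for exploratory analysis.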

    Guiding search in automated theorem proving

    In this thesis the problem of guiding search in automated theorem proving is considered. The thesis consists of two parts whose common topic is the CDCL search system, which is used intensively by modern SAT solvers. The first part of the thesis considers a simple approach to guiding search: selecting the solver, its heuristics, and their parameters based on the properties of the instance to be solved. The proposed algorithm-selection methods are based on the syntactic similarity of formulae, which is reflected in their graph structure. This similarity was, for the first time, reliably established and analyzed using an original graph similarity measure (which has also proved useful in other domains). For reasons of computational efficiency, however, practical approaches to measuring the similarity of formulae rely on numerical features of propositional formulae. Two simple algorithm-selection methods based on the k-nearest-neighbors algorithm are proposed. The first technique, ArgoSmArT, classifies an instance into one of several predefined families for which efficient solving algorithms are known; the instance is then solved by the algorithm associated with the family into which it was classified. The second technique, ArgoSmArT k-NN, finds several similar instances in the training set for which the solving times of all available algorithms are known; the instance is then solved by the algorithm that performs best on those instances. ArgoSmArT is better suited to selecting a configuration of a SAT solver, and ArgoSmArT k-NN to selecting the SAT solver itself. ArgoSmArT k-NN proved significantly more efficient than SATzilla, the most important (and a very complex) system for SAT solver selection. Apart from CNF SAT solver selection, the problem of non-CNF SAT solver selection is also considered. Here the focus was not on selection techniques, since the proposed ones apply directly, but on attributes for describing non-CNF SAT instances, which had not been proposed before. The results in this domain are positive but still limited. The main reason is the lack of a larger number of non-CNF SAT solvers with diverse behaviour, which is unsurprising given that this kind of solver is still in its infancy. Besides building an efficient SAT solver selection system, the thesis proposes a methodology for comparing SAT solvers based on statistical hypothesis testing. Such a methodology is needed because the solving time of a single formula by a single solver varies greatly, which can lead to different orderings of SAT solvers when comparing or ranking their performance, as is demonstrated experimentally. The proposed methodology provides an estimate of the statistical significance of the performed test and an estimate of the effect size, such as the probability that one SAT solver is faster than another.
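    The ArgoSmArT k-NN idea (find similar training instances with known runtimes for every solver, then pick the solver that does best on them) can be sketched as follows. This is a simplified illustration, not the thesis implementation: the Euclidean distance over raw feature vectors, the "smallest total runtime over the neighbours" criterion, the recording of timeouts as a penalized runtime, and all names and data are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_solver(features, training_set, k):
    """Pick a solver for a new instance by k-nearest-neighbour algorithm selection.

    features: numeric feature vector of the instance to solve.
    training_set: list of (feature_vector, {solver_name: runtime_seconds}),
        where every entry lists a runtime for every solver (timeouts are
        assumed to be stored as a penalized runtime).
    Returns the solver with the smallest total runtime over the k nearest
    training instances (an assumed notion of "performs best").
    """
    neighbours = sorted(training_set, key=lambda item: euclidean(features, item[0]))[:k]
    totals = {}
    for _, runtimes in neighbours:
        for solver, t in runtimes.items():
            totals[solver] = totals.get(solver, 0.0) + t
    # Iterate names in sorted order so ties are broken deterministically.
    return min(sorted(totals), key=lambda s: totals[s])


# Hypothetical training data: two "families" of instances with opposite solver preferences.
training = [
    ([0.0, 0.0], {"solverA": 1.0, "solverB": 5.0}),
    ([0.1, 0.0], {"solverA": 2.0, "solverB": 4.0}),
    ([5.0, 5.0], {"solverA": 50.0, "solverB": 3.0}),
    ([5.1, 5.0], {"solverA": 40.0, "solverB": 2.0}),
]
print(select_solver([0.05, 0.0], training, k=2))
print(select_solver([5.0, 4.9], training, k=2))
```

    Summing runtimes lets a single very slow run dominate; counting solved neighbours first and using runtime only as a tie-breaker is an equally plausible criterion.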

    Beyond Kalman Filters: Deep Learning-Based Filters for Improved Object Tracking

    Traditional tracking-by-detection systems typically employ Kalman filters (KF) for state estimation. However, the KF requires domain-specific design choices and is ill-suited to handling non-linear motion patterns. To address these limitations, we propose two innovative data-driven filtering methods. Our first method employs a Bayesian filter with a trainable motion model to predict an object's future location and combines its predictions with observations from an object detector to improve bounding box prediction accuracy. Moreover, it dispenses with most of the domain-specific design choices characteristic of the KF. The second method, an end-to-end trainable filter, goes a step further by learning to correct detector errors, further minimizing the need for domain expertise. Additionally, we introduce a range of motion model architectures based on Recurrent Neural Networks, Neural Ordinary Differential Equations, and Conditional Neural Processes, which are combined with the proposed filtering methods. Our extensive evaluation across multiple datasets demonstrates that the proposed filters outperform the traditional KF in object tracking, especially for non-linear motion patterns, the use case to which our filters are best suited. We also conduct a noise robustness analysis of our filters, with convincingly positive results. We further propose a new cost function for associating observations with tracks. Our tracker, which incorporates this new association cost with our proposed filters, outperforms the conventional SORT method and other motion-based trackers in multi-object tracking according to multiple metrics on the motion-rich DanceTrack and SportsMOT datasets. Comment: 29 pages
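    The first method combines a motion model's prediction with the detector's observation. Under the simplifying assumption that both are Gaussian and each bounding-box coordinate is handled independently, that combination reduces to the standard Bayesian (Kalman-style) update sketched below. The constant-velocity predictor and its fixed variance are hypothetical placeholders for the trainable motion models named in the abstract; this is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: float
    var: float


def constant_velocity_model(history):
    """Hypothetical stand-in for a trainable motion model.

    Expects at least one past position. Predicts the next position by
    repeating the last step and returns a Gaussian with a fixed, assumed
    variance; a learned model (e.g. an RNN) would output both from data.
    """
    if len(history) < 2:
        return Gaussian(history[-1], 4.0)
    return Gaussian(2 * history[-1] - history[-2], 4.0)


def fuse(prior, z, obs_var):
    """Bayesian update of a 1-D Gaussian prediction with a Gaussian detection z."""
    gain = prior.var / (prior.var + obs_var)  # Kalman-style gain
    mean = prior.mean + gain * (z - prior.mean)
    var = (1.0 - gain) * prior.var
    return Gaussian(mean, var)


# One coordinate of a box centre observed at 0.0 and 1.0; the detector then reports 3.0.
prior = constant_velocity_model([0.0, 1.0])
post = fuse(prior, 3.0, obs_var=4.0)
print(prior, post)
```

    With equal prediction and observation variances the posterior lands halfway between them and its variance halves; a learned motion model that reports its own uncertainty would shift this balance automatically.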

    An Analysis of Energy Efficient Data Transfer between Mobile Device and Dedicated Server

    This paper discusses research results on the energy-efficient transmission of serialised data between servers and mobile devices. The authors built a test environment in which they primarily measured electricity consumption during communication between a mobile device and a server. The numerical results were used to assess how data serialisation performed on a dedicated server and how it affects the power consumption of a mobile device. The time spent on data serialisation and the size of the serialised file were found to significantly influence energy consumption. Based on this finding, the results were used to create a mathematical model, which was then expressed in functional forms whose main variables are the serialisation time and the size of the serialised file. The data collected in this research were used to build an experimental API-CBSaver, which, based on the mathematical models, chooses the most favourable manner of serialisation and compression in real time. The test results show that the API-CBSaver approach is more energy efficient than current techniques. Furthermore, with optimal real-time selection of the serialisation type and compression level, the system achieves greater power savings. According to the results, the API-CBSaver tests indicate the direction one should take to improve energy efficiency.
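    The abstract says the model's main variables are serialisation time and serialised size, and that the system picks a serialisation and compression option in real time. The sketch below illustrates such a selection step. The linear energy form, the coefficients `a` and `b`, the option names, and the measurements are all hypothetical; the paper's actual functional forms are not given here.

```python
def estimated_energy(ser_time_s, size_bytes, a, b):
    """Hypothetical linear energy model: E = a * t_ser + b * size.

    a: assumed energy cost per second of serialisation (J/s).
    b: assumed energy cost per transferred byte (J/byte).
    """
    return a * ser_time_s + b * size_bytes


def choose_option(options, a, b):
    """Return the option name with the lowest estimated energy.

    options: dict mapping an option name to a tuple
        (serialisation_time_seconds, serialised_size_bytes),
        either measured or predicted for the current payload.
    """
    # Sorted iteration makes ties resolve deterministically by name.
    return min(sorted(options), key=lambda name: estimated_energy(*options[name], a, b))


# Hypothetical per-option measurements for one payload.
options = {
    "json": (0.010, 120_000),
    "json+gzip": (0.030, 30_000),
    "protobuf": (0.005, 60_000),
}
print(choose_option(options, a=2.0, b=1e-5))    # transfer cost dominates
print(choose_option(options, a=100.0, b=1e-5))  # serialisation time dominates
```

    Changing the relative weight of time and size flips the choice, which is why a selection made in real time, rather than one fixed format, can save energy.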

    Preface

    The dust that a building is transformed into when it becomes a ruin holds precious traces of the past. The hands of an archaeologist will search through it patiently and find a necklace bead of a woman who lived in it. The hands of an architect will virtually transform the dust into mortar, brick, or stone. The first profession sees through the unbuilt. The second one builds from it. However, both perform their work by communicating with the sciences. Throughout history, various components were chosen, measured, and mixed into one of the most complex building composites ever: mortar, whose re-creation is of invaluable importance for architectural conservation. Geologists and chemists will best tell us about its composition. However, sometimes, while excavating a ruined wall, an archaeologist finds a mortar trowel accidentally left by the past builder. Is this a more valuable trace for revealing the creation of a wall than the binder/aggregate ratio of the mortar used? Can we pick it up and imagine the hands that combined colourful aggregate grains with the earth, gypsum, lime, or cement?