
    Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments

    Motivation: The ever-growing size of sequencing data is a major bottleneck in bioinformatics, as advances in hardware development cannot keep up with the data growth. Therefore, an enormous amount of data is collected but rarely ever reused, because it is nearly impossible to find meaningful experiments in the stream of raw data. Results: As a solution, we propose Needle, a fast and space-efficient index which can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments in seconds, thereby outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters that each store a set of representative k-mers depending on their multiplicity in the raw data. This is then used to quantify the query.
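
    The abstract's core idea, binning k-mers into several Bloom filters by their multiplicity and answering quantification queries from those bins, can be illustrated with a toy, single-experiment sketch. The thresholds, hash scheme, and median-based estimate below are illustrative assumptions and not Needle's actual implementation, which uses interleaved Bloom filters spanning many experiments.

```python
# Toy sketch of a multiplicity-binned k-mer index (NOT the Needle implementation):
# k-mers are counted, then inserted into one Bloom filter per abundance bin.
import hashlib
from collections import Counter

class BloomFilter:
    def __init__(self, size=1 << 16, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size // 8 + 1)

    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "little") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def kmers(seq, k=5):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def build_index(reads, thresholds=(1, 4, 16), k=5):
    """One Bloom filter per abundance bin; a k-mer is added to every bin it reaches."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    filters = [BloomFilter() for _ in thresholds]
    for km, c in counts.items():
        for level, t in enumerate(thresholds):
            if c >= t:
                filters[level].add(km)
    return filters

def estimate_level(filters, transcript, k=5):
    """Rough abundance estimate: median abundance bin reached by the transcript's k-mers."""
    levels = sorted(sum(km in f for f in filters) for km in kmers(transcript, k))
    return levels[len(levels) // 2] if levels else 0

reads = ["ACGTACGTAC", "ACGTACGTTT", "GGGGCCCCAA"]
index = build_index(reads)
print(estimate_level(index, "ACGTACGT"))
```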

    Parallel Global Edge Switching for the Uniform Sampling of Simple Graphs with Prescribed Degrees

    The uniform sampling of simple graphs matching a prescribed degree sequence is an important tool in network science, e.g., to construct graph generators or null models. Here, the Edge Switching Markov Chain (ES-MC) is a common choice. Given an arbitrary simple graph with the required degree sequence, ES-MC carries out a large number of small changes, called edge switches, to eventually obtain a uniform sample. In practice, reasonably short runs efficiently yield approximately uniform samples. In this work, we study the problem of executing edge switches in parallel. We discuss parallelizations of ES-MC, but find that this approach suffers from complex dependencies between edge switches. For this reason, we propose the Global Edge Switching Markov Chain (G-ES-MC), an ES-MC variant with simpler dependencies. We show that G-ES-MC converges to the uniform distribution and design shared-memory parallel algorithms for ES-MC and G-ES-MC. In an empirical evaluation, we provide evidence that G-ES-MC requires no more switches than ES-MC (and often fewer), and demonstrate the efficiency and scalability of our parallel G-ES-MC implementation.
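
    For readers unfamiliar with edge switching, the following is a minimal sequential sketch of the classic ES-MC step the abstract builds on (not the parallel G-ES-MC variant): pick two edges uniformly at random, exchange their endpoints, and reject any switch that would create a self-loop or multi-edge, so the degree sequence is preserved by construction. Step count and seed are illustrative.

```python
# Sequential edge-switch Markov chain on an undirected simple graph.
import random

def edge_switch_chain(edges, steps, seed=0):
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    edge_set = set(frozenset(e) for e in edges)
    for _ in range(steps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        if i == j:
            continue
        (a, b), (c, d) = edges[i], edges[j]
        # Propose replacing {a, b} and {c, d} with {a, d} and {c, b}.
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a multi-edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Example: randomize a small graph while keeping every node's degree fixed.
print(edge_switch_chain([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)], steps=1000))
```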

    Engineering Shared-Memory Parallel Shuffling to Generate Random Permutations In-Place

    Shuffling is the process of placing elements into a random order such that any permutation occurs with equal probability. It is an important building block in virtually all scientific areas. We engineer, to the best of our knowledge for the first time, a practically fast, parallel shuffling algorithm with O(√n log n) parallel depth that requires only poly-logarithmic auxiliary memory (with high probability). In an empirical evaluation, we compare our implementations with a number of existing solutions on various computer architectures. Our algorithms consistently achieve the highest throughput on all machines. Further, we demonstrate that the runtime of our parallel algorithm is comparable to the time other algorithms may need just to acquire memory from the operating system for copying the input.
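
    As a point of reference for what a parallel, in-place shuffler improves upon, here is the textbook sequential in-place Fisher-Yates shuffle; the paper's shared-memory algorithm and its poly-logarithmic-memory partitioning are not reproduced here.

```python
# Sequential in-place Fisher-Yates shuffle: O(n) work, uniform over all permutations.
import random

def fisher_yates(items, seed=None):
    rng = random.Random(seed)
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)              # uniform position in [0, i]
        items[i], items[j] = items[j], items[i]
    return items

print(fisher_yates(list(range(10)), seed=42))
```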

    Choosing a Source of Randomness for Computer Simulation

    The main purpose of evolutionary optimization is to find a combination of parameters (independent variables) that maximizes or minimizes the qualitative, quantitative, and probabilistic characteristics of a problem. Integrated optimization methods that borrow their basic working principles from wildlife have recently become very common. Researchers experiment with different types of representations; evolutionary and genetic algorithms, for example, use selection methods and genetic operators, and a large number of swarm-based algorithms are known. The artificial bee colony is an optimization method that mimics the behavior of bees, a specific application of swarm intelligence; its main feature is that it needs no problem-specific information, only the objective to be optimized. Comparing candidate solutions through the local optimization behavior of each artificial bee eventually leads the group to a global optimum with a high rate of convergence. The paper considers a method for solving the optimization problem based on modeling the behavior of a bee colony: it describes the model of scout and forager agents, the search mechanisms, and the selection of positions in a given neighborhood, and gives the general structure of the optimization process. Graphical results are presented showing that the bee colony method can, by optimization, significantly narrow a large set of information sources down to a small range of sources that may carry false information, which in turn makes it possible to identify and block such sources more accurately.
    The article considers the problem of choosing a source of randomness for the computer modeling of stochastic processes, which is used to study the characteristics of security event flows in distributed computer networks, at the design stage of complex automated systems, and in processes occurring in the management of production and infrastructure facilities. A component of the computer model is the source of randomness, which produces a uniformly distributed stream of random integers or real numbers. It must produce a stream of uniformly distributed numbers while remaining economical in terms of computational resources. The paper analyzes simple pseudorandom number generators whose algorithms use only simple computer operations; these include the lagged Fibonacci generator and the Xorshift128 generator proposed by G. Marsaglia. It is noted that any non-uniformity in the distribution of numbers at the generator output significantly affects the quality of the process being modeled. Based on a study of existing ways to post-process the output sequences, it is concluded that, to keep the algorithm for producing a stream of uniformly distributed pseudorandom numbers efficient, the additional processing procedures must be sufficiently economical in terms of the computation involved. The non-uniformity of the distribution of the numeric stream was evaluated using Pearson's chi-squared statistic. To correct the output numeric stream, a method is proposed and justified for extracting from it the part with the highest entropy. The histogram parameters that give good estimates of the output distribution are also justified. It is shown that combining a simple and economical pseudorandom number generator with post-processing gives good results at minimal computational cost.
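
    A minimal sketch of the two ingredients named in the abstract, Marsaglia's xorshift128 generator and a Pearson chi-squared check of histogram uniformity, is given below. The bin count, sample size, and seeds are illustrative choices; the paper's entropy-based extraction and histogram-parameter tuning are not reproduced.

```python
# Marsaglia's xorshift128 plus a chi-squared uniformity statistic over a histogram.
MASK32 = 0xFFFFFFFF

def xorshift128(state):
    """One step of xorshift128; state = [x, y, z, w], 32-bit words, not all zero."""
    x, y, z, w = state
    t = (x ^ (x << 11)) & MASK32
    t ^= t >> 8
    new_w = (w ^ (w >> 19) ^ t) & MASK32
    state[:] = [y, z, w, new_w]
    return new_w

def chi_squared_uniformity(samples, bins=16):
    """Pearson chi-squared statistic for a histogram of 32-bit samples."""
    counts = [0] * bins
    for x in samples:
        counts[x * bins >> 32] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

state = [123456789, 362436069, 521288629, 88675123]
stream = [xorshift128(state) for _ in range(100_000)]
# With 16 bins (15 degrees of freedom), values near 15 indicate a uniform-looking histogram.
print(chi_squared_uniformity(stream))
```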

    A Floating-Point Secure Implementation of the Report Noisy Max with Gap Mechanism

    The Noisy Max mechanism and its variations are fundamental private selection algorithms that are used to select items from a set of candidates (such as the most common diseases in a population), while controlling the privacy leakage in the underlying data. A recently proposed extension, Noisy Top-k with Gap, provides numerical information about how much better the selected items are compared to the non-selected items (e.g., how much more common are the selected diseases). This extra information comes at no privacy cost but crucially relies on infinite precision for the privacy guarantees. In this paper, we provide a finite-precision secure implementation of this algorithm that takes advantage of integer arithmetic.
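
    To make the mechanism concrete, here is a textbook floating-point sketch of Report Noisy Max with Gap using Laplace noise; the paper's point is precisely that this kind of floating-point sampling is unsafe, and its secure implementation uses finite-precision integer arithmetic instead. The parameter names and values below are illustrative.

```python
# Conceptual Report Noisy Max with Gap: add Laplace noise to each score,
# return the index of the largest noisy score and the gap to the runner-up.
import random

def report_noisy_max_with_gap(scores, epsilon, sensitivity=1.0, seed=None):
    rng = random.Random(seed)
    scale = 2.0 * sensitivity / epsilon              # Laplace scale used by the noisy-max mechanism
    # Laplace(0, scale) sampled as the difference of two exponentials.
    noisy = [s + rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for s in scores]
    order = sorted(range(len(scores)), key=lambda i: noisy[i], reverse=True)
    best, runner_up = order[0], order[1]
    gap = noisy[best] - noisy[runner_up]             # released "for free" in the Gap variant
    return best, gap

# Example: counts of some condition in five groups; larger epsilon means less noise.
print(report_noisy_max_with_gap([120, 98, 134, 87, 131], epsilon=0.5, seed=1))
```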

    Array programming with NumPy.

    Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves [1] and in the first imaging of a black hole [2]. Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.
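
    A small example of the array-programming style the abstract describes: whole-array operations, broadcasting, and aggregation replace explicit Python loops. The data and shapes are made up for illustration.

```python
# Vectorized per-channel RMS of four noisy signals, no explicit loops.
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=(4, 1000))                       # four noisy channels
centered = signal - signal.mean(axis=1, keepdims=True)    # broadcasting removes each channel's mean
rms = np.sqrt((centered ** 2).mean(axis=1))               # per-channel root mean square
print(rms)
```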