Efficient analysis and storage of large-scale genomic data
The impending advent of population-scale sequencing cohorts involving tens of millions of individuals with matched phenotypic measurements will produce unprecedented volumes of genetic data. Storing and analysing such gargantuan datasets places computational performance at a pivotal position in medical genomics. In this thesis, I explore the potential for accelerating and parallelizing standard genetics workflows, file formats, and algorithms using hardware-accelerated vectorization, parallel and distributed algorithms, and heterogeneous computing.
First, I describe a novel bit-counting operation termed the positional population-count, which can be used together with succinct representations and standard efficient operations to accelerate many genetic calculations. To enable the use of this new operator and the canonical population count on any target machine, I developed a unified low-level library that uses CPU dispatching to select the optimal method at run-time, contingent on the available instruction set architecture and the given input size. As a proof-of-principle application, I apply the positional population-count operator to computing quality-control-related summary statistics for terabyte-scale sequencing readsets, with >3,800-fold speed improvements. As another application, I describe a framework for efficiently computing the cardinality of set intersections using these operators, and apply it to compute genome-wide linkage disequilibrium in datasets with up to 67 million samples, resulting in >60-fold speed improvements for dense genotypic vectors, and up to >250,000-fold savings in memory and >100,000-fold improvements in speed for sparse genotypic vectors. I next describe a framework for handling the terabytes of compressed output data, along with graphical routines for visualizing long-range linkage-disequilibrium blocks as seen over many human centromeres. Finally, I describe efficient algorithms for storing and querying very large genetic datasets, and specialized algorithms for the genotype component of such datasets, with >10,000-fold savings in memory compared to the current interchange format.

Wellcome Trust
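The positional population count generalizes the ordinary popcount: instead of one total, it returns, for each bit position across a block of machine words, how many words have that bit set. A minimal pure-Python sketch (the word width and bitmap encoding are illustrative assumptions, not the thesis's SIMD implementation), together with the popcount-based set-intersection cardinality of the kind used above for linkage disequilibrium:

```python
def positional_popcount(words, width=16):
    # For each of `width` bit positions, count how many words have that bit set.
    counts = [0] * width
    for w in words:
        for j in range(width):
            counts[j] += (w >> j) & 1
    return counts

def intersection_cardinality(a, b):
    # |A ∩ B| for two sets encoded as equal-length bitmaps of integers:
    # AND the corresponding words, then sum the ordinary population counts.
    return sum(bin(x & y).count("1") for x, y in zip(a, b))
```

In production code these loops would be replaced by vectorized kernels chosen by a CPU-dispatching library; the sketch only fixes the semantics of the two operators.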
Advancing Android PHAs/Malware Detection Techniques by Utilizing Signature Data, Behavioral Patterns and Machine Learning
During the last decade, mobile phones and tablets evolved into smart devices with enormous computing power and storage capacity packed into a pocket size. People around the globe have quickly moved from laptops to smartphones for their daily computational needs. From web browsing, social networking, and photography to critical bank payments and intellectual property, everything has moved onto smartphones; and undoubtedly Android has dominated the smartphone market. Android's growth has also attracted cyber criminals, who focus on creating attacks and malware targeting Android users. Malware of many categories is seen in the Android ecosystem, including botnets, ransomware, click Trojans, SMS fraud, and banking Trojans.
Due to the huge number of applications being developed and distributed every day, Android needs malware analysis techniques that differ from those of any other operating system. This research focuses on defining a process for finding Android malware within a large number of new applications. The research utilizes machine learning techniques to predict possible malware and further provides assistance in the reverse engineering of malware. In this thesis, an assistive Android malware analysis system, "AndroSandX", is proposed, researched, and developed. AndroSandX allows researchers to quickly analyze potential Android malware and helps them perform manual analysis.
Key features of the system are strong assistive capabilities using machine learning, a built-in ticketing system, a highly modular design, storage with non-relational databases, backup of analysis data for archival, assistance in manual analysis, and threat intelligence. Research results show that the system has a prediction accuracy of around 92%. The research has wide scope and leans towards providing an industry-oriented assistive system/product for Android malware analysis.
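The abstract does not specify which classifier AndroSandX uses. Purely as an illustration of permission-based malware prediction, here is a hypothetical 1-nearest-neighbour sketch over Android permission sets; all feature sets and labels below are invented for the example:

```python
def jaccard(a, b):
    # Jaccard similarity of two permission sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict(sample, labeled):
    # labeled: list of (permission_set, label) pairs; return the label of the
    # most similar known sample (1-nearest-neighbour by Jaccard similarity).
    best_features, best_label = max(labeled, key=lambda item: jaccard(sample, item[0]))
    return best_label

# Hypothetical training data: permission sets with known verdicts.
known = [
    ({"SEND_SMS", "READ_CONTACTS", "RECEIVE_BOOT_COMPLETED"}, "malware"),
    ({"INTERNET", "ACCESS_NETWORK_STATE"}, "benign"),
]
```

A real pipeline would combine such static features with the signature data and behavioral patterns named in the title, and use a far larger labeled corpus.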
Gathering as ontological practice among Evenki of Eastern Siberia
Through visual analysis presented in 15 tables, the authors looked at the complexity of gathering as a practice that not only plays a role in subsistence, but also creates meaning and frames an engagement with the environment. Gathering is studied as consisting of several processes: searching; cleaning and sorting; laying out and drying; and transporting, consuming, and packing. Objects that are gathered are shown to play important roles as mediums between people and their environment. Cases of berries, firewood, jade stones, and ice are presented as illustrations of this argument. In the final part of the article, gathering is studied as a metaphysical phenomenon: a process of switching from disorder to order and back. Gathering poses many metaphysical questions in a practical form, and the authors propose to look at how people deal with these questions. How does the world change for those who gather things? How do they experience this transformation? Does the human attempt to collect things become an attempt to order the chaotic environment, classify it, and contain chaos in the small volumes of their bags and buckets? This study is based on social anthropological fieldwork conducted among the Evenki people of East Buryatia.
Efficient string algorithmics across alphabet realms
Stringology is a subfield of computer science dedicated to analyzing and processing sequences of symbols. It plays a crucial role in various applications, including lossless compression, information retrieval, natural language processing, and bioinformatics. Recent algorithms often assume that the strings to be processed are over a polynomial integer alphabet, i.e., each symbol is an integer that is at most polynomial in the lengths of the strings. In contrast, the earlier days of stringology were shaped by the weaker comparison model, in which strings can only be accessed by mere equality comparisons of symbols or, if the symbols are totally ordered, order comparisons of symbols. Nowadays, these flavors of the comparison model are respectively referred to as the general unordered alphabet and the general ordered alphabet. In this dissertation, we dive into the realm of both integer alphabets and general alphabets. We present new algorithms and lower bounds for classic problems, including Lempel-Ziv compression, computing the Lyndon array, and the detection of squares and runs. Our results show that, instead of only assuming the standard model of computation, it is important to also consider both weaker and stronger models. In particular, we should not discard the older and weaker comparison-based models too quickly, as they are not only powerful theoretical tools, but also lead to fast and elegant practical solutions, even by today's standards.
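As a concrete instance of a comparison-model computation, the Lyndon array assigns to each position i of a string the length of the longest Lyndon word starting there, and it can be evaluated using only order comparisons of symbols. A deliberately naive sketch that pins down the definition (the dissertation's algorithms are, of course, far more efficient):

```python
def is_lyndon(s):
    # A Lyndon word is strictly smaller than every one of its proper suffixes.
    return all(s < s[i:] for i in range(1, len(s)))

def lyndon_array(s):
    # lam[i] = length of the longest Lyndon word starting at position i.
    # Naive cubic-time evaluation straight from the definition.
    n = len(s)
    lam = [1] * n
    for i in range(n):
        for length in range(1, n - i + 1):
            if is_lyndon(s[i:i + length]):
                lam[i] = length
    return lam
```

Note that the sketch accesses symbols only through `<` comparisons (via lexicographic string comparison), so it works over any general ordered alphabet.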