Modified Huffman Code for Lossless Compression and Bandwidth Optimization and Applying Genetic Algorithms to Generating Paintings Based on Images
This thesis contains two projects. First, a modified Huffman code is presented as a lossless method for compressing common traffic types. Rather than using only frequency of occurrence as the priority of each node when constructing the Huffman tree, as is common in Huffman codes, we propose using the compression benefit. We show the effectiveness of this method on common data transmission types and describe what would be needed for its adoption. Second, we explore genetic algorithms as a method for creating paintings based on images. We balance the computational work required against visually pleasing results, prioritizing aspects of the parameter space based on their impact on the painting and on the computational workload.
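The tree construction described in the abstract above can be sketched roughly as follows. This is a minimal illustration of ordinary Huffman coding with a pluggable node priority, not the thesis's algorithm: the abstract does not define how "compression benefit" is measured, so the `priority` parameter is a hypothetical hook that defaults to plain frequency of occurrence.

```python
import heapq
from collections import Counter
from typing import Callable, Dict


def huffman_code(
    data: bytes,
    priority: Callable[[int, int], float] = lambda symbol, count: count,
) -> Dict[int, str]:
    """Build a prefix code for the byte values in `data`.

    `priority(symbol, count)` is a hypothetical hook: by default it returns
    the symbol's frequency (classic Huffman). A different measure could be
    supplied, but the measure used in the thesis is not given in the abstract.
    """
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}

    # Heap entries: (priority, tie-breaker, partial code table).
    # The unique tie-breaker ensures the dicts are never compared.
    heap = [
        (priority(sym, n), i, {sym: ""})
        for i, (sym, n) in enumerate(counts.items())
    ]
    heapq.heapify(heap)
    tie = len(heap)

    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # lowest priority
        p2, _, right = heapq.heappop(heap)  # second lowest
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1

    return heap[0][2]


if __name__ == "__main__":
    for symbol, code in sorted(huffman_code(b"aaaabbc").items()):
        print(chr(symbol), code)
```

With the default priority, more frequent symbols receive shorter codes; passing a different `priority` function changes only the order in which nodes are merged.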
Antioxidants: nanotechnology and biotechnology fusion for medicine in overall
An antioxidant is a chemical substance found naturally in our food. It can prevent or reduce oxidative stress in the physiological system. Because the body uses oxygen constantly, it continuously produces free radicals. An excessive number of free radicals can cause cellular damage in the human body, which can lead to various diseases such as cancer, muscular degeneration and diabetes. The presence of antioxidants helps to counteract the effects of these free radicals. Antioxidants are found in abundance in plants, but their delivery is often problematic. One solution is nanotechnology, which has a multitude of potential applications in advanced medical science. Nano devices and nanoparticles have a significant impact because they can interact with the body at the subcellular level with a high degree of specificity. Treatment can thus achieve maximum efficacy with few side effects.
Clustering by compression
We present a new method for clustering based on compression. The method
doesn't use subject-specific features or background knowledge, and works as
follows: First, we determine a universal similarity distance, the normalized
compression distance or NCD, computed from the lengths of compressed data files
(singly and in pairwise concatenation). Second, we apply a hierarchical
clustering method. The NCD is universal in that it is not restricted to a
specific application area, and works across application area boundaries. A
theoretical precursor, the normalized information distance, co-developed by one
of the authors, is provably optimal but uses the non-computable notion of
Kolmogorov complexity. We propose precise notions of similarity metric and
normal compressor, and show that the NCD based on a normal compressor is a similarity
metric that approximates universality. To extract a hierarchy of clusters from
the distance matrix, we determine a dendrogram (binary tree) by a new quartet
method and a fast heuristic to implement it. The method is implemented and
available as public software, and is robust under choice of different
compressors. To substantiate our claims of universality and robustness, we
report evidence of successful application in areas as diverse as genomics,
virology, languages, literature, music, handwritten digits, astronomy, and
combinations of objects from completely different domains, using statistical,
dictionary, and block sorting compressors. In genomics we presented new
evidence for major questions in Mammalian evolution, based on
whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta
hypothesis against the Theria hypothesis. Comment: LaTeX, 27 pages, 20 figures
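As a rough illustration of the distance described in the abstract above, the NCD of two byte strings x and y is commonly written as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed length. The sketch below assumes zlib as the compressor, which is an arbitrary choice made here for illustration. The helper names are hypothetical and are not taken from the authors' public software, and the quartet-based clustering step is not shown.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Computed from the compressed lengths of x, y, and their concatenation.
    Values near 0 suggest similar inputs; values near 1 suggest unrelated ones.
    """
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD matrix, the input a hierarchical clustering step would use."""
    return [[ncd(a, b) for b in items] for a in items]


if __name__ == "__main__":
    dog = b"the quick brown fox jumps over the lazy dog. " * 20
    cat = b"the quick brown fox jumps over the lazy cat. " * 20
    print(ncd(dog, cat))
```

Because real compressors are only approximately "normal" in the paper's sense, values can fall slightly outside the range 0 to 1, and `ncd(x, x)` is typically small rather than exactly zero.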
GVC: efficient random access compression for gene sequence variations
Background: In recent years, advances in high-throughput sequencing technologies have enabled the use of genomic information in many fields, such as precision medicine, oncology, and food quality control. The amount of genomic data being generated is growing rapidly and is expected to soon surpass the amount of video data. The majority of sequencing experiments, such as genome-wide association studies, have the goal of identifying variations in the gene sequence to better understand phenotypic variations. We present a novel approach for compressing gene sequence variations with random access capability: the Genomic Variant Codec (GVC). We use techniques such as binarization, joint row- and column-wise sorting of blocks of variations, as well as the image compression standard JBIG for efficient entropy coding. Results: Our results show that GVC provides the best trade-off between compression and random access compared to the state of the art: it reduces the genotype information size from 758 GiB down to 890 MiB on the publicly available 1000 Genomes Project (phase 3) data, which is 21% less than the state of the art in random-access capable methods. Conclusions: By providing the best results in terms of combined random access and compression, GVC facilitates the efficient storage of large collections of gene sequence variations. In particular, the random access capability of GVC enables seamless remote data access and application integration. The software is open source and available at https://github.com/sXperfect/gvc/