37 research outputs found
Approximating Weighted Duo-Preservation in Comparative Genomics
Motivated by comparative genomics, Chen et al. [9] introduced the Maximum
Duo-preservation String Mapping (MDSM) problem in which we are given two
strings over the same alphabet and the goal is to find a
mapping between them that maximizes the number of duos preserved. A
duo is any two consecutive characters in a string, and it is preserved in the
mapping if its two consecutive characters in the first string are mapped to the same two
consecutive characters in the second string. The MDSM problem is known to be NP-hard and
there are approximation algorithms for this problem [3, 5, 13], but all of them
consider only the "unweighted" version of the problem in the sense that a duo
from the first string is preserved by mapping it to any identical duo in the second string, regardless of their
positions in the respective strings. However, it is desirable in comparative
genomics to find mappings that consider preserving duos that are "closer" to
each other under some distance measure [19]. In this paper, we introduce a
generalized version of the problem, called the Maximum-Weight Duo-preservation
String Mapping (MWDSM) problem, which captures both duo-preservation and
duo-distance measures in the sense that mapping a duo from the first string to each
preserved duo in the second string has a weight, indicating the "closeness" of the two
duos. The objective of the MWDSM problem is to find a mapping so as to maximize
the total weight of preserved duos. In this paper, we give a polynomial-time
6-approximation algorithm for this problem. Comment: Appeared in proceedings of the 23rd International Computing and
Combinatorics Conference (COCOON 2017)
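The definition of a preserved duo, and of the weighted objective, can be made concrete with a minimal sketch. It assumes the two strings have equal length and that the mapping is a character-preserving bijection given as a list of indices; the function names and the `closeness` weight are hypothetical illustrations and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def preserved_duos(s1: str, s2: str, mapping: Sequence[int]) -> List[int]:
    """Return every position i whose duo (s1[i], s1[i+1]) is preserved.

    mapping[i] is the index in s2 that position i of s1 is mapped to.
    Assumption: the mapping is a bijection that maps equal characters.
    """
    n = len(s1)
    if len(s2) != n or sorted(mapping) != list(range(n)):
        raise ValueError("mapping must be a bijection between positions")
    if any(s1[i] != s2[mapping[i]] for i in range(n)):
        raise ValueError("mapping must send each character to an equal one")
    # A duo is preserved when its two characters land on consecutive positions.
    return [i for i in range(n - 1) if mapping[i + 1] == mapping[i] + 1]


def total_weight(s1: str, s2: str, mapping: Sequence[int],
                 weight: Callable[[int, int], float]) -> float:
    """Sum weight(i, j) over preserved duos, where j is the image of i."""
    return sum(weight(i, mapping[i]) for i in preserved_duos(s1, s2, mapping))


def closeness(i: int, j: int) -> float:
    """Hypothetical weight: duos at nearby positions count for more."""
    return 1.0 / (1 + abs(i - j))


if __name__ == "__main__":
    # "abc" in s1 maps onto s2[2:5], and "ab" in s1 maps onto s2[0:2].
    mapping = [2, 3, 4, 0, 1]
    print(preserved_duos("abcab", "ababc", mapping))
    print(total_weight("abcab", "ababc", mapping, closeness))
```

With a constant weight of 1 the weighted objective reduces to the unweighted count of preserved duos.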
The Maximum Duo-Preservation String Mapping Problem with Bounded Alphabet
Given two strings A and B such that B is a permutation of A, the max duo-preservation string mapping (MPSM) problem asks to find a mapping π between them so as to preserve a maximum number of duos. A duo is any pair of consecutive characters in a string, and it is preserved by π if its two consecutive characters in A are mapped to the same two consecutive characters in B. This problem has received growing attention in recent years, partly as an alternative way to produce approximation algorithms for its minimization counterpart, min common string partition, a widely studied problem due to its applications in comparative genomics. Considering this favored field of application with short alphabet, it is surprising that MPSM^ℓ, the variant of MPSM with bounded alphabet, has received so little attention, with a single yet impressive work that provides a 2.67-approximation achieved in O(n) [Brubach, 2018], where n = |A| = |B|. Our work focuses on MPSM^ℓ, and our main contribution is the demonstration that this problem admits a Polynomial Time Approximation Scheme (PTAS) when ℓ = O(1). We also provide an alternate, somewhat simpler, proof of NP-hardness for this problem compared with the NP-hardness proof presented in [Haitao Jiang et al., 2012].
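The link between MPSM and min common string partition can be checked on tiny inputs with a brute-force sketch: a mapping that preserves k duos cuts each string into n - k common blocks, so maximizing preserved duos minimizes the partition size. The code below is an exponential-time illustration only, with hypothetical function names; it is not the PTAS or any algorithm from the cited works.

```python
from itertools import permutations


def max_preserved_duos(a: str, b: str) -> int:
    """Brute-force the maximum number of preserved duos (tiny inputs only)."""
    if sorted(a) != sorted(b):
        raise ValueError("b must be a permutation of a")
    n = len(a)
    best = 0
    for pi in permutations(range(n)):
        # Keep only mappings that send each character to an equal character.
        if all(a[i] == b[pi[i]] for i in range(n)):
            kept = sum(1 for i in range(n - 1) if pi[i + 1] == pi[i] + 1)
            best = max(best, kept)
    return best


def min_common_partition_size(a: str, b: str) -> int:
    """Each non-preserved duo is a cut, so the block count is n - preserved."""
    return len(a) - max_preserved_duos(a, b)


if __name__ == "__main__":
    # a = abc|ab and b = ab|abc share the two blocks "abc" and "ab".
    print(max_preserved_duos("abcab", "ababc"))
    print(min_common_partition_size("abcab", "ababc"))
```

Because the search enumerates all n! position mappings, it is only practical for strings of a handful of characters.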
Improved imbalanced classification through convex space learning
Imbalanced datasets for classification problems, characterised by an unequal distribution of samples across classes, are abundant in practical scenarios. Oversampling algorithms generate synthetic data to improve classification performance on such datasets. In this thesis, I discuss two algorithms, LoRAS and ProWRAS, which improve on the state of the art, as shown through rigorous benchmarking on publicly available datasets. A biological application, the detection of rare cell types from single-cell transcriptomics data, is also discussed. The thesis also provides a better theoretical understanding of oversampling.
Applications of MATLAB in Science and Engineering
The book consists of 24 chapters illustrating a wide range of areas where MATLAB tools are applied. These areas include mathematics, physics, chemistry and chemical engineering, mechanical engineering, biological (molecular biology) and medical sciences, communication and control systems, digital signal, image and video processing, and system modeling and simulation. Many interesting problems are included throughout the book, and its contents will be beneficial for students and professionals across a wide range of fields.