Novel algorithms for motif discovery in bio-sequence datasets
The significant growth in the volume of bio-molecular sequence data (DNA, RNA and protein sequences) over the past decade calls for novel computational techniques to extract meaningful information from such data. Existing methods for extracting such information predominantly identify patterns or motifs, for example, repeated substrings of bio-sequences, conserved substrings in a group of homologous protein sequences, or similar substrings in a set of DNA sequences. Identifying such motifs has applications in understanding gene function and human disease and in identifying potential therapeutic drug targets, to name a few. Several variants of the motif discovery problem can be identified in the literature, and numerous algorithms have been proposed for them. In this research work, we propose novel algorithms, significantly different from the techniques adopted so far by existing algorithms, to address salient problems in molecular biology that require discovering motifs in a set of bio-sequences. The proposed algorithms employ basic sorting techniques and simple data structures such as arrays and linked lists, and have been shown to perform better in practice than many previously known algorithms when applied to synthetic and real biological datasets.
Fast Algorithms for Selecting Specific siRNA in Complete mRNA Data
Abstract. The Specific Selection Problem arises from the need to design short interfering RNA (siRNA) for gene silencing. These short sequences target specific messenger RNA (mRNA) and cause its degradation, inhibiting synthesis of the protein it encodes. In [11] this problem was solved in a reasonable amount of time when restricted to the design of siRNA for a particular mRNA, but that approach becomes too time-consuming when designing siRNA for every mRNA in a given organism. We devise simple algorithms based on sorting and hashing techniques that allow us to solve this problem for the complete human mRNA data in less than 4 hours, a speedup of almost two orders of magnitude over previous approaches.
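As a rough illustration of how hashing can help with this kind of selection, the Python sketch below indexes every length-k substring by the set of sequences that contain it and keeps those found in exactly one sequence. This is a deliberately simplified, exact-match version of the problem and is an assumption on our part: the abstract does not state the precise matching criterion, and the function name `specific_kmers`, the parameter `k`, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def specific_kmers(mrnas, k):
    """Return, for each named sequence, the sorted k-mers found in no other sequence.

    Simplifying assumption: 'specific' means an exact k-mer occurs in exactly
    one input sequence. The real problem may also have to rule out
    near-matches, which this sketch ignores.
    """
    # Hash table: k-mer -> names of the sequences that contain it.
    owners = defaultdict(set)
    for name, seq in mrnas.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    specific = {name: [] for name in mrnas}
    for kmer, names in owners.items():
        if len(names) == 1:
            (name,) = names  # the single owning sequence
            specific[name].append(kmer)
    for kmers in specific.values():
        kmers.sort()
    return specific


# Toy input (hypothetical sequences, not real mRNA data).
toy = {"g1": "AAACGU", "g2": "CGUUUU"}
result = specific_kmers(toy, 3)
```

Each sequence is scanned once, so the work grows linearly with the total input length for a fixed k, which is the property that makes a genome-wide pass feasible in principle.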
Greedy Heuristics for Degenerate Primer Selection
Abstract. Amplification of multiple DNA sequences in a single Polymerase Chain Reaction (PCR) experiment, called Multiplex PCR (MP-PCR), is a widely known laboratory technique that requires efficient selection of a set of synthetic strings called Degenerate Primers to yield successful PCR products. The Multiple Degenerate Primer Selection Problem (MDPSP) is a fundamental problem in molecular biology and has received attention from numerous researchers in the recent past. Several variants of MDPSP have been proved to be NP-Complete. Many heuristic algorithms for MDPSP exist in the literature and have been shown to perform well in practice on real biological data. In this paper, we present two new greedy heuristics for the problem, analyze their time and space complexities, and compare the performance of one of the proposed algorithms on random and real biological data with that of two previously reported algorithms. Our results show that the proposed algorithm achieves nearly the same quality, measured by the number of degenerate primers, while running in much less time than the two earlier algorithms. It also eliminates the dependence on an empirical input parameter that affects both the quality and the execution time of the previous algorithms. Keywords: Degenerate Primers, Primers for MP-PCR, Multiplex Primers
Pampa: An Improved Branch and Bound Algorithm for Planted (l, d) Motif Search
Abstract. We consider the planted (l, d) motif search problem as defined in [1] and [12], a problem that arises from the need to find transcription factor-binding sites in genomic information. We build an exact branch and bound algorithm with small space requirements. An implementation of this algorithm can tackle challenging instances as large as (19, 7), halving the time required by the best existing algorithm.
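To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the branch and bound algorithm of the paper. It assumes the commonly used formulation of the problem: given a set of DNA sequences, report every string M of length l such that each sequence contains a substring of length l within Hamming distance d of M. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(kmer, d):
    """All strings over ALPHABET within Hamming distance d of kmer."""
    result = {kmer}
    for k in range(1, d + 1):
        for positions in combinations(range(len(kmer)), k):
            for letters in product(ALPHABET, repeat=k):
                chars = list(kmer)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                result.add("".join(chars))
    return result


def occurs(motif, seq, d):
    """True if some substring of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d):
    """Brute force: every motif must lie in the d-neighborhood of some
    l-mer of the first sequence, so only those candidates are checked."""
    first, rest = seqs[0], seqs[1:]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(m for m in candidates if all(occurs(m, s, d) for s in rest))


# Toy instance: ACGT is planted with at most one mutation per sequence.
toy = ["TTACGTAA", "GGACCTGG", "CCAGGTCC"]
found = planted_motifs(toy, 4, 1)
```

The candidate set grows quickly with l and d, which is why instances such as (19, 7) call for pruning strategies like branch and bound rather than plain enumeration.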
Algorithms for Motif Discovery based on Edit Distance
Abstract. In this paper, we study the problem of identifying sequence patterns of length l in a database DB, consisting of n bio-sequences of average length m each, that have occurrences in at least t distinct sequences of DB, the occurrences being at an edit distance (also called the Levenshtein distance) of at most d from the pattern. We survey some algorithms for the problem from the literature and also present two improved algorithms for it. An implementation and performance results of one of our algorithms are also presented.
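The occurrence test in this problem statement can be sketched in Python as follows. This is only an illustration of the definition, not one of the algorithms from the paper: it uses the standard dynamic program for approximate substring matching, in which the first row is all zeros so that a match may start anywhere in the sequence. The function names and the toy database are hypothetical.

```python
def min_edit_distance_to_substring(pattern, text):
    """Smallest edit distance between pattern and any substring of text."""
    # Row 0 is all zeros: the empty pattern prefix matches at every position.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        curr = [i]  # matching i pattern characters against nothing costs i
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            curr.append(min(prev[j] + 1,          # pattern char unmatched
                            curr[j - 1] + 1,      # extra text char
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    # The match may end anywhere, so take the minimum over the last row.
    return min(prev)


def count_supporting_sequences(pattern, db, d):
    """Number of sequences containing an occurrence within edit distance d."""
    return sum(1 for seq in db
               if min_edit_distance_to_substring(pattern, seq) <= d)


def is_motif(pattern, db, d, t):
    """True if pattern occurs, within distance d, in at least t sequences."""
    return count_supporting_sequences(pattern, db, d) >= t


# Toy database (hypothetical sequences).
db = ["TTACGTTT", "TTAGTTT", "GGGGGG"]
support = count_supporting_sequences("ACGT", db, 1)
```

Checking one pattern this way takes time proportional to l times the total database length; enumerating all candidate patterns on top of that is the expensive part the surveyed algorithms aim to reduce.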
CLINICOPATHOLOGICAL EVALUATION OF ABNORMAL UTERINE BLEEDING IN PERIMENOPAUSAL WOMEN
Objective: Abnormal uterine bleeding (AUB) is defined as any bleeding from the uterus that deviates from normal in terms of regularity, duration, frequency, or amount of blood loss. It interferes with the woman’s physical, emotional, social, and material quality of life. The prevalence of AUB ranges between 11% and 13%, rising to 24% in perimenopausal women. The aim of this study is to analyze the demographic profile and risk factors of AUB in perimenopausal women, to classify the causes of AUB based on the PALM-COEIN classification, and to correlate the clinical and sonological findings with post-operative histopathology findings.
Methods: A cross-sectional study was conducted in the Department of Obstetrics and Gynecology at GIMSR hospital, Visakhapatnam, for 1 year from September 2022 to September 2023. A total of 151 perimenopausal women with complaints of AUB in the age group of 41–50 years who underwent hysterectomy were included in the study.
Results: The most common clinical presentation was heavy menstrual bleeding (HMB) seen in 96 cases (63.5%). Clinically, the PALM component accounted for 61% of cases and the COEIN component accounted for 39% of cases. Ultrasonographically, the PALM component was detected in 76.8% of cases. We calculated the sensitivity, specificity, positive predictive value, and negative predictive value of ultrasound for detection of PALM lesions.
Conclusion: Leiomyoma was the most common cause clinically, sonologically, and pathologically. Correlation was good for leiomyoma. Ultrasonography (USG) has good specificity for the diagnosis of AUB-A and AUB-P but low sensitivity.
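For readers unfamiliar with the four diagnostic measures named in the Results, the Python sketch below shows how they follow from a 2×2 table of ultrasound findings against histopathology. The function name is hypothetical, and the counts in the example are invented purely for illustration; they are not data from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2-table measures for a test against a reference standard.

    tp: test positive, reference positive    fp: test positive, reference negative
    fn: test negative, reference positive    tn: test negative, reference negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true lesions the test detects
        "specificity": tn / (tn + fp),  # share of lesion-free cases the test clears
        "ppv": tp / (tp + fp),          # chance a positive test is correct
        "npv": tn / (tn + fn),          # chance a negative test is correct
    }


# Invented counts for illustration only (NOT from the study).
example = diagnostic_metrics(tp=80, fp=10, fn=20, tn=40)
```

A test with high specificity but low sensitivity, as reported for AUB-A and AUB-P, is one where `fp` is small relative to `tn` while `fn` is large relative to `tp`.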
RISK FACTORS, MATERNAL, AND PERINATAL OUTCOME OF FETAL MACROSOMIA
Objective: Macrosomia is characterized by a birth weight exceeding 4000 g, regardless of gestational age, or >90th percentile for gestational age. This condition is linked to significant risks of maternal and neonatal morbidity and mortality. Globally, the prevalence of infants weighing ≥4000 g is estimated to be 9%. Various risk factors contribute to the development of fetal macrosomia, including a high pre-pregnancy body mass index (BMI), excessive weight gain during the antenatal period, high parity, male gender of the fetus, prolonged pregnancy, and maternal diabetes mellitus.
Methods: A retrospective cross-sectional study was undertaken in the Department of Obstetrics and Gynecology at GIMSR Teaching Hospital, over a 5-year period from May 2018 to May 2023. The study encompassed all singleton pregnancies with a birth weight equal to or exceeding 4000 g, irrespective of the delivery method. Maternal and neonatal records for the study population were systematically collected, and data were documented.
Results: Over the study period, there were 167 cases in which the birth weight equalled or exceeded 4000 g. The most common maternal complications were prolonged labor and postpartum hemorrhage. Shoulder dystocia was seen in 2.9% of all deliveries and 10.8% of all vaginal deliveries. The most common neonatal complication was hypoglycemia.
Conclusion: The prevalence of macrosomia in our study was 3.86%. The main risk factors identified in our study were male gender, pre-pregnancy BMI >25, previous macrosomic births, and excessive weight gain during pregnancy.