31 research outputs found
Recommended from our members
Enhanced classification through exploitation of hierarchical structures
Humans often organize information by encoding it in structures that link
together entities such as concepts, objects, properties etc. Among the various
structures possible, hierarchies are commonly used. For instance, taxonomies
of categories commonly employ hierarchies to indicate that one category “is a”
type of another. The Yahoo! Web Directory and the Open Directory Project
are two examples of large taxonomies where topics are hierarchically arranged.
Hierarchies are also used to recursively decompose composite objects into their
constituent parts. Examples of this are webpages that can be parsed and then
represented as DOM-trees, where the DOM nodes correspond to sections of
the webpages.
In this thesis we argue that these hierarchical relationships between entities can be exploited to facilitate common data mining tasks defined upon
them, like automated classification. Specifically, we show that the information
encoded in these hierarchies can be reduced to constraints on class membership scores that can then be enforced as a post-processing step to enhance the accuracy of classification. We demonstrate our ideas and algorithms on three
real-world tasks.
First, we tackle the problem of classification into hierarchical taxonomies.
We show how different taxonomy structures can be translated into constraints
on the outputs of classifiers learned at the nodes of the hierarchy. In addition,
we give algorithms to optimally enforce these constraints and show that this
results in improved classification accuracy. In cases where the taxonomies
are not available, we give an approach to automatically derive hierarchical
relationships amongst a flat set of categories. Next, we work on the problem
of detecting noisy (templated) parts of webpages. We give algorithms that
rate each section of a webpage in terms of how templated it is. Then we show
that smoothing the output of these template classifiers over the DOM-tree
hierarchy improves the template detection performance of our system. Finally,
we investigate the task of segmenting websites into topically cohesive regions.
We define a framework and within it a set of measures that characterize good
segmentations, and give an efficient algorithm to find the best segmentation
within this framework.
We formalize the problem of enforcing constraints on the outputs of classifiers as regularized isotonic or unimodal regression on rooted trees; these are
generalizations of the classic isotonic regression problem. The nature of the
constraints as well as the cost functions is different in each of the applications
mentioned above. For all these formulations we give efficient algorithms to optimally smooth the classifier outputs. These novel formulations and algorithms
might be of interest independent of the applications in this thesis.
Electrical and Computer Engineering
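As an illustration of the classic isotonic regression problem that the abstract says is being generalized, here is a minimal Python sketch of the pool-adjacent-violators approach on a simple chain of scores. It covers only the one-dimensional, unregularized, least-squares case. The rooted-tree, regularized and unimodal formulations of the thesis are not reproduced, and the function name `isotonic_regression` and its parameters are hypothetical.

```python
def isotonic_regression(scores, weights=None):
    """Least-squares non-decreasing fit to `scores` (pool-adjacent-violators).

    Illustrative only: handles a 1-D chain, not the rooted-tree
    generalizations described in the abstract.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for y, w in zip(scores, weights):
        blocks.append([float(y), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each block back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


if __name__ == "__main__":
    # The out-of-order pair (3, 2) is pooled to its average.
    print(isotonic_regression([1, 3, 2, 4]))
```

In this sketch, adjacent scores that violate the ordering constraint are replaced by their weighted average, which is the smallest least-squares change that restores monotonicity along the chain.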
Soft cluster ensembles
Cluster Ensembles is a framework for combining multiple partitionings obtained from separate clustering runs into a final consensus clustering without accessing the original features of the data or the algorithms that determined these partitions. This framework was first proposed by Strehl and Ghosh [31], who also provided three techniques to solve the problem. Since then there have been numerous attempts to solve cluster ensembles using approaches such as Maximum Likelihood using EM, Bipartite Graph Partitioning, Genetic algorithms, and Voting-Merging. Most of this work has focused on devising approaches that accept hard clusterings as input. In addition, the combining accuracy of soft versus hard cluster ensembles has not been compared. In this thesis we show, experimentally as well as intuitively, that using soft clusterings as input offers significant advantages, especially when dealing with vertically partitioned data. We modify many of the above-mentioned algorithms to accept soft clusterings and experiment over multiple real-life datasets.
Electrical and Computer Engineering
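To make the difference between hard and soft inputs concrete, the following Python sketch builds a co-association (pairwise similarity) matrix from an ensemble of clusterings. This is a device commonly used in the cluster-ensemble literature and is not claimed to be one of the algorithms evaluated in the thesis. The names `co_association` and `harden`, and the toy membership values, are hypothetical.

```python
def co_association(clusterings):
    """Average pairwise co-membership over an ensemble.

    Each clustering is a list of membership rows, one row per object,
    with one probability per cluster. Hard clusterings are the special
    case where every row is one-hot.
    """
    n = len(clusterings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for memberships in clusterings:
        for i in range(n):
            for j in range(n):
                # Dot product of the two membership rows within this run,
                # so cluster labels need not be aligned across runs.
                matrix[i][j] += sum(
                    a * b for a, b in zip(memberships[i], memberships[j])
                )
    runs = len(clusterings)
    return [[value / runs for value in row] for row in matrix]


def harden(memberships):
    """Convert soft membership rows to one-hot rows (argmax per object)."""
    hard = []
    for row in memberships:
        best = row.index(max(row))
        hard.append([1.0 if k == best else 0.0 for k in range(len(row))])
    return hard


if __name__ == "__main__":
    # One toy soft clustering of three objects into two clusters.
    soft = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
    print(co_association([soft]))
    print(co_association([harden(soft)]))
```

With hard input, the first two objects look identical (similarity 1), whereas the soft input retains the information that the second object is only weakly assigned to its cluster.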
Enhanced Classification through Exploitation of Hierarchical Structures
Dedicated to my parents, Cdr. Vinod Punera and Shashi Punera
Uncovering leaf rust responsive miRNAs in wheat (Triticum aestivum L.) using high-throughput sequencing and prediction of their targets through degradome analysis
Deep sequencing identified 497 conserved and 559 novel miRNAs in wheat, while degradome analysis revealed 701 target genes. qRT-PCR demonstrated differential expression of miRNAs during stages of leaf rust progression. Bread wheat (Triticum aestivum L.) is an important cereal food crop feeding 30% of the world population. A major threat to wheat production is rust epidemics. This study aimed to identify and functionally characterize micro(mi)RNAs and their target genes in wheat in response to leaf rust infection. High-throughput sequencing was used for transcriptome-wide identification of miRNAs and their expression profiling in response to leaf rust, using mock- and pathogen-inoculated resistant and susceptible near-isogenic wheat plants. A total of 1056 mature miRNAs were identified, of which 497 were conserved and 559 were novel. The pathogen-inoculated resistant plants showed more miRNAs than the pathogen-infected susceptible plants. The miRNA counts increased in the susceptible isoline due to leaf rust; conversely, the counts decreased in the resistant isoline in response to pathogenesis, illustrating precise spatial tuning of miRNAs during compatible and incompatible interactions. Stem-loop quantitative real-time PCR was used to profile 10 highly differentially expressed miRNAs obtained from the high-throughput sequencing data. The spatio-temporal profiling validated the differential expression of miRNAs between the isolines as well as in response to pathogen infection. Degradome analysis provided 701 predicted target genes associated with defense response, signal transduction, development, metabolism, and transcriptional regulation. The results indicate that wheat isolines employ diverse arrays of miRNAs that modulate their target genes during compatible and incompatible interactions.
Our findings contribute to increased knowledge of the roles of microRNAs in wheat-leaf rust interactions and could help in rust resistance breeding programs.
Retroperitoneal fibrosis-clinical presentation and outcome analysis from urological perspective
Purpose: To study the clinical presentation, laboratory results, imaging findings, treatment options and outcomes of retroperitoneal fibrosis (RPF), and to determine whether it follows the same natural course and response to treatment in the Asian population as in the Western world. Materials and Methods: Medical records of patients diagnosed with RPF on imaging and histopathology between February 2010 and April 2016 were reviewed. Results: Of the 21 patients analyzed, the mean age at presentation was 50.81 years. The male to female ratio was 0.9:1. Pain was the most common presenting complaint (95.23% of cases); almost 85% of cases were idiopathic and the rest were postradiation-induced. The median creatinine level was 1.8 mg/dL. The mean erythrocyte sedimentation rate (ESR) was 53.2 mm/h. Hydronephrosis was present in all patients and 47.6% had atrophic kidneys. A diffuse retroperitoneal mass was present in 61.1%. Ureterolysis with lateralization, omental wrapping or gonadal pedicle wrap was done in 17 cases. Two patients underwent uretero-ureterostomy. One patient underwent ileal replacement of the ureter, and one an ileal conduit. Eighteen patients received concurrent medical treatment: 11 were given tamoxifen, 2 were given steroids (prednisolone), and 5 were given both. Of the 20 patients with follow-up, 70% had complete symptomatic relief; ESR improvement was seen in 77.8%. Follow-up ultrasound showed resolved and decreased hydronephrosis in 20% and 55%, respectively. One patient had treatment failure and 17.65% had disease recurrence. Conclusions: RPF is a rare disease with varied presentation and outcomes. The male to female ratio may be equal in Asians, and smoking could be a lesser contributing factor. More Asian cohort studies are required to support this.
De novo assembled wheat transcriptomes delineate differentially expressed host genes in response to leaf rust infection
Pathogens like Puccinia triticina, the causal organism of leaf rust, extensively damage wheat production. The interaction at the molecular level between wheat and the pathogen is complex and little explored. The pathogen-induced response was characterized using mock- or pathogen-inoculated near-isogenic wheat lines (with or without the seedling leaf rust resistance gene Lr28). Four Serial Analysis of Gene Expression libraries were prepared from mock- and pathogen-inoculated plants and were subjected to Sequencing by Oligonucleotide Ligation and Detection, which generated a total of 165,767,777 reads, each 35 bases long. The reads were processed and multiple k-mers were attempted for de novo transcript assembly; 22 k-mers showed the best results. Altogether 21,345 contigs were generated and functionally characterized by gene ontology annotation and by mining for transcription factors and resistance genes. Expression analysis among the four libraries showed extensive alterations in the transcriptome in response to pathogen infection, reflecting reorganizations in major biological processes and metabolic pathways. The role of auxin in determining pathogenesis in susceptible and resistant lines was imperative. The qPCR expression study of four LRR-RLK (leucine-rich repeat receptor-like protein kinase) genes showed higher expression at 24 hours after inoculation with the pathogen. In summary, the conceptual model of induced resistance in wheat contributes insights on defense responses and imparts knowledge of Puccinia triticina-induced defense transcripts in wheat plants.
SNP discovery from next-generation transcriptome sequencing data and their validation using KASP assay in wheat (Triticum aestivum L.)
Single nucleotide polymorphisms (SNPs) are becoming the most amenable form of DNA-based molecular markers for genetic analysis. In hexaploid bread wheat (Triticum aestivum L.), it is difficult to discern true polymorphic SNPs due to homoeologous and paralogous genes. Two serial analysis of gene expression (SAGE) libraries were developed using leaves from resistant plants carrying the leaf rust resistance gene Lr28; one library was derived from leaves that were mock inoculated and the other from leaves inoculated with the urediniospores of the leaf rust pathogen Puccinia triticina. Next-generation sequencing reads, after quality trimming and removal of fungal sequences, were mapped to wheat reference sequences at Ensembl Plants. CLC Genomics Workbench and Freebayes were employed for SNP calling. A total of 611 SNPs were predicted by both tools, of which 207 varietal SNPs were identified by the ConservedPrimer software. A subset of 100 SNPs was used for validation across 47 wheat genotypes using the Kompetitive Allele Specific PCR (KASP) assay; 83 SNPs could be successfully validated. These SNPs were positioned on wheat subgenomes and chromosome arms. When functionally annotated, many sequences harboring SNPs showed homology to resistance and resistance-like genes listed in the Plant Resistance Gene database (PRGdb) as well as to pathogenesis-related (PR) and stress-responsive genes. The results of the present study, involving the discovery of SNPs associated with resistance to leaf rust, a major threat to wheat production worldwide, will be valuable for molecular breeding for rust resistance.