157 research outputs found

    Gene identification using phylogenetic metrics with conditional random fields

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 69-72). While the complete sequence of the human genome contains all the information necessary for encoding a complete human being, its interpretation remains a major challenge of modern biology. The first step to any genomic analysis is a comprehensive and accurate annotation of all genes encoded in the genome, providing the basis for understanding human variation, gene regulation, health and disease. Traditionally, the problem of computational gene prediction has been addressed using graphical probabilistic models of genomic sequence. While such models have been successful for small genomes with relatively simple gene structure, new methods are necessary for scaling these to the complete human genome, and for leveraging information across multiple mammalian species currently being sequenced. While generative models like hidden Markov models (HMMs) face the difficulty of modeling both coding and non-coding regions across a complete genome, discriminative models such as Conditional Random Fields (CRFs) have recently emerged, which focus specifically on the discrimination problem of gene identification, and can therefore be more powerful. One of the most attractive characteristics of these models is that their general framework also allows the incorporation of any number of independently derived feature functions (metrics), which can increase discriminatory power. While most of the work on CRFs for gene finding has been on model construction and training, there has not been much focus on the metrics used in such discriminatory frameworks. This is particularly important with the availability of rich comparative genome data, enabling the development of phylogenetic gene identification metrics which can maximally use alignments of a large number of genomes.
In this work I address the question of gene identification using multiple related genomes. I first present novel comparative metrics for gene classification that show considerable improvement over existing work, and also scale well with an increase in the number of aligned genomes. Second, I describe a general methodology of extending pair-wise metrics to alignments of multiple genomes that incorporates the evolutionary phylogenetic relationship between informant species. Third, I evaluate various methods of combining metrics that exploit metric independence and result in superior classification. Finally, I incorporate the metrics into a Conditional Random Field gene model, to perform unrestricted de novo gene prediction on 12-species alignments of the D. melanogaster genome, and demonstrate accuracy rivaling that of state-of-the-art gene prediction systems. By Ameya Nitin Deoras. S.M.

    Self-consistency for open-ended generations

    In this paper, we present a novel approach for improving the quality and consistency of generated outputs from large-scale pre-trained language models (LLMs). Self-consistency has emerged as an effective approach for prompts with fixed answers, selecting the answer with the highest number of votes. We introduce a generalized framework for self-consistency that extends its applicability beyond problems that have fixed answers. Through extensive simulations, we demonstrate that our approach consistently recovers the optimal or near-optimal generation from a set of candidates. We also propose lightweight parameter-free similarity functions that show significant and consistent improvements across code generation, autoformalization, and summarization tasks, even without access to token log probabilities. Our method incurs minimal computational overhead, requiring no auxiliary reranker models or modifications to the existing model.
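
    The voting idea described in this abstract can be illustrated with a minimal sketch. `majority_vote` shows classic self-consistency for fixed answers. `most_consistent` shows one plausible way to extend it to open-ended text, by picking the candidate with the highest mean similarity to the other candidates. The function names and the token-overlap (Jaccard) similarity are assumptions made for illustration only; they are not the similarity functions proposed in the paper.

```python
from collections import Counter


def majority_vote(answers):
    """Classic self-consistency: return the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]


def jaccard(a, b):
    """Token-set overlap of two strings (illustrative choice, not from the paper)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def most_consistent(candidates, similarity=jaccard):
    """Return the candidate with the highest mean similarity to the others.

    Assumes a non-empty list of candidate strings.
    """
    if len(candidates) == 1:
        return candidates[0]

    def score(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(similarity(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=score)
    return candidates[best]
```

    For example, `most_consistent(["return a + b", "return a + b", "return a - b"])` selects `"return a + b"`, since it agrees most with the rest of the set. Like the approach in the abstract, this sketch needs no reranker model and no token log probabilities.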

    Political history of Maharashtra from the earliest times to circa 1000 A.D.

    Chapter I - deals with the geography of Maharashtra and I have tried to throw as much light as possible upon this obscure subject from various sources. I have also marshalled all the available information about the tribes and peoples inhabiting the various parts of the Maratha country. Chapter II - deals with the history of Maharashtra from the earliest times down to c. 200 B.C. Important questions such as the Aryanisation of the country and Pre-Aryan history have been discussed. Mention may be made here of the Paithan excavations which throw an interesting sidelight on the earliest period. Chapter III - describes the rise and growth of the Satavahana empire. Complicated questions like the original home of the Satavahanas, and their genealogy and chronology have been handled. Chapter IV - deals with the Scythians in Maharashtra. I have put forward a new view as regards the date of the Saka king Nahapana. Chapter V - deals with the history of the powerful but little-known Vakataka kingdom. Chapter VI - deals with the history of Southern Maharashtra under the Kadambas. In Chapter VII, I have treated the history of minor dynasties which had been neglected for a long time. This chapter brings to light the Kalachuri, Traikutaka and the Nala dynasties. Chapter VIII - includes the history of the Early Chalukyas of Badami. I have thrown new light on the origin of the Chalukyas and their relations with the different powers of Northern and Southern India. I have also suggested a new date for the last Chalukya expedition against the Pallavas of Kanchi. Chapter IX - deals with the early history of the Rashtrakuta families. The obscure history of one of these families has been illuminated by the latest discoveries of copperplates. The reign of Govinda III, the greatest Rashtrakuta emperor, has been thoroughly dealt with and several complex problems of his time have been solved in a new fashion. Chapter X - deals with the history of the Rashtrakuta empire down to 975 A.D.
Particular attention has been paid here to the empire's relations with the Eastern Chalukyas, the Gangas of Mysore, the Kalachuris and others. New reasons have been put forward for the fall of the empire. Chapter XI - deals with the history of the Later Chalukyas down to c. 1000 A.D. New light has been shed on the reigns of Taila II and his son Satyasraya. Further, the history of another branch of the Chalukya dynasty has been treated in the light of new inscriptions. Chapter XII - includes the minor dynasties of Maharashtra. I have put forward a new view as regards the origin of the Early Yadavas. I have fully dealt with the rise of the Silaharas, the Kadambas and the Rattas.

    TOLERANCE SHOWN BY Rattus rattus TO AN ANTICOAGULANT RODENTICIDE

    Apart from using the 0.005% concentration, the recommended field dose of 0.025% of the anticoagulant was used along with an alternate food for individual rats for a varying number of days. Those that survived were taken as tolerant, provided they showed an mg/kg intake beyond the tolerance limit, survived six days of feeding, exhibited bait-shyness and did not exhibit hemorrhage after death. In determining the criteria for tolerance to an anticoagulant by a rat, one should take into account four composite factors: six days of feeding even at 0.025%, bait-shyness when alternate food is given, an mg/kg intake higher than the tolerance level, and a lack of intensive hemorrhage after death.

    Pre-trained Recommender Systems: A Causal Debiasing Perspective

    Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting such a paradigm to the context of recommender systems, which is less investigated from the perspective of pre-trained models. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, which can then be quickly adapted to improve few-shot learning performance in unseen new domains (with limited data). However, unlike vision/language data, which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different e-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named PreRec. Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios. Comment: 8 pages, WSDM 2