157 research outputs found
Gene identification using phylogenetic metrics with conditional random fields
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 69-72). While the complete sequence of the human genome contains all the information necessary for encoding a complete human being, its interpretation remains a major challenge of modern biology. The first step to any genomic analysis is a comprehensive and accurate annotation of all genes encoded in the genome, providing the basis for understanding human variation, gene regulation, health and disease. Traditionally, the problem of computational gene prediction has been addressed using graphical probabilistic models of genomic sequence. While such models have been successful for small genomes with relatively simple gene structure, new methods are necessary for scaling these to the complete human genome, and for leveraging information across multiple mammalian species currently being sequenced. While generative models like hidden Markov models (HMMs) face the difficulty of modeling both coding and non-coding regions across a complete genome, discriminative models such as Conditional Random Fields (CRFs) have recently emerged, which focus specifically on the discrimination problem of gene identification, and can therefore be more powerful. One of the most attractive characteristics of these models is that their general framework also allows the incorporation of any number of independently derived feature functions (metrics), which can increase discriminatory power. While most of the work on CRFs for gene finding has been on model construction and training, there has not been much focus on the metrics used in such discriminatory frameworks. This is particularly important with the availability of rich comparative genome data, enabling the development of phylogenetic gene identification metrics which can maximally use alignments of a large number of genomes.
In this work I address the question of gene identification using multiple related genomes. I first present novel comparative metrics for gene classification that show considerable improvement over existing work, and also scale well with an increase in the number of aligned genomes. Second, I describe a general methodology of extending pair-wise metrics to alignments of multiple genomes that incorporates the evolutionary phylogenetic relationship between informant species. Third, I evaluate various methods of combining metrics that exploit metric independence and result in superior classification. Finally, I incorporate the metrics into a Conditional Random Field gene model, to perform unrestricted de novo gene prediction on 12-species alignments of the D. melanogaster genome, and demonstrate accuracy rivaling that of state-of-the-art gene prediction systems. by Ameya Nitin Deoras. S.M.
Self-consistency for open-ended generations
In this paper, we present a novel approach for improving the quality and
consistency of generated outputs from large-scale pre-trained language models
(LLMs). Self-consistency has emerged as an effective approach for prompts with
fixed answers, selecting the answer with the highest number of votes. In this
paper, we introduce a generalized framework for self-consistency that extends
its applicability beyond problems that have fixed answers. Through
extensive simulations, we demonstrate that our approach consistently recovers
the optimal or near-optimal generation from a set of candidates. We also
propose lightweight parameter-free similarity functions that show significant
and consistent improvements across code generation, autoformalization, and
summarization tasks, even without access to token log probabilities. Our method
incurs minimal computational overhead, requiring no auxiliary reranker models
or modifications to the existing model.
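The voting step and its generalization described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `majority_vote`, `jaccard` and `most_consistent` are hypothetical, and token-set Jaccard similarity is only an assumed stand-in for the lightweight similarity functions the paper proposes. The idea shown is that, for open-ended outputs, the candidate most similar on average to the other candidates plays the role of the most-voted answer.

```python
from collections import Counter


def majority_vote(answers):
    """Classic self-consistency: return the most frequent fixed answer."""
    return Counter(answers).most_common(1)[0][0]


def jaccard(a, b):
    """Token-set Jaccard similarity (an illustrative assumption, not the paper's function)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def most_consistent(candidates, similarity=jaccard):
    """Return the candidate with the highest mean similarity to the other candidates."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    if len(candidates) == 1:
        return candidates[0]

    def score(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(similarity(candidates[i], o) for o in others) / len(others)

    best_index = max(range(len(candidates)), key=score)
    return candidates[best_index]
```

With fixed answers, `majority_vote(["4", "5", "4"])` picks the answer with the most votes; with free-form generations, `most_consistent` needs only the sampled texts themselves, with no token log probabilities or auxiliary reranker.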
The predictability and representation of South Asian Monsoon low-pressure systems in reanalyses and subseasonal-to-seasonal prediction models
Monsoon low-pressure systems (LPSs) are synoptic-scale systems that form during
boreal summer, mainly over the head of the Bay of Bengal (BoB). However, regional varieties can also form over the Arabian Sea and near Sri Lanka. Despite their ability to cause
catastrophic floods in the Indian subcontinent, there has been insufficient exploration of
their predictability and prediction skill. This thesis examines LPS prediction and structure as well as large-scale controls on their frequency in Subseasonal-to-Seasonal (S2S)
prediction models. Using a feature-tracking algorithm, we identify LPSs in eleven S2S
models during a common reforecast period of June–September 1999–2010, verifying the
results against ERA-Interim (ERA-I) and MERRA-2 reanalyses. Moreover, we examine
characteristics of LPS regional varieties using ERA-I.
The S2S models simulate tracks and structure of LPSs reasonably well; however, all
models underestimate their frequency, and BoM, CMA and HMCR models have large
biases in their simulation. The subseasonal probabilistic frequency predictions by BoM,
CMA, CNRM and ECMWF models are the most accurate.
Among regional varieties, Arabian Sea LPSs are least frequent. Short-lived BoB LPSs
are most frequent and bring the most precipitation to eastern India. We then examine
the modulation of LPSs on different time scales: the tropical intraseasonal oscillation
modulates genesis of all varieties, and La Niña and negative Indian Ocean Dipole enhance
genesis of Sri Lankan LPSs. Most S2S models correctly simulate enhanced LPS frequency
when the active phase of the Madden-Julian Oscillation is over the Indian Ocean and
Maritime Continent. Large-scale conditions, such as the position of the tropical easterly
jet and mid-tropospheric relative humidity, play a role in determining whether BoB LPSs
continue their propagation across north-central India.
These results provide a framework for understanding LPS predictability, envisaging
improved disaster preparedness in the Indian subcontinent.
Political history of Maharashtra from the earliest times to circa 1000 A.D.
Chapter I - deals with the geography of Maharashtra and I have tried to throw as much light as possible upon this obscure subject from various sources. I have also marshalled all the available information about the tribes and peoples inhabiting the various parts of the Maratha country. Chapter II - deals with the history of Maharashtra from the earliest times down to c. 200 B.C. Important questions such as the Aryanisation of the country and Pre-Aryan history have been discussed. Mention may be made here of the Paithan excavations which throw an interesting sidelight on the earliest period. Chapter III - describes the rise and growth of the Satavahana empire. Complicated questions like the original home of the Satavahanas, and their genealogy and chronology have been handled. Chapter IV - deals with the Scythians in Maharashtra. I have put forward a new view as regards the date of the Saka king Nahapana. Chapter V - deals with the history of the powerful but little-known Vakataka kingdom. Chapter VI - deals with the history of Southern Maharashtra under the Kadambas. In Chapter VII, I have treated the history of minor dynasties which had been neglected for a long time. This chapter brings to light the Kalachuri, Traikutaka and the Nala dynasties. Chapter VIII - includes the history of the Early Chalukyas of Badami. I have thrown new light on the origin of the Chalukyas and their relations with the different powers of Northern and Southern India. I have also suggested a new date for the last Chalukya expedition against the Pallavas of Kanchi. Chapter IX - deals with the early history of the Rashtrakuta families. The obscure history of one of these families has been illuminated by the latest discoveries of copperplates. The reign of Govinda III, the greatest Rashtrakuta emperor, has been thoroughly dealt with and several complex problems of his time have been solved in a new fashion. Chapter X - deals with the history of the Rashtrakuta empire down to 975 A.D.
Particular attention has been paid here to the empire's relations with the Eastern Chalukyas, the Ganges of Mysore, the Kalachuris and others. New reasons have been put forward for the fall of the empire. Chapter XI - deals with the history of the Later Chalukyas down to c. 1000 A.D. New light has been shed on the reigns of Taila II and his son Satyasraya. Further, the history of another branch of the Chalukya dynasty has been treated in the light of new inscriptions. Chapter XII - includes the minor dynasties of Maharashtra. I have put forward a new view as regards the origin of the Early Yadavas. I have fully dealt with the rise of the Silaharas, the Kadambas and the Rattas.
Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes
Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human.
TOLERANCE SHOWN BY Rattus rattus TO AN ANTICOAGULANT RODENTICIDE
In addition to the 0.005% concentration, the recommended field dose of 0.025% of the anticoagulant was offered along with an alternate food to individual rats for a varying number of days. Rats that survived were taken as tolerant, provided they showed an mg/kg intake beyond the tolerance limit, survived six days of feeding, exhibited bait-shyness and did not exhibit hemorrhage after death. In determining the criteria for tolerance to an anticoagulant by a rat, one should take into account four composite factors: six days of feeding, even at 0.025%; bait-shyness when alternate food is given; a higher mg/kg intake than the tolerance level; and an absence of intensive hemorrhage after death.
Pre-trained Recommender Systems: A Causal Debiasing Perspective
Recent studies on pre-trained vision/language models have demonstrated the
practical benefit of a new, promising solution-building paradigm in AI where
models can be pre-trained on broad data describing a generic task space and
then adapted successfully to solve a wide range of downstream tasks, even when
training data is severely limited (e.g., in zero- or few-shot learning
scenarios). Inspired by such progress, we investigate in this paper the
possibilities and challenges of adapting such a paradigm to the context of
recommender systems, which is less investigated from the perspective of
pre-trained models. In particular, we propose to develop a generic recommender
that captures universal interaction patterns by training on generic user-item
interaction data extracted from different domains, which can then be quickly
adapted to improve few-shot learning performance in unseen new domains (with
limited data).
However, unlike vision/language data which share strong conformity in the
semantic space, universal patterns underlying recommendation data collected
across different domains (e.g., different countries or different E-commerce
platforms) are often occluded by both in-domain and cross-domain biases
implicitly imposed by the cultural differences in their user and item bases, as
well as their uses of different e-commerce platforms. As shown in our
experiments, such heterogeneous biases in the data tend to hinder the
effectiveness of the pre-trained model. To address this challenge, we further
introduce and formalize a causal debiasing perspective, which is substantiated
via a hierarchical Bayesian deep learning model, named PreRec. Our empirical
studies on real-world data show that the proposed model could significantly
improve the recommendation performance in zero- and few-shot learning settings
under both cross-market and cross-platform scenarios. Comment: 8 pages, WSDM 2