
    Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers

    This paper describes a new method, Combi-bootstrap, that exploits existing taggers and lexical resources to annotate corpora with new tagsets. Combi-bootstrap uses the existing resources as features for a second-level machine learning module, which is trained on a very small sample of annotated corpus material to learn the mapping to the new tagset. Experiments show that Combi-bootstrap (i) can integrate a wide variety of existing resources, and (ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed from the same small training sample.
    Comment: 4 pages
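The core idea, treating the outputs of existing taggers as features for a second-level learner trained on a small sample annotated with the new tagset, can be sketched as follows. This is a minimal illustration under assumptions: the two "existing taggers" are hypothetical lookup tables, the tagsets are invented, and the second-level learner is a simple nearest-neighbour classifier with an overlap metric, not necessarily the learner used in the paper.

```python
from collections import Counter

# Hypothetical stand-ins for two existing taggers with different (old) tagsets.
LEXICON_A = {"the": "DT", "dog": "NN", "cat": "NN", "runs": "VBZ"}
LEXICON_B = {"the": "ART", "dog": "N", "cat": "N", "runs": "V"}

def features(word):
    # The outputs of the existing resources form the second-level feature vector.
    return (LEXICON_A.get(word, "UNK"), LEXICON_B.get(word, "UNK"))

def overlap(x, y):
    # Overlap similarity: number of feature positions with equal values.
    return sum(a == b for a, b in zip(x, y))

def train(sample):
    # sample: (word, tag_in_new_tagset) pairs from the small annotated corpus.
    return [(features(word), tag) for word, tag in sample]

def predict(memory, word):
    x = features(word)
    best = max(overlap(x, f) for f, _ in memory)
    # Majority vote over the most similar stored instances.
    votes = Counter(tag for f, tag in memory if overlap(x, f) == best)
    return votes.most_common(1)[0][0]

memory = train([("the", "Det"), ("dog", "Noun"), ("runs", "Verb")])
print(predict(memory, "cat"))  # Noun
```

Here "cat" never occurs in the annotated sample, but because both existing taggers label it the same way as "dog", the second-level learner maps it to the new tag Noun.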

    Memory-Based Learning: Using Similarity for Smoothing

    This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between the most specific and the most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS tagging. Our method achieves state-of-the-art performance in both domains and allows easy integration of diverse information sources, such as rich lexical representations.
    Comment: 8 pages, uses aclap.sty, to appear in Proc. ACL/EACL 9
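One common feature-weighting scheme in memory-based learning is information gain, combined with a weighted overlap distance. The sketch below is illustrative, not the paper's exact setup: it assumes information-gain weights, 1-nearest-neighbour classification, and invented toy data. A feature that tells us nothing about the class receives weight 0, so mismatches on it are ignored; this is one way to see how weighting lets the classifier fall back on the informative conditioning features.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(data, i):
    # data: list of (feature_tuple, label); i: index of the feature to weight.
    by_value = defaultdict(list)
    for x, y in data:
        by_value[x[i]].append(y)
    remainder = sum(len(ys) / len(data) * entropy(ys) for ys in by_value.values())
    return entropy([y for _, y in data]) - remainder

def distance(x, y, weights):
    # Weighted overlap: each mismatching feature costs its weight.
    return sum(w for a, b, w in zip(x, y, weights) if a != b)

def classify(data, x, weights):
    # 1-nearest neighbour; min() keeps the first instance on ties.
    return min(data, key=lambda inst: distance(x, inst[0], weights))[1]

# Toy data: feature 0 is uninformative, feature 1 determines the class.
data = [(("a", "x"), "P"), (("a", "y"), "Q"), (("b", "x"), "P"), (("b", "y"), "Q")]
weights = [information_gain(data, i) for i in range(2)]
print(weights)                              # [0.0, 1.0]
print(classify(data, ("c", "y"), weights))  # Q
```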

    $e^+e^-$ Annihilations into Quasi-two-body Final States at 10.58 GeV

    We report the first observation of $e^+e^-$ annihilations into hadronic states of positive $C$-parity, $\rho^0\rho^0$ and $\phi\rho^0$. The angular distributions support production through two-virtual-photon annihilation. We also report the observation of $e^+e^-\to\phi\eta$ and a preliminary result on $e^+e^-\to\rho^+\rho^-$.
    Comment: Invited talk, 7 pages, 4 PostScript figures, contributed to the Workshop on Exclusive Reactions at High Momentum Transfer, 21-24 May 2007, Jla

    MBT: A Memory-Based Part of Speech Tagger-Generator

    We introduce a memory-based approach to part-of-speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning: the part-of-speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. From such a corpus, the tagger-generator automatically builds a tagger that can tag new text in the same way, considerably reducing the time needed to develop a tagger. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy on a par with that of known statistical approaches, with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. An additional advantage of IGTree is that the optimal context size for disambiguation is computed dynamically.
    Comment: 14 pages, 2 PostScript figures
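IGTree's behaviour can be sketched as a decision tree whose levels test features in a fixed order of decreasing information gain, with every node storing its most frequent class as a default. This is a simplified sketch under assumptions: the feature order is passed in rather than computed, the (focus word, previous tag) instances are hypothetical, and details of the published algorithm may differ.

```python
from collections import Counter

def build(instances, order):
    # instances: (feature_tuple, label) pairs; order: feature indices sorted
    # by decreasing information gain (assumed precomputed here).
    counts = Counter(label for _, label in instances)
    node = {"default": counts.most_common(1)[0][0], "children": {}}
    # Stop expanding once the node is unambiguous or no features remain.
    if len(counts) == 1 or not order:
        return node
    node["feature"] = order[0]
    groups = {}
    for x, label in instances:
        groups.setdefault(x[order[0]], []).append((x, label))
    for value, subset in groups.items():
        node["children"][value] = build(subset, order[1:])
    return node

def lookup(node, x):
    # Follow matching branches; on a mismatch, fall back to the
    # most frequent class stored at the deepest node reached.
    while node["children"]:
        child = node["children"].get(x[node["feature"]])
        if child is None:
            break
        node = child
    return node["default"]

# Hypothetical instances: (focus word, previous tag) -> tag.
instances = [(("can", "PRP"), "MD"), (("can", "DT"), "NN"),
             (("the", "VB"), "DT"), (("the", "IN"), "DT")]
tree = build(instances, order=[0, 1])
print(lookup(tree, ("the", "NN")))  # DT  (decided by the word alone)
print(lookup(tree, ("can", "DT")))  # NN  (needs the previous tag)
print(lookup(tree, ("can", "VB")))  # MD  (unseen context: node default)
```

Because expansion stops as soon as a node is unambiguous, "the" is tagged from the word alone while "can" also consults the previous tag; this is one way to read the remark that the context size is computed dynamically.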

    Forgetting Exceptions is Harmful in Language Learning

    We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
    Comment: 31 pages, 7 figures, 10 tables. Uses 11pt, fullname, a4wide TeX styles. Pre-print version of article to appear in Machine Learning 11:1-3, Special Issue on Natural Language Learning. Figures on page 22 slightly compressed to avoid page overload.
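Class prediction strength, one of the two editing criteria mentioned above, can be sketched as follows. The abstract does not give the exact formula, so this assumes a simple definition: of all the times an instance is the nearest neighbour of another training instance, the fraction in which it carries that instance's class. The one-dimensional numeric toy data are invented and stand in for the symbolic features of the benchmark tasks.

```python
def nearest(i, data):
    # Index of the nearest *other* instance (first one wins on ties).
    others = [j for j in range(len(data)) if j != i]
    return min(others, key=lambda j: abs(data[i][0] - data[j][0]))

def class_prediction_strength(data):
    # For each instance: how often it is some other instance's nearest
    # neighbour, and how often it then has that instance's class.
    hits = [0] * len(data)
    total = [0] * len(data)
    for i in range(len(data)):
        j = nearest(i, data)
        total[j] += 1
        if data[j][1] == data[i][1]:
            hits[j] += 1
    # None: the instance is never anyone's nearest neighbour.
    return [h / t if t else None for h, t in zip(hits, total)]

# Toy 1-D data; the "B" at 2.5 is an exception inside the "A" region.
data = [(1.0, "A"), (2.0, "A"), (2.5, "B"), (3.0, "A"), (8.0, "B"), (9.0, "B")]
print(class_prediction_strength(data))
# [None, 0.5, 0.0, None, 1.0, 1.0]
```

An editing step would discard instances whose strength falls below some threshold, such as the exception at 2.5; the finding reported above is that discarding such exceptions tends to hurt generalization accuracy on language tasks.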

    Pedagogical Techniques Employed by the Science Television Show MythBusters

    The long-running Discovery Channel science television show MythBusters has proven itself to be far more than a source of weekly entertainment. The popular cable program employs an array of sophisticated pedagogical techniques to communicate scientific concepts to its audience. These techniques include achieving active learning, accommodating different learning styles, avoiding jargon, employing repetition to ensure comprehension, anthropomorphizing physical phenomena, using captivating demonstrations, cultivating an enthusiastic disposition, and increasing intrinsic motivation to learn. In this content analysis, episodes from the show’s 10-year history were methodically examined for these instructional techniques. MythBusters represents an untapped source of pedagogical techniques that educators at all levels may draw on in their tireless effort to better reach their students. Science educators in particular may look to MythBusters for inspiration and guidance on how to incorporate these techniques into their own teaching and help their students learn.

    How the Science Entertainment Television Show MythBusters Teaches the Scientific Method

    All too often, high school (and even university) students graduate with only a partial or oversimplified understanding of what the scientific method is and how to employ it. The long-running Discovery Channel television show MythBusters has attracted the attention of political leaders and prominent universities for its potential to address this problem and help young people learn to think critically. MythBusters communicates many aspects of the scientific method not usually covered in the classroom: the use of experimental controls, the use of logical reasoning, the importance of objectivity, operational definitions, small-scale testing, the interpretation of results, and the importance of repeatable results. In this content analysis, episodes from the show’s 10-year history were methodically examined for aspects of the scientific method.

    Effective distributed representations for academic expert search

    Expert search aims to find and rank experts based on a user's query. In academia, retrieving experts is an efficient way to navigate through a large amount of academic knowledge. Here, we study how different distributed representations of academic papers (i.e., embeddings) impact academic expert retrieval. We use the Microsoft Academic Graph dataset and experiment with different configurations of a document-centric voting model for retrieval. In particular, we explore the impact of contextualized embeddings on search performance. We also present results for paper embeddings that incorporate citation information through retrofitting. Additionally, we conduct experiments with different techniques for assigning author weights based on author order. We observe that contextual embeddings produced by a transformer model trained for sentence similarity tasks yield the most effective paper representations for document-centric expert retrieval. However, retrofitting the paper embeddings and using elaborate author contribution weighting strategies did not improve retrieval performance.
    Comment: To be published in the Scholarly Document Processing 2020 Workshop @ EMNLP 2020 proceedings
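A document-centric voting model ranks papers against the query and then lets each retrieved paper vote for its authors. The sketch below rests on several assumptions not stated in the abstract: cosine similarity between a query embedding and precomputed paper embeddings, a top-k cutoff, and a hypothetical `author_weight` function for author-order weighting (1/position in the usage line). The two-dimensional vectors are invented stand-ins for real embeddings.

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_experts(query, papers, k=2, author_weight=lambda pos: 1.0):
    # papers: (embedding, authors in byline order); zero vectors are not handled.
    ranked = sorted(papers, key=lambda p: cosine(query, p[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for emb, authors in ranked:
        score = cosine(query, emb)
        for pos, author in enumerate(authors, start=1):
            # Each top-k paper votes for its authors, scaled by author position.
            votes[author] += score * author_weight(pos)
    return sorted(votes, key=votes.get, reverse=True)

papers = [([1.0, 0.0], ["alice", "bob"]),
          ([1.0, 3.0], ["bob"]),
          ([0.0, 1.0], ["carol"])]
query = [1.0, 0.0]
print(rank_experts(query, papers))                                     # ['bob', 'alice']
print(rank_experts(query, papers, author_weight=lambda pos: 1 / pos))  # ['alice', 'bob']
```

With uniform weights bob ranks first because he co-authors both top papers; with 1/position weighting alice, first author of the best-matching paper, overtakes him. Which weighting helps in practice is an empirical question; the abstract reports that elaborate schemes did not improve retrieval.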

    Simulation of a data center cooling system in an emergency situation

    The paper deals with keeping server rooms at a reasonable air temperature in the event of an electrical power failure in a data center, and with building performance simulations used to support emergency power planning. An existing data center was analyzed in detail with respect to the possibilities for emergency cooling. Based on the assumption that the thermal capacity of already chilled water can be used to prolong the functionality of the cooling system while the roof chillers are out of operation, a backup power supply was designed for the Computer Room Air Conditioning (CRAC) units and also for the cooling liquid circuit pumps (but not for the roof chillers). Special models representing the data center indoor environment and cooling system, including a detailed model of the CRAC units, were developed to estimate how long the internal air temperatures in the server room would remain below the limit. The numerical model of the server room and the cooling system was built in the TRNSYS software and calibrated with measured data acquired during a real power outage. The results and conclusions of the analyses and simulations helped to improve the emergency power plan of the data center. The study also forms the basis for the development of an emergency decision algorithm that will be included in the novel supervisory control platform GENi
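The premise that already chilled water can bridge a chiller outage can be sanity-checked with a lumped energy balance, t ≈ ρ·V·c_p·ΔT / P, where V is the stored water volume, ΔT the usable temperature rise and P the heat load. This is not the paper's TRNSYS model: it assumes all heat goes into a single well-mixed water volume and ignores the thermal mass of room air and building, pump and fan heat, and heat-exchanger limits. The numbers are invented for illustration.

```python
def ride_through_minutes(volume_m3, delta_t_k, heat_load_kw,
                         density_kg_m3=1000.0, cp_kj_per_kg_k=4.186):
    # Energy the stored chilled water can absorb before reaching the
    # allowed temperature: m * cp * dT, in kJ.
    energy_kj = density_kg_m3 * volume_m3 * cp_kj_per_kg_k * delta_t_k
    # kJ divided by kW (kJ/s) gives seconds; convert to minutes.
    return energy_kj / heat_load_kw / 60.0

# Illustrative numbers only: 10 m3 of stored water, 6 K usable rise, 200 kW load.
print(round(ride_through_minutes(10.0, 6.0, 200.0), 1))  # 20.9
```

A detailed simulation like the one described above is still needed because the usable ΔT, the coil performance of the CRAC units and the room's own heat-up are exactly what this one-line estimate leaves out.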