Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers
This paper describes a new method, Combi-bootstrap, to exploit existing
taggers and lexical resources for the annotation of corpora with new tagsets.
Combi-bootstrap uses existing resources as features for a second-level machine
learning module that is trained, on a very small sample of annotated corpus
material, to map them onto the new tagset. Experiments show that
Combi-bootstrap: i) can integrate a wide variety of existing resources, and ii)
achieves much higher accuracy (up to 44.7% error reduction) than both the best
single tagger and an ensemble tagger constructed from the same small training
sample.
Comment: 4 pages
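The second-level mapping described above can be illustrated with a minimal sketch. It assumes that each training instance is the tuple of tags that several existing taggers assign to one token (each tag from that tagger's own tagset), labelled with the tag from the new tagset, and it uses a plain 1-nearest-neighbour classifier with an overlap metric as the second-level learner. The class name `SecondLevelTagger` and the toy tags are hypothetical; the paper's actual feature set and learner are not reproduced here.

```python
from collections import Counter
from collections.abc import Sequence

# One instance = the tags that several existing taggers (hypothetical here)
# assign to the same token, each tag taken from that tagger's own tagset.
Features = tuple[str, ...]


def overlap_distance(a: Features, b: Features) -> int:
    """Number of taggers whose outputs differ between the two instances."""
    return sum(x != y for x, y in zip(a, b))


class SecondLevelTagger:
    """1-nearest-neighbour learner from existing taggers' outputs to a new tagset."""

    def __init__(self) -> None:
        self.memory: list[tuple[Features, str]] = []

    def train(self, instances: Sequence[tuple[Features, str]]) -> None:
        # Memory-based learning: simply store the small annotated sample.
        self.memory.extend(instances)

    def predict(self, features: Features) -> str:
        best = min(overlap_distance(features, f) for f, _ in self.memory)
        # Majority vote among all stored instances tied at the minimum distance.
        votes = Counter(
            label for f, label in self.memory if overlap_distance(features, f) == best
        )
        return votes.most_common(1)[0][0]


# Hypothetical sample: (tagger A, tagger B, tagger C) outputs -> new-tagset tag.
sample = [
    (("NN", "NOUN", "N"), "N-sing"),
    (("NNS", "NOUN", "N"), "N-plur"),
    (("VB", "VERB", "V"), "V-inf"),
    (("VBD", "VERB", "V"), "V-past"),
]
tagger = SecondLevelTagger()
tagger.train(sample)
print(tagger.predict(("NNS", "NOUN", "ADJ")))  # N-plur
```

Because the existing taggers' outputs are only features, a tagger whose tagset is coarser or differently structured can still contribute whatever information it carries about the new tags.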
Memory-Based Learning: Using Similarity for Smoothing
This paper analyses the relation between the use of similarity in
Memory-Based Learning and the notion of backed-off smoothing in statistical
language modeling. We show that the two approaches are closely related, and we
argue that feature weighting methods in the Memory-Based paradigm can offer the
advantage of automatically specifying a suitable domain-specific hierarchy
between most specific and most general conditioning information without the
need for a large number of parameters. We report two applications of this
approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art
performance in both domains, and allows the easy integration of diverse
information sources, such as rich lexical representations.
Comment: 8 pages, uses aclap.sty. To appear in Proc. ACL/EACL 9
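One common way to realise the feature weighting mentioned above is information-gain weighting of an overlap metric, as in the IB1-IG variant of memory-based learning. The sketch below assumes that scheme (the paper may use a different weighting) and applies it to hypothetical PP-attachment instances of the form (verb, object noun, preposition, PP noun); all names and data are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(X, y, i):
    """Class entropy minus the expected class entropy once feature i is known."""
    remainder = 0.0
    for value in set(x[i] for x in X):
        subset = [label for x, label in zip(X, y) if x[i] == value]
        remainder += len(subset) / len(y) * entropy(subset)
    return entropy(y) - remainder


def weighted_overlap(a, b, weights):
    """Distance = sum of the weights of the features on which a and b differ."""
    return sum(w for u, v, w in zip(a, b, weights) if u != v)


def knn_classify(query, X, y, weights, k=1):
    """Majority class among the k stored instances closest to the query."""
    ranked = sorted(range(len(X)), key=lambda j: weighted_overlap(query, X[j], weights))
    return Counter(y[j] for j in ranked[:k]).most_common(1)[0][0]


# Hypothetical PP-attachment instances: (verb, object noun, preposition, PP noun).
X = [
    ("ate", "pizza", "with", "fork"),
    ("ate", "pizza", "with", "olives"),
    ("ate", "soup", "with", "spoon"),
    ("saw", "man", "with", "telescope"),
    ("bought", "cup", "of", "tea"),
    ("drank", "cup", "of", "coffee"),
]
y = ["V", "N", "V", "V", "N", "N"]  # V = attach PP to verb, N = attach to noun

weights = [information_gain(X, y, i) for i in range(4)]
print([round(w, 3) for w in weights])  # [0.541, 0.667, 0.459, 1.0]
print(knn_classify(("drank", "tea", "of", "milk"), X, y, weights))  # N
```

In this toy sample the PP-noun feature, whose values are all distinct, receives the maximal weight; this bias of information gain towards many-valued features is what gain-ratio weighting is designed to correct. The sketch also takes the k nearest instances rather than the k nearest distances, which is a simplification.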
e^+e^- Annihilations into Quasi-two-body Final States at 10.58 GeV
We report the first observation of annihilations into hadronic
states of positive -parity, and . The angular
distributions support two-virtual-photon annihilation production. We also
report the observations of and a preliminary result on
.Comment: Invited talk, 7 pages, 4 postscript figures, contributed to the
Workshop on Exclusive Reactions at High Momentum Transfer, 21-24 May 2007,
Jla
MBT: A Memory-Based Part of Speech Tagger-Generator
We introduce a memory-based approach to part of speech tagging. Memory-based
learning is a form of supervised learning based on similarity-based reasoning.
The part of speech tag of a word in a particular context is extrapolated from
the most similar cases held in memory. Supervised learning approaches are
useful when a tagged corpus is available as an example of the desired output of
the tagger. Based on such a corpus, the tagger-generator automatically builds a
tagger that can tag new text in the same way, considerably reducing the time
needed to construct a tagger. Memory-based tagging shares this
advantage with other statistical or machine learning approaches. Additional
advantages specific to a memory-based approach include (i) the relatively small
tagged corpus size sufficient for training, (ii) incremental learning, (iii)
explanation capabilities, (iv) flexible integration of information in case
representations, (v) its non-parametric nature, (vi) reasonably good results on
unknown words without morphological analysis, and (vii) fast learning and
tagging. In this paper we show that a large-scale application of the
memory-based approach is feasible: we obtain a tagging accuracy that is on a
par with that of known statistical approaches, and with attractive space and
time complexity properties when using IGTree, a tree-based formalism for
indexing and searching huge case bases. An additional advantage of IGTree is
that the optimal context size for disambiguation is computed dynamically.
Comment: 14 pages, 2 PostScript figures
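IGTree, as it is commonly described, orders the features once by information gain, builds a tree that tests them in that order, stores the majority (default) class at every node, and stops expanding a branch as soon as its instances are unambiguous. Classification follows the path for as long as the feature values match and returns the default class of the last node reached, which is how the amount of context used varies from case to case. The following is a minimal sketch under those assumptions; the feature layout (previous tag, ambiguity class of the focus word, ambiguity class of the next word) and the data are hypothetical and not necessarily MBT's actual case representation.

```python
import math
from collections import Counter


def information_gain(X, y, i):
    """Class entropy minus expected class entropy after splitting on feature i."""
    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(x[i], []).append(label)
    return entropy(y) - sum(len(g) / len(y) * entropy(g) for g in groups.values())


class Node:
    def __init__(self, labels):
        self.default = Counter(labels).most_common(1)[0][0]  # majority class here
        self.children = {}  # feature value -> Node


def build(X, y, order, depth=0):
    node = Node(y)
    # Stop expanding once all instances agree or every feature has been used.
    if len(set(y)) == 1 or depth == len(order):
        return node
    f = order[depth]
    for value in set(x[f] for x in X):
        idx = [j for j, x in enumerate(X) if x[f] == value]
        node.children[value] = build([X[j] for j in idx], [y[j] for j in idx], order, depth + 1)
    return node


class IGTree:
    def __init__(self, X, y):
        # Features are ordered once, globally, by decreasing information gain.
        gains = [information_gain(X, y, i) for i in range(len(X[0]))]
        self.order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
        self.root = build(X, y, self.order)

    def classify(self, x):
        node = self.root
        for f in self.order:
            child = node.children.get(x[f])
            if child is None:  # unseen value or leaf: back off to this node's default
                break
            node = child
        return node.default


# Hypothetical cases: (previous tag, focus ambiguity class, next ambiguity class).
X = [
    ("DT", "NN|VB", "IN"),
    ("DT", "NN|VB", "VBZ"),
    ("TO", "NN|VB", "DT"),
    ("TO", "NN|VB", "IN"),
    ("PRP", "NN|VB", "DT"),
    ("DT", "JJ|NN", "NN"),
]
y = ["NN", "NN", "VB", "VB", "VB", "JJ"]

tree = IGTree(X, y)
print(tree.order)                             # [2, 0, 1]
print(tree.classify(("TO", "NN|VB", "IN")))   # VB
print(tree.classify(("PRP", "NN|VB", "CC")))  # VB (falls back to the root default)
```

Here the case ending in `IN` needs two features to be resolved, while the others are decided by the first feature alone; unseen contexts back off to the default class of the deepest matching node instead of triggering a full nearest-neighbour search.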
Forgetting Exceptions is Harmful in Language Learning
We show that in language learning, contrary to received wisdom, keeping
exceptional training instances in memory can be beneficial for generalization
accuracy. We investigate this phenomenon empirically on a selection of
benchmark natural language processing tasks: grapheme-to-phoneme conversion,
part-of-speech tagging, prepositional-phrase attachment, and base noun phrase
chunking. In a first series of experiments we combine memory-based learning
with training set editing techniques, in which instances are edited based on
their typicality and class prediction strength. Results show that editing
exceptional instances (with low typicality or low class prediction strength)
tends to harm generalization accuracy. In a second series of experiments we
compare memory-based learning and decision-tree learning methods on the same
selection of tasks, and find that decision-tree learning often performs worse
than memory-based learning. Moreover, the decrease in performance can be linked
to the degree of abstraction from exceptions (i.e., pruning or eagerness). We
provide explanations for both results in terms of the properties of the natural
language processing tasks and the learning algorithms.
Comment: 31 pages, 7 figures, 10 tables. Uses 11pt, fullname, a4wide TeX
styles. Preprint version of an article to appear in Machine Learning 11:1-3,
Special Issue on Natural Language Learning. Figures on page 22 slightly
compressed to avoid page overload
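Class prediction strength, one of the two editing criteria mentioned above, can be sketched as follows, assuming the ratio definition: for each training instance, the number of times it is the nearest neighbour of another instance with the same class, divided by the number of times it is the nearest neighbour of any other instance. The tie handling (every instance at the minimum overlap distance counts as a neighbour), the `None` for instances that are never a neighbour, the helper `edit`, and the toy data are choices of this sketch, not taken from the paper.

```python
from collections import defaultdict


def overlap(a, b):
    """Number of feature positions on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def class_prediction_strength(X, y):
    """Per instance: (# times it is a nearest neighbour of a same-class instance)
    / (# times it is a nearest neighbour of any other instance); None if never."""
    as_neighbour = defaultdict(int)
    same_class = defaultdict(int)
    for j in range(len(X)):
        dists = [(overlap(X[j], X[i]), i) for i in range(len(X)) if i != j]
        best = min(d for d, _ in dists)
        for d, i in dists:
            if d == best:  # all instances tied at the minimum distance count
                as_neighbour[i] += 1
                same_class[i] += int(y[i] == y[j])
    return [same_class[i] / as_neighbour[i] if as_neighbour[i] else None
            for i in range(len(X))]


def edit(X, y, threshold):
    """Hypothetical editing step: drop instances whose CPS is below threshold."""
    cps = class_prediction_strength(X, y)
    keep = [i for i, s in enumerate(cps) if s is None or s >= threshold]
    return [X[i] for i in keep], [y[i] for i in keep]


X = [("a", "b"), ("a", "c"), ("a", "d"), ("x", "b")]
y = ["P", "P", "P", "Q"]
print(class_prediction_strength(X, y))  # [0.6666666666666666, 1.0, 1.0, 0.0]
print(edit(X, y, 0.5)[1])               # ['P', 'P', 'P']
```

The lone `Q` instance receives a class prediction strength of 0.0 and is edited away; it is exactly the kind of exceptional instance whose removal the abstract reports tends to harm generalization accuracy.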
Pedagogical Techniques Employed by the Science Television Show MythBusters
The long-running Discovery Channel science television show MythBusters has proven itself to be far more than a source of weekly entertainment. The popular cable program employs an array of sophisticated pedagogical techniques to communicate scientific concepts to its audience. These techniques include achieving active learning, accommodating different learning styles, avoiding jargon, employing repetition to ensure comprehension, anthropomorphizing physical phenomena, using captivating demonstrations, cultivating an enthusiastic disposition, and increasing intrinsic motivation to learn. In this content analysis, episodes from the show’s 10-year history were methodically examined for these instructional techniques. MythBusters represents an untapped source of pedagogical techniques that educators at all levels may consider drawing on in their tireless effort to better reach their students. Science educators in particular may look to MythBusters for inspiration and guidance in incorporating these pedagogical techniques into their own teaching to help their students learn.
How the Science Entertainment Television Show MythBusters Teaches the Scientific Method
All too often, high school (and even university) students graduate with only a partial or oversimplified understanding of what the scientific method is and how to employ it. The long-running Discovery Channel television show MythBusters has attracted the attention of political leaders and prominent universities for its potential to address this problem and help young people learn to think critically. MythBusters communicates many aspects of the scientific method not usually covered in the classroom: the use of experimental controls, the use of logical reasoning, the importance of objectivity, operational definitions, small-scale testing, the interpretation of results, and the importance of repeatable results. In this content analysis, episodes from the show’s 10-year history were methodically examined for aspects of the scientific method.
Effective distributed representations for academic expert search
Expert search aims to find and rank experts based on a user's query. In
academia, retrieving experts is an efficient way to navigate through a large
amount of academic knowledge. Here, we study how different distributed
representations of academic papers (i.e. embeddings) impact academic expert
retrieval. We use the Microsoft Academic Graph dataset and experiment with
different configurations of a document-centric voting model for retrieval. In
particular, we explore the impact of the use of contextualized embeddings on
search performance. We also present results for paper embeddings that
incorporate citation information through retrofitting. Additionally,
experiments are conducted using different techniques for assigning author
weights based on author order. We observe that contextual embeddings
produced by a transformer model trained for sentence similarity tasks yield
the most effective paper representations for document-centric expert retrieval.
However, retrofitting the paper embeddings and using elaborate author
contribution weighting strategies did not improve retrieval performance.
Comment: To be published in the Scholarly Document Processing 2020 Workshop @
EMNLP 2020 proceedings
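A document-centric voting model, in the form assumed here, first scores papers against the query and then lets each of the top-k papers vote for its authors, with the vote scaled by a weight that depends on author position. The sketch below uses cosine similarity between precomputed embedding vectors (producing those embeddings, for instance with a sentence-similarity transformer, is not shown). The `uniform` and `harmonic` weighting schemes, the function names, and the toy vectors are hypothetical illustrations, not the configurations evaluated in the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def author_weight(position, scheme):
    """Hypothetical order-based weights (position 0 = first author)."""
    if scheme == "uniform":
        return 1.0                     # every author gets the full vote
    if scheme == "harmonic":
        return 1.0 / (position + 1)    # earlier authors get more credit
    raise ValueError(f"unknown scheme: {scheme}")


def rank_experts(query_vec, papers, k=10, scheme="uniform"):
    """papers: list of (embedding, [author ids in byline order])."""
    scored = sorted(((cosine(query_vec, emb), authors) for emb, authors in papers),
                    key=lambda t: t[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, authors in scored:        # each retrieved paper votes for its authors
        for pos, author in enumerate(authors):
            scores[author] += sim * author_weight(pos, scheme)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


papers = [
    ([1.0, 0.0], ["alice", "bob"]),
    ([1.0, 2.0], ["bob"]),
    ([0.0, 1.0], ["carol", "alice"]),
]
query = [1.0, 0.0]
print(rank_experts(query, papers, k=2, scheme="uniform")[0][0])   # bob
print(rank_experts(query, papers, k=2, scheme="harmonic")[0][0])  # alice
```

The toy example shows that the choice of author weighting alone can change which expert ranks first, which is why it is treated as a separate experimental variable.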
Simulation of a data center cooling system in an emergency situation
The paper deals with keeping server rooms at an acceptable air temperature during an electrical power failure in a data center, and with the building performance simulations used to support emergency power planning. An existing data center was analyzed in detail with respect to the possibilities for emergency cooling. Based on the assumption that the thermal capacity of already chilled water can prolong the functioning of the cooling system while the roof chillers are out of operation, a backup power supply was designed for the Computer Room Air Conditioning (CRAC) units and for the cooling-liquid circuit pumps, but not for the roof chillers. Special models representing the data center indoor environment and cooling system, including a detailed model of the CRAC units, were developed to estimate how long the internal air temperatures in the server room will remain below the limit. The numerical model of the server room and the cooling system was built in the TRNSYS software and calibrated with measured data acquired during a real power outage. The results and conclusions of the analyses and simulations helped to improve the emergency power plan of the data center. The study also forms the basis for the development of an emergency decision algorithm that will be included in the novel supervisory control platform: GENi
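The premise that already chilled water can extend cooling during an outage can be checked with a first-order energy balance: the stored water absorbs Q·t = m·c_p·ΔT before warming past a usable temperature, so t = m·c_p·ΔT / Q. The sketch below assumes a constant heat load, no heat exchange with the surroundings, and water properties of about 1000 kg/m³ and 4.186 kJ/(kg·K); the function name and the volume, temperatures, and load in the example are hypothetical. It is a back-of-the-envelope check only and is no substitute for the detailed, calibrated TRNSYS model described above.

```python
def ride_through_minutes(water_volume_m3, supply_temp_c, max_usable_temp_c,
                         heat_load_kw, density_kg_m3=1000.0, cp_kj_kg_k=4.186):
    """Minutes until stored chilled water warms from supply_temp_c to
    max_usable_temp_c while absorbing a constant heat load.

    Lumped energy balance only: ignores room air and equipment thermal mass,
    pump heat, and heat-exchanger limits of the CRAC units.
    """
    mass_kg = water_volume_m3 * density_kg_m3
    stored_kj = mass_kg * cp_kj_kg_k * (max_usable_temp_c - supply_temp_c)
    return stored_kj / heat_load_kw / 60.0  # 1 kW = 1 kJ/s


# Hypothetical plant: 20 m3 of water at 8 C, usable up to 16 C, 200 kW IT load.
print(round(ride_through_minutes(20.0, 8.0, 16.0, 200.0), 1))  # 55.8
```

Because the result scales linearly with the stored volume and the usable temperature rise, and inversely with the load, such an estimate mainly indicates whether a detailed simulation of the emergency scenario is worth pursuing.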