Latent Gaussian modeling and INLA: A review with focus on space-time applications
Bayesian hierarchical models with latent Gaussian layers have proven very flexible in capturing complex stochastic behavior and hierarchical structures in high-dimensional spatial and spatio-temporal data. Whereas simulation-based Bayesian inference through Markov chain Monte Carlo may be hampered by slow convergence and numerical instabilities, the inferential framework of Integrated Nested Laplace Approximation (INLA) is capable of providing accurate and relatively fast analytical approximations to posterior quantities of interest. It relies heavily on Gauss-Markov dependence structures to avoid the numerical bottleneck of high-dimensional nonsparse matrix computations. With a view towards space-time applications, we here review the principal theoretical concepts, model classes and inference tools within the INLA framework. Important building blocks for space-time models are certain spatial Matérn-like Gauss-Markov random fields, obtained as approximate solutions to a stochastic partial differential equation. Efficient implementations of statistical inference tools for a large variety of models are available through the INLA package for R. To showcase the practical use of R-INLA and to illustrate its principal commands and syntax, a comprehensive simulation experiment is presented using simulated non-Gaussian space-time count data with a first-order autoregressive dependence structure in time.
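A minimal Python sketch (not the paper's R-INLA code) of the kind of simulated data the experiment uses: Poisson space-time counts driven by a latent Gaussian field with spatial correlation and first-order autoregressive dependence in time. The grid setup, the exponential covariance standing in for the Matérn family, and all parameter values are illustrative assumptions.

```python
# Simulate non-Gaussian space-time count data with AR(1) time dependence.
# All settings below are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

n_sites, n_times = 50, 20
rho = 0.7                      # AR(1) coefficient (assumed)
coords = rng.uniform(0, 1, size=(n_sites, 2))

# Exponential spatial covariance as a simple stand-in for the Matern family
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Sigma = np.exp(-d / 0.3)
L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(n_sites))

# Latent field: x_t = rho * x_{t-1} + innovation, with a stationary start
x = np.empty((n_times, n_sites))
x[0] = L @ rng.standard_normal(n_sites)
for t in range(1, n_times):
    x[t] = rho * x[t - 1] + np.sqrt(1 - rho**2) * (L @ rng.standard_normal(n_sites))

# Non-Gaussian observations: Poisson counts with a log link
intercept = 1.0
y = rng.poisson(np.exp(intercept + x))
print(y.shape, y.mean())       # (20, 50) space-time count array
```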
Descriptions of new species of the New World genus Perilypus Spinola (Coleoptera: Cleridae: Clerinae)
Thirty-two new species of Perilypus Spinola (Coleoptera: Cleridae: Clerinae) are described; they are Perilypus ancorus, P. angustatus, P. aquilus, P. arenaceus, P. caligneus, P. cartagoensis, P. collatus, P. comosus, P. concisus, P. copanensis, P. copiosus, P. diutius, P. divaricatus, P. elimatus, P. flavoapicalis, P. galenae, P. hamus, P. hornito, P. infussus, P. iodus, P. lateralis, P. latissimus, P. licinus, P. limbus, P. miculus, P. odous, P. orophus, P. patulus, P. punctus, P. turnbowi, P. violaceus, and P. yasuniensis. Included in this work are 58 line drawings and 32 color habitus photographs of primary types. To facilitate species identification, the species included herein are linked to a key to Perilypus species provided in a previous review of the genus.
Balcus violaceus (Fabricius): senior synonym of Balcus niger Sharp and B. signatus Broun (Coleoptera: Cleridae: Clerinae)
The elytra of Balcus signatus Broun (Coleoptera: Cleridae: Clerinae) from New Zealand have pale markings. Such markings, most prominently found in females, represent intraspecific variation of Balcus violaceus (Fabricius). Accordingly, Balcus signatus Broun is synonymized with Notoxus violaceus Fabricius, new synonymy. Four habitus figures of Balcus violaceus (Fabricius) are presented to display the range of elytral color variation in the species.
New taxa of Epiphloeinae Kuwert (Cleridae) and Chaetosomatidae Crowson (Coleoptera: Cleroidea)
Twenty-one new taxa of Cleridae and one of Chaetosomatidae are described, including four new genera: Acanthocollis, Decaphloeus, Megaphloeus, and Stegnoclava. Twenty new species are described: five species of Amboakis Opitz (A. ampla, A. antegalba, A. diffusa, A. demagna, A. waodani), one species of Epiphloeus Spinola (E. erwini), four species of Madoniella Pic (M. aspera, M. darlingtoni, M. divida, M. spilota), two species of Plocamocera Spinola (P. clinata, P. lena), seven species of Pyticeroides Kuwert (P. latisentis, P. moraquesi, P. parvoporis, P. pinnacerinis, P. pullis, P. turbosiris, P. ustulatis), and one species of Chaetosomatidae (Chaetosoma colossa).
Modeling asymptotically independent spatial extremes based on Laplace random fields
We tackle the modeling of threshold exceedances in asymptotically independent stochastic processes through constructions based on Laplace random fields, defined as Gaussian random fields scaled by a random variable following an exponential distribution. This framework yields useful asymptotic properties while remaining statistically convenient. Univariate distribution tails are of the half-exponential type and belong to the limiting generalized Pareto distributions for threshold exceedances. After normalizing marginal tail distributions in the data, a standard Laplace field can be used to capture spatial dependence among extremes. Asymptotic properties of Laplace fields are explored and compared to the classical framework of asymptotic dependence. Multivariate joint tail decay rates for Laplace fields are slower than for Gaussian fields with the same covariance structure; hence they provide more conservative estimates of very extreme joint risks while maintaining asymptotic independence. Statistical inference is illustrated on extreme wind gusts in the Netherlands, where a comparison to the Gaussian dependence model shows a better goodness-of-fit in terms of Akaike's criterion. In this specific application we fit the well-adapted Weibull distribution as the univariate tail model, such that the normalization of univariate tail distributions can be done through a simple power transformation of the data.
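A minimal Python sketch of the construction described above: a Laplace random vector as a Gaussian vector scaled by an exponential variable, here taken as X = sqrt(W) * G with W ~ Exp(1), one common convention; this scaling and all parameter values are assumptions, and the paper's exact construction may differ. The sketch compares empirical joint tail exceedance rates with a Gaussian vector of the same correlation.

```python
# Compare joint tail decay of Laplace-type and Gaussian pairs (illustrative).
import numpy as np

rng = np.random.default_rng(1)
n, corr = 200_000, 0.6

cov = np.array([[1.0, corr], [corr, 1.0]])
L = np.linalg.cholesky(cov)
G = rng.standard_normal((n, 2)) @ L.T          # correlated Gaussian pairs
W = rng.exponential(scale=1.0, size=(n, 1))    # exponential scaling variable
X = np.sqrt(W) * G                             # Laplace-type pairs (assumed form)

# Joint exceedance of the marginal 99% quantile under each model
for name, Z in [("Gaussian", G), ("Laplace", X)]:
    q = np.quantile(Z, 0.99, axis=0)
    joint = np.mean((Z[:, 0] > q[0]) & (Z[:, 1] > q[1]))
    print(f"{name}: P(both > 99% quantile) ~ {joint:.4f}")
# The Laplace pairs show a markedly higher joint exceedance rate, i.e. slower
# joint tail decay, even though both models are asymptotically independent.
```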
Automatic Accuracy Prediction for AMR Parsing
Abstract Meaning Representation (AMR) represents sentences as directed, acyclic and rooted graphs, aiming at capturing their meaning in a machine-readable format. AMR parsing converts natural language sentences into such graphs. However, evaluating a parser on new data by means of comparison to manually created AMR graphs is very costly. We would also like to be able to detect parses of questionable quality, or to prefer the results of alternative systems by selecting the ones for which we can assess good quality. We propose AMR accuracy prediction as the task of predicting several metrics of correctness for an automatically generated AMR parse, in the absence of the corresponding gold parse. We develop a neural end-to-end multi-output regression model and perform three case studies. First, we evaluate the model's capacity to predict AMR parse accuracies and test whether it can reliably assign high scores to gold parses. Second, we perform parse selection based on the predicted parse accuracies of candidate parses from alternative systems, with the aim of improving overall results. Finally, we predict system ranks for submissions from two AMR shared tasks on the basis of their predicted parse accuracy averages. All experiments are carried out across two different domains and show that our method is effective. Comment: accepted at *SEM 201
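As a rough illustration of the modeling idea, the following Python sketch sets up a neural multi-output regression model that maps a feature vector for a (sentence, parse) pair to several accuracy metrics at once. The feature size, hidden size, and metric names are hypothetical placeholders; this is not the authors' architecture.

```python
# A multi-output regressor predicting several parse-accuracy metrics jointly.
import torch
import torch.nn as nn

N_FEATURES = 64          # assumed size of the sentence/parse feature vector
METRICS = ["smatch", "unlabeled", "concepts"]   # hypothetical metric heads

class AccuracyPredictor(nn.Module):
    def __init__(self, n_features: int, n_metrics: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One bounded regression output per metric, each score in [0, 1]
        self.head = nn.Sequential(nn.Linear(hidden, n_metrics), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x))

model = AccuracyPredictor(N_FEATURES, len(METRICS))
x = torch.randn(32, N_FEATURES)                 # a batch of parse features
y = torch.rand(32, len(METRICS))                # gold metric scores in [0, 1]
loss = nn.MSELoss()(model(x), y)                # joint multi-output loss
loss.backward()
```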
Popular Ensemble Methods: An Empirical Study
An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund and Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithms. Our results point to a number of clear conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier, especially when using neural networks. Analysis indicates that the performance of the Boosting methods depends on the characteristics of the data set being examined; in fact, further results show that Boosting ensembles may overfit noisy data sets, decreasing their performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble's performance comes from the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees.
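The comparison described above is easy to reproduce in miniature with scikit-learn. The sketch below contrasts a single decision tree with Bagging and Boosting ensembles of 25 trees on a synthetic data set, so the numbers are illustrative only and not the paper's results.

```python
# Single tree vs. Bagging vs. Boosting on synthetic data (illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging (25 trees)": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=25, random_state=0),
    "boosting (25 stumps)": AdaBoostClassifier(n_estimators=25, random_state=0),
}
for name, clf in models.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```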
Prediction of available computing capacities for a more efficient use of Grid resources
Especially in research and in the development departments of companies, there are many problems that can only be solved by programs for which the available computing power can hardly be large enough. At the same time, a large share of the computing capacity of the installed hardware goes unused. This holds in particular for individual machines in offices, computer pools, or private households, which are rarely fully utilized even while they are actually in use. One of the goals of grid computing is to make such underutilized resources available for compute-intensive applications. The actual motivation for this better utilization is not primarily the higher utilization itself, but the potential cost savings compared to the alternative of purchasing additional hardware. A first contribution of this thesis is the analysis and quantification of this potential cost advantage: the relevant cost factors are examined and different usage scenarios are compared, yielding concrete figures for the costs in each scenario and thus for the potential savings from using idle computing capacity.
A central problem of grid computing, however, especially when individual machines are used to run long-running programs, is that the available free computing capacity fluctuates strongly over time, and computational progress can be lost when a machine is suddenly used for other purposes or switched off. To make individual machines useful for long-running jobs nonetheless, predictions of the free computing capacity to be expected in the near future are desirable; such predictions could support, among other things, scheduling and the choice of suitable checkpoint times. For these purposes, point forecasts (such as predictions of the expected value) are only of limited use, which is why this thesis deals exclusively with predictions of probability distributions. The remainder of the thesis addresses how such predictions should be produced. First, the assessment of prediction methods that forecast probability distributions is discussed: essential problems of existing assessment procedures are identified and corresponding solutions are proposed. Using these criteria, prediction approaches from the literature as well as newly developed ones are compared empirically on data from real-world computers. One of the newly developed approaches turns out to achieve clearly more accurate forecasts than the methods previously described in the literature.
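A minimal Python sketch of the kind of prediction evaluated in the thesis: forecasting the probability distribution of the free computing capacity in the next interval from past observations, scored with the continuous ranked probability score (CRPS). The windowed empirical-histogram predictor and the synthetic utilization trace are illustrative assumptions, not the thesis's methods.

```python
# Distributional forecasts of free CPU capacity, scored with the CRPS.
import numpy as np

rng = np.random.default_rng(2)

# Synthetic hourly trace of the free CPU share in [0, 1] (assumed data)
trace = np.clip(0.6 + 0.25 * np.sin(np.arange(500) / 8) +
                0.15 * rng.standard_normal(500), 0, 1)

def crps_from_samples(samples: np.ndarray, obs: float) -> float:
    # Ensemble estimate of the CRPS: E|X - y| - 0.5 * E|X - X'|
    term1 = np.mean(np.abs(samples - obs))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

# Forecast each point's distribution by the empirical distribution of the
# preceding window, then score the forecast against the realized value.
window, scores = 48, []
for t in range(window, len(trace)):
    scores.append(crps_from_samples(trace[t - window:t], trace[t]))
print(f"mean CRPS of the windowed empirical forecast: {np.mean(scores):.4f}")
```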
Export promotion in the context of technical cooperation: Reorientation and an alternative advisory approach
A large number of developing countries have introduced trade policy reforms in recent years. These reforms have been supported by export promotion projects within bilateral and multilateral development cooperation. Because the results of the advisory approach adopted so far have on the whole been disappointing, the design of export promotion is now being reoriented. This article describes an alternative advisory approach, taking the Indo-German Export Promotion Project (IGEP) as an illustration.
