Phylogeography of western Mediterranean Cymbalaria (Plantaginaceae) reveals two independent long-distance dispersals and entails new taxonomic circumscriptions
The Balearic Islands, Corsica and Sardinia (BCS) constitute biodiversity hotspots in the western Mediterranean Basin. Oligocene connections and long-distance dispersal events have been suggested as causes of the presence of endemic species shared across BCS. One of them is Cymbalaria aequitriloba, which, together with three additional species, constitutes a polyploid clade endemic to BCS. Combining amplified fragment length polymorphism (AFLP) fingerprinting, plastid DNA sequences and morphometrics, we inferred the phylogeography of the group and evaluated the species' current taxonomic circumscriptions. Based on morphometric and AFLP data, we propose a new circumscription for C. fragilis that additionally comprises a group of populations with intermediate morphological characters previously included in C. aequitriloba. Consequently, we suggest changing the IUCN category of C. fragilis from critically endangered (CR) to near threatened (NT). Both morphology and AFLP data support the current taxonomy of the single-island endemics C. hepaticifolia and C. muelleri. The four species had a common origin in Corsica-Sardinia, and two long-distance dispersal events to the Balearic Islands were inferred. Finally, plastid DNA data suggest that interspecific gene flow took place where two species co-occur.
Competent Program Evolution, Doctoral Dissertation, December 2006
Heuristic optimization methods are adaptive when they sample problem solutions based on knowledge of the search space gathered from past sampling. Recently, competent evolutionary optimization methods have been developed that adapt via probabilistic modeling of the search space. However, their effectiveness requires the existence of a compact problem decomposition in terms of prespecified solution parameters. How can we use these techniques to effectively and reliably solve program learning problems, given that program spaces will rarely have compact decompositions? One method is to manually build a problem-specific representation that is more tractable than the general space. But can this process be automated? My thesis is that the properties of programs and program spaces can be leveraged as inductive bias to reduce the burden of manual representation-building, leading to competent program evolution. The central contributions of this dissertation are a synthesis of the requirements for competent program evolution, and the design of a procedure, meta-optimizing semantic evolutionary search (MOSES), that meets these requirements. In support of my thesis, experimental results are provided to analyze and verify the effectiveness of MOSES, demonstrating scalability and real-world applicability
Predicting Exporters with Machine Learning
In this contribution, we exploit machine learning techniques to predict
out-of-sample firms' ability to export based on the financial accounts of both
exporters and non-exporters. Therefore, we show how forecasts can be used as
exporting scores, i.e., to measure the distance of non-exporters from export
status. For our purpose, we train and test various algorithms on the financial
reports of 57,021 manufacturing firms in France in 2010-2018. We find that a
Bayesian Additive Regression Tree with Missingness In Attributes (BART-MIA)
performs better than other techniques with a prediction accuracy of up to
. Predictions are robust to changes in definitions of exporters and in
the presence of discontinuous exporters. Eventually, we argue that exporting
scores can be helpful for trade promotion, trade credit, and to assess firms'
competitiveness. For example, back-of-the-envelope estimates show that a
representative firm with just below-average exporting scores needs up to
more cash resources and up to times more capital expenses to reach full
export status.
Comment: 40 pages, 10 figures
Tree models: a Bayesian perspective
Submitted in partial fulfilment of the requirements for the degree of Master of Philosophy at Queen Mary, University of London, November 2006.
Classical tree models represent an attempt to create nonparametric models which
have good predictive powers as well as a simple structure readily comprehensible by non-experts.
Bayesian tree models have been created by a team consisting of Chipman,
George and McCulloch and a second team consisting of Denison, Mallick and Smith.
Both approaches employ Green's Reversible Jump Markov Chain Monte Carlo technique
to carry out a more effective search than the 'greedy' methods used classically.
The aim of this work is to evaluate both types of Bayesian tree models from a
Bayesian perspective and compare them
A survey of methods for explaining black box models
In recent years, many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective
A Survey Of Methods For Explaining Black Box Models
In recent years, many accurate decision support systems have been
constructed as black boxes, that is as systems that hide their internal logic
to the user. This lack of explanation constitutes both a practical and an
ethical issue. The literature reports many approaches aimed at overcoming this
crucial weakness, sometimes at the cost of sacrificing accuracy for
interpretability. The applications in which black box decision systems can be
used are various, and each approach is typically developed to provide a
solution for a specific problem and, as a consequence, it explicitly or
implicitly delineates its own definition of interpretability and explanation. The aim
of this paper is to provide a classification of the main problems addressed in
the literature with respect to the notion of explanation and the type of black
box system. Given a problem definition, a black box type, and a desired
explanation this survey should help the researcher to find the proposals more
useful for his own work. The proposed classification of approaches to open
black box models should also be useful for putting the many research open
questions in perspective.
Comment: This work is currently under review on an international journal
Credit scoring using genetic programming
Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
Growing numbers in e-commerce orders lead to an increase in risk management to prevent default in payment. Default in payment is the failure of a customer to settle a bill within 90 days upon receipt. Frequently, credit scoring is employed to identify customers’ default probability. Credit scoring has been widely studied and many different methods in different fields of research have been proposed.
The primary aim of this work is to develop a credit scoring model as a replacement for the pre risk check of the e-commerce risk management system risk solution services (rss). The pre risk check uses data of the order process and includes exclusion rules and a generic credit scoring model. The new model is supposed to replace the whole pre risk check and has to be able to work both on its own and in unison with the rss main risk check. An application of Genetic Programming to credit scoring is presented. The model is developed on a real-world data set provided by Arvato Financial Solutions. The data set contains order requests processed by rss. Results show that Genetic Programming outperforms the generic credit scoring model of the pre risk check in both classification accuracy and profit. Compared with Logistic Regression, Support Vector Machines and Boosted Trees,
Genetic Programming achieved a similar classification accuracy. Furthermore, the Genetic Programming model can be used in combination with the rss main risk check in order to create a model with higher discriminatory power than its individual models.
Classification and Scoring of Protein Complexes
Protein interactions mediate all biological systems in a cell; understanding their interactions
means understanding the processes responsible for human life. Their structure can
be obtained experimentally, but such processes frequently fail at determining structures
of protein complexes. To address the issue, computational methods have been developed
that attempt to predict the structure of a protein complex, using information of its constituents.
These methods, known as docking, generate thousands of possible poses for
each complex, and require effective and reliable ways to quickly discriminate the correct
pose among the set of incorrect ones. In this thesis, a new scoring function was developed
that uses machine learning techniques and features extracted from the structure of the
interacting proteins, to correctly classify and rank the putative poses. The developed
function has been shown to be competitive with current state-of-the-art solutions.