AustArch1: A database of 14C and luminescence ages from archaeological sites in the Australian arid zone
AustArch1 is a database of 14C and luminescence ages from archaeological sites in the Australian arid zone.
Semantic patterns for sentiment analysis of Twitter
Most existing approaches to Twitter sentiment analysis assume that sentiment is explicitly expressed through affective words. Nevertheless, sentiment is often implicitly expressed via latent semantic relations, patterns and dependencies among words in tweets. In this paper, we propose a novel approach that automatically captures patterns of words of similar contextual semantics and sentiment in tweets. Unlike previous work on sentiment pattern extraction, our proposed approach does not rely on external and fixed sets of syntactic templates/patterns, nor does it require deep analysis of the syntactic structure of sentences in tweets. We evaluate our approach on tweet- and entity-level sentiment analysis tasks by using the extracted semantic patterns as classification features in both tasks. We use 9 Twitter datasets in our evaluation and compare the performance of our patterns against 6 state-of-the-art baselines. Results show that our patterns consistently outperform all baselines on all datasets, by 2.19% at the tweet level and 7.5% at the entity level in average F-measure.
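A minimal sketch of the general idea, assuming a toy corpus and standard scikit-learn components (the paper's own pattern-extraction method is not reproduced here): words are grouped by the contexts they share, and each tweet is represented by how strongly it activates each word cluster before being passed to a classifier.

```python
# Hedged sketch: "semantic pattern" features for tweet-level sentiment,
# using word clusters over shared contexts instead of raw affective words.
# The corpus, labels, and cluster count below are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["flight delayed again, great start to the day",
          "loving the new update, works flawlessly",
          "battery died after an hour, not impressed"]
labels = [0, 1, 0]  # 0 = negative, 1 = positive (toy data)

# Bag-of-words co-occurrence vectors as a crude stand-in for contextual semantics.
vec = CountVectorizer()
X_words = vec.fit_transform(tweets)        # tweets x vocabulary
word_contexts = X_words.T.toarray()        # vocabulary x tweets

# Group words with similar contexts into "semantic pattern" clusters.
n_patterns = 4
patterns = KMeans(n_clusters=n_patterns, n_init=10, random_state=0).fit_predict(word_contexts)

# Feature vector per tweet: how strongly each pattern cluster is activated.
X_patterns = np.zeros((len(tweets), n_patterns))
for tweet_idx, word_idx in zip(*X_words.nonzero()):
    X_patterns[tweet_idx, patterns[word_idx]] += X_words[tweet_idx, word_idx]

clf = LogisticRegression().fit(X_patterns, labels)
print(clf.predict(X_patterns))
```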
Inducing safer oblique trees without costs
Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one class is significantly higher than in the other. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or by suggesting ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of misclassification costs. Although this may be possible for some applications, obtaining reasonable estimates of misclassification costs is not easy in the area of safety.
This paper presents a new algorithm for applications where the cost of misclassification cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best-known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1), and an algorithm that utilizes robust linear programming.
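A rough illustration of the flavour of such an approach (a sketch under assumed data, not the paper's algorithm): linear discriminant analysis supplies an oblique projection of the continuous attributes, and the split threshold is then shifted so that no training example from the high-consequence class falls on the "safe" side of the node.

```python
# Illustrative sketch of a safety-biased oblique split, not the published method.
# The data, the safety margin, and the class labels are assumptions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X_safe = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X_unsafe = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X_safe, X_unsafe])
y = np.array([0] * 100 + [1] * 100)        # 1 = unsafe (high-consequence) class

# Oblique direction: project the attributes onto the single LDA discriminant axis.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
scores = lda.transform(X).ravel()
if scores[y == 1].mean() < scores[y == 0].mean():
    scores = -scores                       # orient axis so unsafe cases score higher

# A cost-neutral threshold would sit between the class means; shifting it toward
# the safe class trades extra false alarms for fewer missed unsafe cases.
safety_margin = 0.5                        # assumed tuning parameter
threshold = scores[y == 1].min() - safety_margin
predict_unsafe = scores > threshold
print("missed unsafe cases:", int(((~predict_unsafe) & (y == 1)).sum()))
```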
Development of a Robust 14C Chronology for Lynch's Crater (North Queensland, Australia) Using Different Pretreatment Strategies
Admixture has obscured signals of historical hard sweeps in humans (advance online)
The role of natural selection in shaping biological diversity is an area of intense interest in modern biology. To date, studies of positive selection have primarily relied on genomic datasets from contemporary populations, which are susceptible to confounding factors associated with complex and often unknown aspects of population history. In particular, admixture between diverged populations can distort or hide prior selection events in modern genomes, though this process is not explicitly accounted for in most selection studies despite its apparent ubiquity in humans and other species. Through analyses of ancient and modern human genomes, we show that previously reported Holocene-era admixture has masked more than 50 historic hard sweeps in modern European genomes. Our results imply that this canonical mode of selection has probably been underappreciated in the evolutionary history of humans and suggest that our current understanding of the tempo and mode of selection in natural populations may be inaccurate.
A comprehensive database of quality-rated fossil ages for Sahul’s Quaternary vertebrates
Published: 19 July 2016
The study of palaeo-chronologies using fossil data provides evidence for past ecological and evolutionary processes, and is therefore useful for predicting patterns and impacts of future environmental change. However, the robustness of inferences made from fossil ages relies heavily on both the quantity and quality of available data. We compiled Quaternary non-human vertebrate fossil ages from Sahul published up to 2013. This, the FosSahul database, includes 9,302 fossil records from 363 deposits, for a total of 478 species within 215 genera, of which 27 are extinct and extant megafaunal species (2,559 records). We also provide a rating of the reliability of individual absolute ages based on the dating protocols and the association between the dated materials and the fossil remains. Our proposed rating system identified 2,422 records with high-quality ages (i.e., a reduction of 74%). There are many applications of the database, including disentangling the confounding influences of hypothetical extinction drivers, better spatial distribution estimates of species relative to palaeo-climates, and potentially identifying new areas for fossil discovery.
Marta Rodríguez-Rey, Salvador Herrando-Pérez, Barry W. Brook, Frédérik Saltré, John Alroy, Nicholas Beeton, Michael I. Bird, Alan Cooper, Richard Gillespie, Zenobia Jacobs, Christopher N. Johnson, Gifford H. Miller, Gavin J. Prideaux, Richard G. Roberts, Chris S.M. Turney and Corey J.A. Bradshaw
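A hypothetical usage sketch of such a quality-rated database in Python; the file name and column names below are assumptions for illustration, not the actual FosSahul schema.

```python
# Hedged sketch: keep only the high-quality ages before any downstream analysis.
# "FosSahul.csv", "QualityRating", "Age", and "Species" are assumed names,
# not the published schema of the database.
import pandas as pd

records = pd.read_csv("FosSahul.csv")
high_quality = records[records["QualityRating"].isin(["A", "A*"])]
print(high_quality.groupby("Species")["Age"].max().head())
```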
CSNL: A cost-sensitive non-linear decision tree algorithm
This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that nonlinear decision nodes provide a better basis than axis-parallel decision nodes and utilizes discriminant analysis to construct nonlinear decision trees that take account of costs of misclassification.
The performance of the algorithm is evaluated by applying it to seventeen datasets and the results are compared with those obtained by two well-known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well as, if not better than, these algorithms on more than twelve of the datasets and is considerably faster. The use of bagging with CSNL further enhances its performance, showing the significant benefits of using nonlinear decision nodes.
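A minimal sketch of a single cost-sensitive nonlinear decision node in this spirit (not the published CSNL algorithm): a quadratic discriminant surface is fitted over two continuous attributes, with class priors reweighted by assumed misclassification costs so the split leans away from the expensive error.

```python
# Hedged sketch of one cost-sensitive nonlinear node; data and costs are assumed.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)        # class 1 = costly to misclassify

cost_fn, cost_fp = 10.0, 1.0               # assumed misclassification costs
prior_1 = cost_fn / (cost_fn + cost_fp)    # inflate prior of the costly class
node = QuadraticDiscriminantAnalysis(priors=[1 - prior_1, prior_1]).fit(X, y)

# A full tree would recurse on the two partitions this nonlinear node induces.
left, right = X[node.predict(X) == 0], X[node.predict(X) == 1]
print(len(left), len(right))
```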
Technical note: Optimizing the utility of combined GPR, OSL, and Lidar (GOaL) to extract paleoenvironmental records and decipher shoreline evolution
Records of past sea levels, storms, and their impacts on coastlines are crucial for forecasting and managing future changes resulting from anthropogenic global warming. Coastal barriers that have prograded over the Holocene preserve within their accreting sands a history of storm erosion and changes in sea level. High-resolution geophysics, geochronology, and remote sensing techniques offer an optimal way to extract these records and decipher shoreline evolution. These methods include light detection and ranging (lidar) to image the lateral extent of relict shoreline dune morphology in 3-D, ground-penetrating radar (GPR) to record paleo-dune, beach, and nearshore stratigraphy, and optically stimulated luminescence (OSL) to date the deposition of sand grains along these shorelines. Utilization of these technological advances has recently become more prevalent in coastal research. The resolution and sensitivity of these methods offer unique insights on coastal environments and their relationship to past climate change. However, discrepancies in the analysis and presentation of the data can result in erroneous interpretations. When utilized correctly on prograded barriers, these methods (independently or in various combinations) have produced storm records, constructed sea-level curves, quantified sediment budgets, and deciphered coastal evolution. Therefore, combining the application of GPR, OSL, and Lidar (GOaL) on one prograded barrier has the potential to generate three detailed records of (1) storms, (2) sea level, and (3) sediment supply for that coastline. Obtaining all three for one barrier (a GOaL hat-trick) can provide valuable insights into how these factors influenced past and future barrier evolution. Here we argue that systematically achieving GOaL hat-tricks on some of the 300+ prograded barriers worldwide would allow us to disentangle local patterns of sediment supply from the regional effects of storms or global changes in sea level, providing for a direct comparison to climate proxy records. Fully realizing this aim requires standardization of methods to optimize results. The impetus for this initiative is to establish a framework for consistent data collection and analysis that maximizes the potential of GOaL to contribute to climate change research that can assist coastal communities in mitigating future impacts of global warming.