653 research outputs found

Revisiting Data Complexity Metrics Based on Morphology for Overlap and Imbalance: Snapshot, New Overlap Number of Balls Metrics and Singular Problems Prospect

Author: Arroyo Marta Andrés
Charte David
Fernández Alberto
Herrera Francisco
Pascual-Triana José Daniel
Publication venue
Publication date: 15/07/2020
Field of study

Data Science and Machine Learning have become fundamental assets for companies and research institutions alike. As one of its fields, supervised classification allows for class prediction of new samples, learning from given training data. However, some properties can cause datasets to be problematic to classify. In order to evaluate a dataset a priori, data complexity metrics have been used extensively. They provide information regarding different intrinsic characteristics of the data, which serve to evaluate classifier compatibility and a course of action that improves performance. However, most complexity metrics focus on just one characteristic of the data, which can be insufficient to properly evaluate the dataset towards the classifiers' performance. In fact, class overlap, a very detrimental feature for the classification process (especially when imbalance among class labels is also present), is hard to assess. This research work focuses on revisiting complexity metrics based on data morphology. In accordance with their nature, the premise is that they provide both good estimates for class overlap and strong correlations with the classification performance. For that purpose, a novel family of metrics has been developed. Being based on ball coverage by classes, they are named Overlap Number of Balls. Finally, some prospects for the adaptation of this family of metrics to singular (more complex) problems are discussed.
Comment: 23 pages, 9 figures, preprint

arXiv.org e-Print Archive

The Economics and Psychology of Personality Traits

Author: Borghans Lex
Heckman James J.
Lee Duckworth Angela
Weel Bas ter
Publication venue
Publication date
Field of study

This paper explores the interface between personality psychology and economics. We examine the predictive power of personality and the stability of personality traits over the life cycle. We develop simple analytical frameworks for interpreting the evidence in personality psychology and suggest promising avenues for future research.
Keywords: education, training and the labour market

Research Papers in Economics

The Economics and Psychology of Personality Traits

Author: Borghans Lex
Duckworth Angela Lee
Heckman James J.
ter Weel Bas
Publication venue
Publication date
Field of study

This paper explores the interface between personality psychology and economics. We examine the predictive power of personality and the stability of personality traits over the life cycle. We develop simple analytical frameworks for interpreting the evidence in personality psychology and suggest promising avenues for future research.
Keywords: lifecycle effects, personality traits

Research Papers in Economics

The Economics and Psychology of Personality Traits

Author: Borghans Lex
Duckworth Angela Lee
Heckman James J.
Weel Bas ter
Publication venue
Publication date
Field of study

This paper explores the interface between personality psychology and economics. We examine the predictive power of personality and the stability of personality traits over the life cycle. We develop simple analytical frameworks for interpreting the evidence in personality psychology and suggest promising avenues for future research.
Keywords: personality traits, lifecycle effects, psychology, economics

Research Papers in Economics

The Economics and Psychology of Personality Traits

Author: Angela Lee Duckworth
Bas ter Weel
James J. Heckman
Lex Borghans
Publication venue
Publication date
Field of study

Research Papers in Economics

Introspective knowledge acquisition for case retrieval networks in textual case base reasoning.

Author: Chakraborti Sutanu
Publication venue
Publication date: 31/08/2007
Field of study

Textual Case Based Reasoning (TCBR) aims at effective reuse of information contained in unstructured documents. The key advantage of TCBR over traditional Information Retrieval systems is its ability to incorporate domain-specific knowledge to facilitate case comparison beyond simple keyword matching. However, substantial human intervention is needed to acquire and transform this knowledge into a form suitable for a TCBR system. In this research, we present automated approaches that exploit statistical properties of document collections to alleviate this knowledge acquisition bottleneck. We focus on two important knowledge containers: relevance knowledge, which shows relatedness of features to cases, and similarity knowledge, which captures the relatedness of features to each other. The terminology is derived from the Case Retrieval Network (CRN) retrieval architecture in TCBR, which is used as the underlying formalism in this thesis applied to text classification. Latent Semantic Indexing (LSI) generated concepts are a useful resource for relevance knowledge acquisition for CRNs. This thesis introduces a supervised LSI technique called sprinkling that exploits class knowledge to bias LSI's concept generation. An extension of this idea, called Adaptive Sprinkling has been proposed to handle inter-class relationships in complex domains like hierarchical (e.g. Yahoo directory) and ordinal (e.g. product ranking) classification tasks. Experimental evaluation results show the superiority of CRNs created with sprinkling and AS, not only over LSI on its own, but also over state-of-the-art classifiers like Support Vector Machines (SVM). Current statistical approaches based on feature co-occurrences can be utilized to mine similarity knowledge for CRNs. However, related words often do not co-occur in the same document, though they co-occur with similar words. We introduce an algorithm to efficiently mine such indirect associations, called higher order associations. 
Empirical results show that CRNs created with the acquired similarity knowledge outperform both LSI and SVM. Incorporating acquired knowledge into the CRN transforms it into a densely connected network. While improving retrieval effectiveness, this has the unintended effect of slowing down retrieval. We propose a novel retrieval formalism called the Fast Case Retrieval Network (FCRN), which eliminates redundant run-time computations to improve retrieval speed. Experimental results show FCRN's ability to scale up over high dimensional textual casebases. Finally, we investigate novel ways of visualizing and estimating complexity of textual casebases that can help explain performance differences across casebases. Visualization provides a qualitative insight into the casebase, while complexity is a quantitative measure that characterizes classification or retrieval hardness intrinsic to a dataset. We study correlations of experimental results from the proposed approaches against complexity measures over diverse casebases.

Open Access Institutional Repository at Robert Gordon University

Dataset analysis for classifier ensemble enhancement

Author
Publication venue: Università degli Studi di Cagliari
Publication date: 27/04/2015
Field of study

We developed three different methods for dataset analysis and ensemble enhancement. They share the underlying idea that accurate preprocessing and adaptation of the data can improve system performance without changing the classification model. Correlation Score is a generic framework for assessing encoding techniques by measuring the correlation between the encoded feature vectors and the corresponding class labels; experiments show its effectiveness in discovering the best encoding configurations among those tested, on a wide range of classification domains. Multi-Resolution Complexity Analysis is a method for assessing the local complexity inside a given domain. It is able to split a domain into regions of different classification complexity, giving insights into the inner structure of the populations inside the domain. Finally, Forests of Local Trees are a novel training algorithm for ensemble classifiers. They are based on the concept of local trees: classifiers trained with a bias toward a certain region of the domain. This bias enhances the diversity inside the ensemble, leading to improved performance. These three topics are meant as a foundation for a more complex framework that will eventually utilize them organically.

Archivio istituzionale della ricerca - Università di Cagliari

Personality Psychology and Economics

Author: Almlund Mathilde
Duckworth Angela Lee
Heckman James J.
Kautz Tim
Publication venue
Publication date
Field of study

This paper explores the power of personality traits both as predictors and as causes of academic and economic success, health, and criminal activity. Measured personality is interpreted as a construct derived from an economic model of preferences, constraints, and information. Evidence is reviewed about the "situational specificity" of personality traits and preferences. An extreme version of the situationist view claims that there are no stable personality traits or preference parameters that persons carry across different situations. Those who hold this view claim that personality psychology has little relevance for economics. The biological and evolutionary origins of personality traits are explored. Personality measurement systems and relationships among the measures used by psychologists are examined. The predictive power of personality measures is compared with the predictive power of measures of cognition captured by IQ and achievement tests. For many outcomes, personality measures are just as predictive as cognitive measures, even after controlling for family background and cognition. Moreover, standard measures of cognition are heavily influenced by personality traits and incentives. Measured personality traits are positively correlated over the life cycle. However, they are not fixed and can be altered by experience and investment. Intervention studies, along with studies in biology and neuroscience, establish a causal basis for the observed effect of personality traits on economic and social outcomes. Personality traits are more malleable over the life cycle compared to cognition, which becomes highly rank stable around age 10. Interventions that change personality are promising avenues for addressing poverty and disadvantage.
Keywords: personality, behavioral economics, cognitive traits, wages, economic success, human development, person-situation debate

Research Papers in Economics

Ontological foundations for structural conceptual models

Author: Guizzardi Giancarlo
Publication venue: CTIT, Centre for Telematics and Information Technology
Publication date: 01/01/2005
Field of study

In this thesis, we aim to contribute to the theory of conceptual modeling and ontology representation. Our main objective here is to provide ontological foundations for the most fundamental concepts in conceptual modeling. These foundations comprise a number of ontological theories, which are built on established work on philosophical ontology, cognitive psychology, philosophy of language and linguistics. Together these theories amount to a system of categories and formal relations known as a foundational ontology

University of Twente Research Information

Machine Learning for Enhanced Maritime Situation Awareness: Leveraging Historical AIS Data for Ship Trajectory Prediction

Author: Murray Brian
Publication venue: ASME International
Publication date: 03/05/2021
Field of study

In this thesis, methods to support high level situation awareness in ship navigators through appropriate automation are investigated. Situation awareness relates to the perception of the environment (level 1), comprehension of the situation (level 2), and projection of future dynamics (level 3). Ship navigators likely conduct mental simulations of future ship traffic (level 3 projections), that facilitate proactive collision avoidance actions. Such actions may include minor speed and/or heading alterations that can prevent future close-encounter situations from arising, enhancing the overall safety of maritime operations. Currently, there is limited automation support for level 3 projections, where the most common approaches utilize linear predictions based on constant speed and course values. Such approaches, however, are not capable of predicting more complex ship behavior. Ship navigators likely facilitate such predictions by developing models for level 3 situation awareness through experience. It is, therefore, suggested in this thesis to develop methods that emulate the development of high level human situation awareness. This is facilitated by leveraging machine learning, where navigational experience is artificially represented by historical AIS data. First, methods are developed to emulate human situation awareness by developing categorization functions. In this manner, historical ship behavior is categorized to reflect distinct patterns. To facilitate this, machine learning is leveraged to generate meaningful representations of historical AIS trajectories, and discover clusters of specific behavior. Second, methods are developed to facilitate pattern matching of an observed trajectory segment to clusters of historical ship behavior. Finally, the research in this thesis presents methods to predict future ship behavior with respect to a given cluster. Such predictions are, furthermore, on a scale intended to support proactive collision avoidance actions. 
Two main approaches are used to facilitate these functions. The first utilizes eigendecomposition-based approaches via locally extracted AIS trajectory segments. Anomaly detection is also facilitated via this approach in support of the outlined functions. The second utilizes deep learning-based approaches applied to regionally extracted trajectories. Both approaches are found to be successful in discovering clusters of specific ship behavior in relevant data sets, classifying a trajectory segment to a given cluster or clusters, as well as predicting the future behavior. Furthermore, the local ship behavior techniques can be trained to facilitate live predictions. The deep learning-based techniques, however, require significantly more training time. These models will, therefore, need to be pre-trained. Once trained, however, the deep learning models will facilitate almost instantaneous predictions.
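
For orientation, the baseline that the abstract above describes as "linear predictions based on constant speed and course values" can be sketched as simple dead reckoning. This is an illustration only, not the thesis's method: the function name `predict_position`, its parameters, and the flat-earth approximation (one nautical mile taken as one minute of latitude) are assumptions introduced here.

```python
import math


def predict_position(lat_deg, lon_deg, speed_knots, course_deg, horizon_s):
    """Dead-reckon a future position, assuming speed and course stay constant.

    Hypothetical helper for illustration. Uses a flat-earth approximation
    that is only reasonable for short horizons and away from the poles.
    """
    # Distance travelled over the horizon, in nautical miles.
    distance_nm = speed_knots * horizon_s / 3600.0
    # Course is measured clockwise from true north.
    course_rad = math.radians(course_deg)
    # One nautical mile is approximately one minute (1/60 degree) of latitude.
    dlat = distance_nm * math.cos(course_rad) / 60.0
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = distance_nm * math.sin(course_rad) / (60.0 * math.cos(math.radians(lat_deg)))
    return lat_deg + dlat, lon_deg + dlon


# A ship at 60 N, 5 E doing 12 knots due north, projected 30 minutes ahead.
lat, lon = predict_position(60.0, 5.0, 12.0, 0.0, 1800)
```

Because such a projection is a straight line, it cannot follow a turn or a speed change, which is the limitation the abstract gives as motivation for learning from historical AIS trajectories.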

Munin - Open Research Archive