
    Data Imputation through the Identification of Local Anomalies

    We introduce a comprehensive statistical framework, in a model-free setting, for the complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose i) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions in a given suspicious data instance and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As a generalization of the Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and is empirically shown to be superior in separating corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous or normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independence structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting corruptions is independent of the data and can be set directly, without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions, and is experimentally shown to produce remarkable improvements in classification performance with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform typical approaches and are robust to varying training phase conditions.
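The abstract does not spell out its ranked-deviation distance, but the intuition it names, that ranking per-attribute deviations keeps a few grossly corrupted attributes from dominating the comparison, can be illustrated with a simple trimmed variant. The `trim` fraction and the trimming rule below are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def euclidean(x, y):
    """Standard Euclidean distance between attribute vectors."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def ranked_deviation(x, y, trim=0.25):
    """Illustrative rank-based distance: sort the per-attribute
    absolute deviations and discard the largest `trim` fraction,
    so a few grossly corrupted attributes cannot dominate the
    comparison the way they do in the Euclidean distance."""
    dev = np.sort(np.abs(x - y))                  # deviations, ascending
    keep = int(np.ceil(len(dev) * (1.0 - trim)))  # drop the top ranks
    return float(np.sqrt(np.sum(dev[:keep] ** 2)))

# A clean pair vs. the same pair with one attribute occluded:
x = np.zeros(8)
y = np.ones(8) * 0.1
y_corrupt = y.copy()
y_corrupt[0] = 50.0                    # localized corruption

print(euclidean(x, y_corrupt))         # blown up by the one bad attribute
print(ranked_deviation(x, y_corrupt))  # close to the clean-pair distance
```

Because the corrupted attribute lands in the highest rank and is trimmed away, the rank-based distance of the corrupted pair matches that of the clean pair, which is the separation property the abstract claims.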

    Technocracy inside the rule of law : challenges in the foundations of legal norms

    Technocracy is usually opposed to democracy. Here, another perspective is taken: technocracy is countered with the rule of law. In trying to understand the contemporary dynamics of the rule of law, two main types of legal systems (in a broad sense) have to be distinguished: first, legal norms, studied by the science of law; second, scientific laws (which include the legalities of the different sciences and communities). Both contain normative prescriptions, but they differ in their source: while legal norms are the expression of the will of the normative authority, technical prescriptions can be derived from scientific laws, which are grounded in the commonly supposed objectivity of scientific knowledge about reality. Both also impose sanctions, but in the legal norm these refer to what is established by the norm itself, while in the scientific legality they consist in the reward or punishment derived from the efficacy or inefficacy of the action in reaching its pursued end. The mode of legitimation also differs: legal norms must have followed the formal procedures and must not have contravened any fundamental right, whereas the validity of technical norms depends on their theoretical foundations or on their efficacy. Nowadays, scientific knowledge has become an important feature of policy-making, and contradictions can arise between these legal systems. These conflicts are especially grave when the recognition or exercise of fundamental rights is used instrumentally, or when rights are violated in order to increase a policy's efficacy. A political system is technocratic when, in case of contradiction, the scientific law finally prevails.

    Multidimensional Scaling on Multiple Input Distance Matrices

    Multidimensional Scaling (MDS) is a classic technique that seeks vectorial representations for data points, given the pairwise distances between them. In recent years, however, data are often collected from diverse sources or have multiple heterogeneous representations. To the best of our knowledge, how to perform multidimensional scaling on multiple input distance matrices remains unsolved. In this paper, we first define this new task formally. Then, we propose a new algorithm called Multi-View Multidimensional Scaling (MVMDS) that treats each input distance matrix as one view. Our algorithm learns the weights of the views (i.e., distance matrices) automatically by exploiting the consensus information and the complementary nature of the views. Experimental results on synthetic as well as real datasets demonstrate the effectiveness of MVMDS. We hope that our work encourages wider consideration of this setting in the many domains where MDS is needed.
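The single-view building block that this task generalizes is classical MDS: double-center the squared distance matrix and embed via its top eigenpairs. The sketch below applies it to a fixed convex combination of the input matrices; this fixed weighting is only a naive multi-view baseline for illustration, since MVMDS learns the view weights automatically, which this sketch does not.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: recover k-dimensional coordinates whose pairwise
    Euclidean distances approximate the input distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # take the top-k eigenpairs
    L = np.sqrt(np.clip(w[idx], 0, None))
    return V[:, idx] * L

def fixed_weight_mds(Ds, weights, k=2):
    """Naive multi-view baseline (NOT MVMDS): embed a fixed convex
    combination of the input distance matrices."""
    D = sum(wi * Di for wi, Di in zip(weights, Ds))
    return classical_mds(D, k)

# Three collinear points; two noisy "views" of their distances.
D1 = np.array([[0., 1., 2.],
               [1., 0., 1.],
               [2., 1., 0.]])
D2 = D1 + 0.1
np.fill_diagonal(D2, 0.0)
X = fixed_weight_mds([D1, D2], [0.5, 0.5], k=1)
print(np.round(np.abs(X[:, 0]), 2))  # recovered 1-D layout
```

The recovered coordinates reproduce the line layout up to sign and centering, with the endpoints roughly two units apart, consistent with both input views.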

    Empathy, Simulation, and Neuroscience: A Phenomenological Case Against Simulation Theory

    In recent years, some simulation theorists have claimed that the discovery of mirror neurons provides empirical support for the position that mind reading is, at some basic level, simulation. The purpose of this essay is to question that claim. I begin by providing brief context for the current mind-reading debate and by developing an influential simulationist account of mind reading. I then draw on the works of Edmund Husserl and Edith Stein to develop an alternative, phenomenological account. In conclusion, I offer multiple objections to simulation theory and argue that the empirical evidence mirror neurons offer does not necessarily support the view that empathy is simulation.

    Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

    Missing data is a widespread problem that can compromise our ability to construct effective prediction systems from data. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze predictive performance after using the k-NN missing data imputation technique, to see whether it is better to tolerate missing data or to impute the missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. We also found that both C4.5 and k-NN are little affected by the missingness mechanism, but that the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
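The k-NN imputation studied above can be sketched in a few lines: each incomplete row borrows values from its k nearest complete rows, with nearness measured over the observed attributes only. This is a minimal generic sketch of the technique, not the paper's exact experimental configuration (neighbor count, distance weighting, and tie handling are assumptions).

```python
import numpy as np

def knn_impute(X, k=3):
    """Minimal k-NN imputation sketch: fill each missing entry with
    the mean of that feature over the k nearest complete-case rows,
    with distance computed on the observed features only."""
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]   # donor rows with no gaps
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        # Euclidean distance on the attributes this row has observed
        d = np.sqrt(((complete[:, ~miss] - row[~miss]) ** 2).sum(axis=1))
        donors = complete[np.argsort(d)[:k]]
        X[i, miss] = donors[:, miss].mean(axis=0)
    return X

data = np.array([[1.0, 2.0,    3.0],
                 [1.1, 2.1,    3.1],
                 [0.9, 1.9,    2.9],
                 [1.0, np.nan, 3.0]])
print(knn_impute(data, k=3))  # missing entry filled from the 3 neighbors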

    Rice seed image classification based on HOG descriptor with missing values imputation

    Rice is a primary food source for almost half of the world's population, and rice quality depends mainly on the purity of the rice seed. In order to ensure the purity of a rice variety, a recognition process is an essential stage. In this paper, we first propose to use the histogram of oriented gradients (HOG) descriptor to characterize rice seed images. Since the image sizes vary, the features extracted by HOG have different dimensions and cannot be used directly by a classifier. We therefore apply several imputation methods to fill in the missing data for the HOG descriptor. The experiments are run on the VNRICE benchmark dataset to evaluate the proposed approach.
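The missing-values problem described above arises because a HOG descriptor's length depends on the image size. One simple way to set it up, sketched below under assumptions of my own (the NaN-padding convention, and mean imputation as one easy member of the family of imputation methods the paper compares), is to pad the shorter descriptors to a common length and treat the padding as missing:

```python
import numpy as np

def pad_to_common_length(features):
    """Pad variable-length descriptors with NaN so they share one
    fixed dimension, marking the padded tail as 'missing'."""
    n = max(len(f) for f in features)
    out = np.full((len(features), n), np.nan)
    for i, f in enumerate(features):
        out[i, :len(f)] = f
    return out

def mean_impute(X):
    """Replace each NaN with the feature-wise mean over the rows
    that actually observed that feature."""
    col_mean = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_mean, X)

# Descriptors of different lengths, as from images of varying size:
hogs = [np.array([0.2, 0.4, 0.6, 0.8]),
        np.array([0.1, 0.3]),
        np.array([0.3, 0.5, 0.7])]
X = mean_impute(pad_to_common_length(hogs))
print(X.shape)  # (3, 4) -- one fixed-length row per image
```

After imputation, every image is represented by a row of the same dimension, so any standard classifier can consume the matrix directly.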