Data Imputation through the Identification of Local Anomalies
We introduce a comprehensive statistical framework, in a model-free setting, for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this
framework, we propose i) a novel algorithm to efficiently separate, i.e.,
detect and localize, possible corruptions from a given suspicious data instance
and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As
a generalization of the Euclidean distance, we also propose a novel distance measure based on the ranked deviations among the data attributes, which is empirically shown to be superior in separating the corruptions. Our algorithm
first splits the suspicious instance into parts through a binary partitioning
tree in the space of data attributes and iteratively tests those parts to
detect local anomalies using the nominal statistics extracted from an
uncorrupted (clean) reference data set. Once each part is labeled as anomalous
vs normal, the corresponding binary patterns over this tree that characterize
corruptions are identified and the affected attributes are imputed. Under a
certain conditional independence structure assumed for the binary patterns, we
analytically show that the false alarm rate of the introduced algorithm in
detecting the corruptions is independent of the data and can be directly set
without any parameter tuning. The proposed framework is tested over several
well-known machine learning data sets with synthetically generated corruptions;
and experimentally shown to yield remarkable improvements in classification performance, with strong corruption-separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training-phase conditions.
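The abstract only describes the ranked-deviation distance at a high level. A minimal sketch of one form such a measure could take is below; the function name, the truncation parameter `k`, and the choice to keep the `k` smallest ranked deviations are illustrative assumptions, not the paper's definition. With `k` equal to the number of attributes it reduces to the Euclidean distance, matching the claim that the measure generalizes it.

```python
import numpy as np

def ranked_deviation_distance(x, y, k=None):
    """Hypothetical sketch: sort the per-attribute absolute deviations
    and keep only the k smallest, so that a few extreme deviations
    (possible corruptions) do not dominate the distance."""
    dev = np.abs(x - y)          # per-attribute deviations
    dev_sorted = np.sort(dev)    # ranked deviations, ascending
    if k is None:
        k = len(dev)             # no truncation: plain Euclidean distance
    return np.sqrt(np.sum(dev_sorted[:k] ** 2))
```

For example, if one attribute of `y` is corrupted by a large offset, truncating to `k = d - 1` suppresses that attribute's contribution while the full-`k` (Euclidean) distance is dominated by it.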
Technocracy inside the rule of law: challenges in the foundations of legal norms
Technocracy is usually opposed to democracy. Here, another perspective is taken: technocracy is contrasted with the rule of law. In trying to understand the contemporary dynamics of the rule of law, two main types of legal systems (in a broad sense) have to be distinguished: first, legal norms, studied by the science of law; second, scientific laws (which include the legalities of the different sciences and communities). Both contain normative prescriptions, but they differ in their source: while legal norms are the expression of the will of the normative authority, technical prescriptions can be derived from scientific laws, which are grounded in the commonly supposed objectivity of scientific knowledge about reality. Both also impose sanctions, but in the legal norm these refer to what is established by the norm itself, while in the scientific legality they consist in the reward or punishment derived from the efficacy or inefficacy of the action in reaching its end. The mode of legitimation also differs: legal norms must have followed the formal procedures and must not have contravened any fundamental right, whereas the validity of technical norms depends on their theoretical foundations or on their efficacy. Nowadays, scientific knowledge has become an important feature of policy-making, and contradictions can arise between these legal systems. These conflicts are especially grave when the recognition or exercise of fundamental rights is used instrumentally, or when rights are violated in order to increase the efficacy of policies. A political system is technocratic when, in case of contradiction, the scientific law finally prevails.
Multidimensional Scaling on Multiple Input Distance Matrices
Multidimensional Scaling (MDS) is a classic technique that seeks vectorial
representations for data points, given the pairwise distances between them.
However, in recent years, data are usually collected from diverse sources or
have multiple heterogeneous representations. To the best of our knowledge, how to perform multidimensional scaling on multiple input distance matrices remains unsolved. In
this paper, we first define this new task formally. Then, we propose a new
algorithm called Multi-View Multidimensional Scaling (MVMDS) by considering
each input distance matrix as one view. Our algorithm is able to learn the
weights of views (i.e., distance matrices) automatically by exploring the
consensus information and complementary nature of views. Experimental results
on synthetic as well as real datasets demonstrate the effectiveness of MVMDS.
We hope that our work encourages a wider consideration of MDS in the many domains where it is needed.
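To make the task concrete: classical MDS embeds points from a single distance matrix via the double-centered Gram matrix. The sketch below extends it to several input matrices by embedding a weighted average of the views' Gram matrices; this is a naive baseline under assumed uniform weights, not the MVMDS algorithm itself, which learns the view weights automatically.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS from one pairwise-distance matrix D (n x n)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]          # top eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

def multiview_mds_baseline(Ds, dim=2, weights=None):
    """Naive multi-view baseline (NOT MVMDS): embed a fixed weighted
    average of the views' Gram matrices; uniform weights by default."""
    if weights is None:
        weights = np.ones(len(Ds)) / len(Ds)
    n = Ds[0].shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = sum(wv * (-0.5) * J @ (D ** 2) @ J for wv, D in zip(weights, Ds))
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```

Learning the weights instead of fixing them is exactly where the consensus and complementary information across views enters in the paper's formulation.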
Empathy, Simulation, and Neuroscience: A Phenomenological Case Against Simulation Theory
In recent years, some simulation theorists have claimed that the discovery of mirror neurons provides empirical support for the position that mind reading is, at some basic level, simulation. The purpose of this essay is to question that claim. I begin by providing brief context for the current mind reading debate and then develop an influential simulationist account of mind reading. I then draw on the works of Edmund Husserl and Edith Stein to develop an alternative, phenomenological account. In conclusion, I offer multiple objections against simulation theory and argue that the empirical evidence mirror neurons offer us does not necessarily support the view that empathy is simulation.
Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation
Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique, to see whether it is better to tolerate missing data or to impute missing values first and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5, and that both C4.5 and k-NN are little affected by the missingness mechanism; however, the missing data pattern and the missing data percentage have a strong negative impact on prediction (or imputation) accuracy, particularly when the missing data percentage exceeds 40%.
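The k-NN imputation step described above can be sketched as follows. This is a generic version, assuming Euclidean distance over the attributes observed in both rows and a mean over the neighbours' values; the paper's exact configuration (distance metric, choice of k, aggregation) may differ.

```python
import numpy as np

def knn_impute(X, k=2):
    """Generic k-NN imputation sketch: fill each missing entry (NaN)
    with the mean of that attribute over the k complete rows nearest
    to the incomplete row, comparing only its observed attributes."""
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]   # donor pool: fully observed rows
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        d = np.sqrt(((complete[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        nn = complete[np.argsort(d)[:k]]     # k nearest complete rows
        X[i, miss] = nn[:, miss].mean(axis=0)
    return X
```

The imputed matrix can then be fed to any learner that requires complete data, which is the comparison the study makes against C4.5's built-in tolerance of missing values.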
Rice seed image classification based on HOG descriptor with missing values imputation
Rice is a primary source of food consumed by almost half of the world's population, and rice quality mainly depends on the purity of the rice seed. To ensure the purity of a rice variety, the recognition process is an essential stage. In this paper, we first propose to use the histogram of oriented gradients (HOG) descriptor to characterize rice seed images. Since the image sizes vary, the features extracted by HOG have different dimensions and cannot be used directly by a classifier; we therefore apply several imputation methods to fill in the missing data of the HOG descriptors. The experiments are conducted on the VNRICE benchmark dataset to evaluate the proposed approach.
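The size mismatch the abstract describes can be illustrated with a stripped-down HOG-style descriptor and one simple imputation choice. Everything below is an assumption for illustration: a bare per-cell orientation histogram (no block normalization, unlike full HOG), NaN-padding to the longest descriptor, and per-position mean imputation as one of the "several imputation methods" the paper compares.

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Minimal HOG-style descriptor: per-cell histograms of unsigned
    gradient orientations (0-180 deg), magnitude-weighted. The output
    length depends on the image size, which causes the dimension
    mismatch the paper addresses."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h, w = img.shape
    feats = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            m = mag[r:r + cell, c:c + cell].ravel()
            a = ang[r:r + cell, c:c + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

def pad_and_mean_impute(descs):
    """Pad variable-length descriptors with NaN to a common length,
    then fill each NaN with that position's mean over the dataset
    (one illustrative imputation choice among several)."""
    L = max(len(d) for d in descs)
    M = np.full((len(descs), L), np.nan)
    for i, d in enumerate(descs):
        M[i, :len(d)] = d
    means = np.nanmean(M, axis=0)        # per-position means over observed values
    rows, cols = np.where(np.isnan(M))
    M[rows, cols] = means[cols]
    return M
```

After this step all descriptors share one fixed length and can be passed to a standard classifier.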