Labeling Neural Representations with Inverse Recognition
Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning
complex hierarchical data representations, but the nature of these
representations remains largely unknown. Existing global explainability
methods, such as Network Dissection, face limitations including reliance on
segmentation masks, lack of statistical significance testing, and high
computational demands. We propose Inverse Recognition (INVERT), a scalable
approach for connecting learned representations with human-understandable
concepts by leveraging their capacity to discriminate between these concepts.
In contrast to prior work, INVERT is capable of handling diverse types of
neurons, exhibits less computational complexity, and does not rely on the
availability of segmentation masks. Moreover, INVERT provides an interpretable
metric assessing the alignment between the representation and its corresponding
explanation and delivering a measure of statistical significance. We
demonstrate the applicability of INVERT in various scenarios, including the
identification of representations affected by spurious correlations, and the
interpretation of the hierarchical structure of decision-making within the
models.
Comment: 25 pages, 16 figures
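As a rough, hedged illustration of the discrimination idea sketched in this abstract (an assumed setup, not the authors' implementation), one could score how well a single neuron's activations separate images that contain a given concept from images that do not, for instance with the area under the ROC curve, and attach a non-parametric significance test. The activations and concept labels below are synthetic placeholders.

# Minimal sketch: label a neuron by how well it discriminates a concept,
# with an AUC score and a Mann-Whitney U significance test (assumed setup).
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.metrics import roc_auc_score

def concept_alignment(activations, concept_labels):
    # activations: one neuron's scalar response per image, shape [N]
    # concept_labels: 1 if the image shows the concept, else 0, shape [N]
    auc = roc_auc_score(concept_labels, activations)
    _, p_value = mannwhitneyu(activations[concept_labels == 1],
                              activations[concept_labels == 0],
                              alternative="greater")
    return auc, p_value

# Hypothetical usage with synthetic activations: pick the concept with the
# highest AUC (and a small p-value) as the neuron's explanation.
rng = np.random.default_rng(0)
labels = np.r_[np.ones(100, dtype=int), np.zeros(100, dtype=int)]
acts = rng.normal(size=200) + labels          # fake, concept-correlated responses
print(concept_alignment(acts, labels))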
A Unifying View of Multiple Kernel Learning
Recent research on multiple kernel learning has led to a number of
approaches for combining kernels in regularized risk minimization. The proposed
approaches include different formulations of objectives and varying
regularization strategies. In this paper we present a unifying general
optimization criterion for multiple kernel learning and show how existing
formulations are subsumed as special cases. We also derive the criterion's dual
representation, which is suitable for general smooth optimization algorithms.
Finally, we evaluate multiple kernel learning in this framework analytically
using a Rademacher complexity bound on the generalization error and empirically
in a set of experiments.
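As a toy illustration of the shared setup behind these formulations (a sketch, not the unifying criterion derived in the paper), multiple kernel learning combines several base kernels into a single kernel with non-negative mixture weights and trains a regularized classifier on the combination; the data, base kernels, and fixed weights below are assumptions for illustration, whereas real MKL learns the weights jointly with the classifier.

# Toy multiple kernel learning setup with a fixed convex combination of base
# kernels (illustrative only; in MKL the weights are learned, not fixed).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Base kernels K_1, ..., K_M evaluated on the training data.
base_kernels = [linear_kernel(X), polynomial_kernel(X, degree=2), rbf_kernel(X, gamma=0.1)]

# Assumed mixture weights beta_m >= 0 summing to one.
beta = np.array([0.2, 0.3, 0.5])
K = sum(b * Km for b, Km in zip(beta, base_kernels))

# Regularized risk minimization on the combined (precomputed) kernel.
clf = SVC(kernel="precomputed", C=1.0).fit(K, y)
print("training accuracy:", clf.score(K, y))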
Models of asthma: density-equalizing mapping and output benchmarking
Despite the large number of experimental studies already conducted on bronchial asthma, further insights into the molecular basis of the disease are required to establish new therapeutic approaches. As a basis for this research, different animal models of asthma have been developed in the past years. However, precise bibliometric data on the use of different models do not exist so far. Therefore, the present study was conducted to establish a database of the existing experimental approaches. Density-equalizing algorithms were used and data were retrieved from a Thomson Institute for Scientific Information database. During the period from 1900 to 2006, a total of 3,489 filed items were connected to animal models of asthma, the first being published in the year 1968. The studies were published by 52 countries, with the US, Japan and the UK being the most productive suppliers, participating in 55.8% of all published items. Analyzing the average citation per item as an indicator of research quality, Switzerland ranked first (30.54 citations/item) and New Zealand ranked second among countries with more than 10 published studies. The 10 most productive journals included 4 with a main focus on allergy and immunology and 4 with a main focus on the respiratory system. Two journals focussed on pharmacology or pharmacy. In all assigned subject categories examined for a relation to animal models of asthma, immunology ranked first. Assessing the number of published items in relation to animal species, it was found that mice were the preferred species, followed by guinea pigs. In summary, it can be concluded from density-equalizing calculations that the use of animal models of asthma is restricted to a relatively small number of countries. There are also differences in the use of species. These differences are based on variations in the research focus as assessed by subject category analysis.
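The ranking step mentioned above (average citations per item as a rough quality indicator) amounts to a simple division and sort; the country names and counts in this sketch are invented, not the study's data.

# Rank countries by average citations per published item (hypothetical numbers).
records = {
    # country: (published_items, total_citations)
    "Country A": (120, 2400),
    "Country B": (35, 950),
    "Country C": (11, 336),
}
ranking = sorted(((cites / items, country)
                  for country, (items, cites) in records.items()), reverse=True)
for cites_per_item, country in ranking:
    print(f"{country}: {cites_per_item:.2f} citations/item")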
Machine Learning Models that Remember Too Much
Machine learning (ML) is becoming a commodity. Numerous ML frameworks and
services are available to data holders who are not ML experts but want to train
predictive models on their data. It is important that ML models trained on
sensitive inputs (e.g., personal images or documents) not leak too much
information about the training data.
We consider a malicious ML provider who supplies model-training code to the
data holder, does not observe the training, but then obtains white- or
black-box access to the resulting model. In this setting, we design and
implement practical algorithms, some of them very similar to standard ML
techniques such as regularization and data augmentation, that "memorize"
information about the training dataset in the model, yet the model remains as
accurate and predictive as a conventionally trained model. We then explain how
the adversary can extract memorized information from the model.
We evaluate our techniques on standard ML tasks for image classification
(CIFAR10), face recognition (LFW and FaceScrub), and text analysis (20
Newsgroups and IMDB). In all cases, we show how our algorithms create models
that have high predictive power yet allow accurate extraction of subsets of
their training data
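As a hedged sketch of one possible flavor of such an attack (an illustrative assumption, not necessarily the authors' algorithms), a malicious training script could add an auxiliary penalty that pushes the signs of selected parameters to encode secret bits derived from the training data, which a white-box adversary later reads back; the tiny model, secret bits, and penalty weight below are all made up.

# Illustrative sketch: a "malicious" regularizer that nudges parameter signs to
# encode secret bits on top of the ordinary task loss (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 2)                       # hypothetical tiny model
secret_bits = torch.randint(0, 2, (20,))       # bits the attacker wants to hide
targets = secret_bits.float() * 2 - 1          # map {0, 1} -> {-1, +1}
lam = 0.1                                      # strength of the malicious penalty

def malicious_loss(task_loss):
    # Penalize parameters whose sign disagrees with the bit they should encode.
    w = model.weight[0]
    return task_loss + lam * torch.relu(-w * targets).sum()

# One (synthetic) training step with the poisoned objective.
x, y = torch.randn(8, 20), torch.randint(0, 2, (8,))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = malicious_loss(F.cross_entropy(model(x), y))
loss.backward()
opt.step()

# A white-box adversary later recovers the bits from the parameter signs.
recovered = (model.weight[0] > 0).int()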
Scoliosis: density-equalizing mapping and scientometric analysis
Background: Publications related to scoliosis have increased enormously, and a differentiation between publications of major and minor importance has become difficult even for experts. Scientometric data on developments and tendencies in scoliosis research have not been available to date. The aim of the current study was to evaluate the scientific efforts of scoliosis research both quantitatively and qualitatively. Methods: Large-scale data analysis, density-equalizing algorithms and scientometric methods were used to evaluate both the quantity and quality of research achievements of scientists studying scoliosis. Density-equalizing algorithms were applied to data retrieved from ISI-Web. Results: From 1904 to 2007, 8,186 items pertaining to scoliosis were published and included in the database. The studies were published in 76 countries, with the USA, the U.K. and Canada being the most productive centers. Washington University (St. Louis, Missouri) was identified as the most prolific institution during that period, and orthopedics represented by far the most productive medical discipline. "BRADFORD, DS" is the most productive author (146 items), and "DANSEREAU, J" is the author with the highest scientific impact (h-index of 27). Conclusion: Our results suggest that currently established measures of research output (i.e. impact factor, h-index) should be evaluated critically, because phenomena such as self-citation and co-authorship distort the results and limit the value of the conclusions that may be drawn from these measures. Qualitative statements are only tractable through comparison of these parameters with respect to their multiple linkages. In order to obtain more objective evaluation tools, new measurements need to be developed.
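Because the abstract leans on the h-index as a quality measure, a short reminder of how it is computed may help; the citation counts in this sketch are invented, not data from the study.

# h-index: the largest h such that at least h publications have at least h citations.
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    return max((h for h, c in enumerate(ranked, start=1) if c >= h), default=0)

# Hypothetical citation counts for one author.
print(h_index([48, 33, 30, 27, 12, 9, 5, 3, 1]))  # prints 6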
Density-equalizing mapping and scientometric benchmarking of European allergy research
Due to the great socioeconomic burden of allergic diseases, research in this field, which is important for environmental medicine, is currently increasing. Therefore, the European Union has initiated the Global Allergy and Asthma European Network (GA2LEN). However, despite increasing research in the past years, detailed scientometric analyses have not been conducted so far. This study is the first scientometric analysis in a field of growing interest. It analyses scientific contributions in European allergy research between 2001 and 2007. Three different meetings of the European Academy of Allergy and Clinical Immunology were analysed for contributions, and an increase in both the amount of research and the number of networks was found.
Security Evaluation of Support Vector Machines in Adversarial Environments
Support Vector Machines (SVMs) are among the most popular classification
techniques adopted in security applications like malware detection, intrusion
detection, and spam filtering. However, if SVMs are to be incorporated in
real-world security systems, they must be able to cope with attack patterns
that can either mislead the learning algorithm (poisoning), evade detection
(evasion), or gain information about their internal parameters (privacy
breaches). The main contributions of this chapter are twofold. First, we
introduce a formal general framework for the empirical evaluation of the
security of machine-learning systems. Second, according to our framework, we
demonstrate the feasibility of evasion, poisoning and privacy attacks against
SVMs in real-world security problems. For each attack technique, we evaluate
its impact and discuss whether (and how) it can be countered through an
adversary-aware design of SVMs. Our experiments are easily reproducible thanks
to open-source code that we have made available, together with all the employed
datasets, on a public repository.
Comment: 47 pages, 9 figures; chapter accepted into the book 'Support Vector Machine Applications'
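To make the evasion scenario concrete, here is a minimal, hedged sketch (not the chapter's framework or code) of how an attacker with white-box knowledge of a linear SVM's weights could push a sample across the decision boundary with small perturbations; the data and model are synthetic.

# Toy gradient-based evasion attack against a linear SVM (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
clf = LinearSVC(dual=False).fit(X, y)

w = clf.coef_[0]
x = X[y == 1][0].copy()                  # a sample from the positive ("malicious") class
step = 0.1 * w / np.linalg.norm(w)       # small step against the decision function

# Perturb the sample until the classifier's decision flips.
while clf.decision_function(x.reshape(1, -1))[0] > 0:
    x -= step

print("perturbed sample now classified as:", clf.predict(x.reshape(1, -1))[0])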
Scientometric Analysis and Combined Density-Equalizing Mapping of Environmental Tobacco Smoke (ETS) Research
Background: Passive exposure to environmental tobacco smoke (ETS) is estimated to exert a major burden of disease. Currently, numerous countries have taken legal action to protect the population against ETS, and numerous studies have been conducted in this field. Therefore, scientometric methods should be used to analyze the accumulated data, since no such approach has been available so far. Methods and Results: A combination of scientometric methods and novel visualizing procedures was used, including density-equalizing mapping and radar charting techniques. 6,580 ETS-related studies published between 1900 and 2008 were identified in the ISI database. Using different scientometric approaches, a continuous increase of both quantitative and qualitative parameters was found. The combination with density-equalizing calculations demonstrated a leading position of the United States (2,959 items published) in terms of quantitative research activities. Charting techniques demonstrated that there are numerous bi- and multilateral networks between different countries and institutions in this field. Again, a leading position of American institutions was found. Conclusions: This is the first comprehensive scientometric analysis of data on global scientific activities in the field of ETS research.