Local Rule-Based Explanations of Black Box Decision Systems
Recent years have witnessed the rise of accurate but obscure decision
systems which hide the logic of their internal decision processes from the users.
The lack of explanations for the decisions of black box systems is a key
ethical issue, and a limitation to the adoption of machine learning components
in socially sensitive and safety-critical contexts. Therefore, we need
explanations that reveal the reasons why a predictor takes a certain decision.
In this paper we focus on the problem of black box outcome explanation, i.e.,
explaining the reasons for the decision taken on a specific instance. We propose
LORE, an agnostic method able to provide interpretable and faithful
explanations. LORE first learns a local interpretable predictor on a synthetic
neighborhood generated by a genetic algorithm. Then it derives from the logic
of the local interpretable predictor a meaningful explanation consisting of a
decision rule, which explains the reasons for the decision, and a set of
counterfactual rules, suggesting the changes in the instance's features that
lead to a different outcome. Extensive experiments show that LORE outperforms
existing methods and baselines both in the quality of explanations and in the
accuracy in mimicking the black box.
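The explanation step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not LORE's implementation: the local decision tree is hand-written and hypothetical (LORE would learn it on the genetic neighborhood, which is omitted here), and the names `Node`, `leaf_paths`, `holds`, and `explain` are illustrative. The decision rule is the root-to-leaf path that the instance satisfies; each counterfactual rule is a path ending in a different outcome, reduced to the conditions the instance violates, and only the paths needing the fewest changes are kept.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A split condition: (feature, operator, threshold), operator is "<=" or ">".
Condition = Tuple[str, str, float]
Path = Tuple[Condition, ...]


@dataclass
class Node:
    """Node of a (hypothetical) local decision tree; a leaf has feature=None."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when x[feature] <= threshold
    right: Optional["Node"] = None  # taken when x[feature] > threshold
    label: Optional[str] = None     # predicted outcome at a leaf


def leaf_paths(node: Node, path: Path = ()) -> List[Tuple[Path, str]]:
    """Enumerate every root-to-leaf path as (premise, outcome)."""
    if node.feature is None:
        return [(path, node.label)]
    return (leaf_paths(node.left, path + ((node.feature, "<=", node.threshold),))
            + leaf_paths(node.right, path + ((node.feature, ">", node.threshold),)))


def holds(cond: Condition, x: Dict[str, float]) -> bool:
    """Check whether instance x satisfies a single split condition."""
    feature, op, threshold = cond
    return x[feature] <= threshold if op == "<=" else x[feature] > threshold


def explain(tree: Node, x: Dict[str, float]):
    """Return (decision rule, minimal counterfactual rules) for instance x."""
    paths = leaf_paths(tree)
    # Decision rule: the unique path whose premise x satisfies.
    rule = next((p, y) for p, y in paths if all(holds(c, x) for c in p))
    # Counterfactual candidates: paths with another outcome, reduced to the
    # conditions x violates, i.e. the feature changes that would flip the outcome.
    candidates = [(tuple(c for c in p if not holds(c, x)), y)
                  for p, y in paths if y != rule[1]]
    fewest = min((len(ch) for ch, _ in candidates), default=0)
    return rule, [(ch, y) for ch, y in candidates if len(ch) == fewest]


# Hypothetical local tree for a loan decision and an instance to explain.
tree = Node("income", 40.0,
            left=Node("age", 30.0, left=Node(label="deny"), right=Node(label="grant")),
            right=Node(label="grant"))
rule, counterfactuals = explain(tree, {"income": 25.0, "age": 22.0})
print("rule:", rule)
print("counterfactuals:", counterfactuals)
```

For this instance the rule is `income <= 40 and age <= 30 -> deny`, and both counterfactuals require a single change (`age > 30` or `income > 40`). Ranking candidates by the number of violated conditions is one simple notion of minimality.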
POTs: Protective Optimization Technologies
Algorithmic fairness aims to address the economic, moral, social, and
political impact that digital systems have on populations through solutions
that can be applied by service providers. Fairness frameworks do so, in part,
by mapping these problems to a narrow definition and assuming the service
providers can be trusted to deploy countermeasures. Not surprisingly, these
decisions limit fairness frameworks' ability to capture a variety of harms
caused by systems.
We characterize fairness limitations using concepts from requirements
engineering and from social sciences. We show that the focus on algorithms'
inputs and outputs misses harms that arise from systems interacting with the
world; that the focus on bias and discrimination omits broader harms on
populations and their environments; and that relying on service providers
excludes scenarios where they are uncooperative or intentionally adversarial.
We propose Protective Optimization Technologies (POTs). POTs provide means
for affected parties to address the negative impacts of systems in the
environment, expanding avenues for political contestation. POTs intervene from
outside the system, do not require service providers to cooperate, and can
serve to correct, shift, or expose harms that systems impose on populations and
their environments. We illustrate the potential and limitations of POTs in two
case studies: countering road congestion caused by traffic-beating
applications, and recalibrating credit scoring for loan applicants.
Comment: Appears in Conference on Fairness, Accountability, and Transparency
(FAT* 2020). Bogdan Kulynych and Rebekah Overdorf contributed equally to this
work. Version v1/v2 by Seda Gürses, Rebekah Overdorf, and Ero Balsa was
presented at HotPETS 2018 and at PiMLAI 201
The role of earth observation in an integrated deprived area mapping “system” for low-to-middle income countries
Urbanization in the global South has been accompanied by the proliferation of vast informal and marginalized urban areas that lack access to essential services and infrastructure. UN-Habitat estimates that close to a billion people currently live in these deprived and informal urban settlements, generally grouped under the term urban slums. Two major knowledge gaps undermine efforts to monitor progress towards the corresponding sustainable development goal (i.e., SDG 11, Sustainable Cities and Communities). First, the data available for cities worldwide are patchy and insufficient to differentiate between the diversity of urban areas with respect to their access to essential services and their specific infrastructure needs. Second, existing approaches used to map deprived areas (i.e., aggregated household data, Earth observation (EO), and community-driven data collection) are mostly siloed, and, individually, they often lack transferability and scalability and fail to include the opinions of different interest groups. In particular, EO-based deprived area mapping approaches are mostly top-down, with very little attention given to ground information and interaction with urban communities and stakeholders. Existing top-down methods should be complemented with bottom-up approaches to produce routinely updated, accurate, and timely deprived area maps. In this review, we first assess the strengths and limitations of existing deprived area mapping methods. We then propose an Integrated Deprived Area Mapping System (IDeAMapS) framework that leverages the strengths of EO- and community-based approaches. The proposed framework offers a way forward to map deprived areas globally, routinely, and with maximum accuracy to support SDG 11 monitoring and the needs of different interest groups.
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some genomic data analysis problems require large-scale computational
platforms to meet both their memory and computational requirements. These
applications differ from the scientific simulations that dominate the workload on
today's high-end parallel systems and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high-performance
genomic analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
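To make the hashing and sorting motifs concrete, here is a minimal single-process sketch, assuming k-mer counting as the workload (an illustrative choice, not taken from the text above). The hashing variant simulates distributed hash tables by assigning each k-mer to an owner rank with `zlib.crc32`; the sorting variant brings equal k-mers together by sorting and then counts runs. The function names and the `num_ranks` parameter are hypothetical.

```python
import zlib
from collections import Counter
from itertools import groupby
from typing import Dict, Iterable, List


def kmers(read: str, k: int) -> Iterable[str]:
    """Yield every overlapping substring of length k."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner(kmer: str, num_ranks: int) -> int:
    """Hashing motif: the (simulated) rank that stores this k-mer.
    crc32 is used because Python's built-in str hash is salted per process."""
    return zlib.crc32(kmer.encode()) % num_ranks


def count_by_hashing(reads: List[str], k: int, num_ranks: int) -> Dict[str, int]:
    """Each rank keeps its own hash table; every k-mer is routed to its owner."""
    tables = [Counter() for _ in range(num_ranks)]
    for read in reads:
        for km in kmers(read, k):
            # In a distributed code this would be a remote, often asynchronous, update.
            tables[owner(km, num_ranks)][km] += 1
    merged: Dict[str, int] = {}
    for table in tables:
        merged.update(table)  # owners are disjoint, so no key is overwritten
    return merged


def count_by_sorting(reads: List[str], k: int) -> Dict[str, int]:
    """Sorting motif: sort all k-mers so equal ones are adjacent, then count runs."""
    all_kmers = sorted(km for read in reads for km in kmers(read, k))
    return {km: sum(1 for _ in run) for km, run in groupby(all_kmers)}


reads = ["ACGTAC", "GTACGT"]
print(count_by_hashing(reads, k=3, num_ranks=4))
print(count_by_sorting(reads, k=3))
```

Both variants return the same counts; they differ in communication pattern. The hashing version scatters irregular point updates to owners, while the sorting version would rely on a parallel sort followed by purely local scans.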
Projections for fast protein structure retrieval
BACKGROUND: In recent times, there has been an exponential rise in the number of protein structures in databases such as the PDB. Consequently, the design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of its residues. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. RESULTS: Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali.
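The abstract does not define projections, so the sketch below illustrates only the generic spectral graph matching idea that motivates the algorithm, under stated assumptions: residues are linked by a Gaussian affinity of their pairwise distances (`sigma` is an arbitrary choice), each residue is characterized by its component in the leading eigenvector of that affinity matrix, and residues of two equal-length structures are paired by ranking those components. Because the affinity depends only on distances, the pairing is unaffected by rotating or translating a structure. All names are hypothetical, and this is not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def affinity(coords: Sequence[Point], sigma: float) -> List[List[float]]:
    """Gaussian affinity of every residue pair; it depends only on pairwise
    distances, so it is unchanged by rotating or translating the structure."""
    def sq_dist(p: Point, q: Point) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [[math.exp(-sq_dist(p, q) / (2 * sigma ** 2)) for q in coords] for p in coords]


def leading_eigenvector(m: List[List[float]], iters: int = 500) -> List[float]:
    """Power iteration. The matrix is symmetric with positive entries, so the
    dominant eigenvector has all-positive components (Perron-Frobenius)."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_match(a: Sequence[Point], b: Sequence[Point], sigma: float = 2.0) -> List[int]:
    """Pair residues of two equal-length structures by ranking their components
    in the leading eigenvector; match[i] is the residue of b paired with a[i]."""
    va = leading_eigenvector(affinity(a, sigma))
    vb = leading_eigenvector(affinity(b, sigma))
    order_a = sorted(range(len(a)), key=lambda i: va[i])
    order_b = sorted(range(len(b)), key=lambda j: vb[j])
    match = [0] * len(a)
    for i, j in zip(order_a, order_b):
        match[i] = j
    return match


# Toy "structure" and a copy rotated 90 degrees about z, translated, and reordered.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
perm = [3, 0, 4, 1, 2]
b = [(-a[k][1] + 2.0, a[k][0] - 1.0, a[k][2] + 4.0) for k in perm]
print(spectral_match(a, b))  # residue i of a -> residue match[i] of b
```

Ranking a single eigenvector is fragile (near-equal components, structures of different lengths). It is shown only to illustrate why distance-based spectral features are invariant to rigid motions.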
CLOI-NET: Class segmentation of industrial facilities' point cloud datasets
Shape segmentation from point cloud data is a core step of the digital twinning process for industrial facilities. However, it is also very labor-intensive, which counteracts the perceived value of the resulting model. The state-of-the-art method for automating cylinder detection detects cylinders with 62% precision and 70% recall, while all other shapes must still be segmented manually, so full shape segmentation is not achieved. This performance is promising, but it falls far short of eliminating the manual labor cost. We argue that class segmentation deep learning algorithms have the theoretical potential to achieve higher per-point accuracy and to require less manual segmentation time. However, such algorithms could not be used so far because of the lack of a training dataset of laser-scanned industrial shapes and of appropriate geometric features for learning these shapes. In this paper, we tackle both problems in three steps. First, we parse the industrial point cloud through a novel class segmentation solution (CLOI-NET) that consists of an optimized PointNet++-based deep learning network and post-processing algorithms that enforce stronger contextual relationships per point. We then allow the user to choose the optimal manual annotation of a test facility by means of active learning to further improve the results. We achieve the first step by clustering points into meaningful spatial 3D windows based on their location. Then, we apply a class segmentation deep network that outputs a probability distribution over all label categories per point, and we improve the predicted labels by enforcing post-processing rules. We finally optimize the results by finding the optimal amount of data to use for training.
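The windowing and per-point labeling steps can be sketched as follows, assuming axis-aligned cubic windows and a hand-written stand-in for the network's per-point probability output. The post-processing rule here (low-confidence points adopt the majority label of their window) is hypothetical, since the abstract does not specify CLOI-NET's rules; the class names, `size`, and `min_conf` values are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Window = Tuple[int, int, int]


def to_windows(points: Sequence[Point], size: float) -> Dict[Window, List[int]]:
    """Group point indices into axis-aligned cubic windows of edge length `size`."""
    windows: Dict[Window, List[int]] = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / size), math.floor(y / size), math.floor(z / size))
        windows[key].append(idx)
    return dict(windows)


def label_points(probs: Sequence[Sequence[float]], classes: Sequence[str],
                 windows: Dict[Window, List[int]], min_conf: float = 0.5) -> List[str]:
    """Argmax label per point, then a hypothetical contextual rule: points whose
    top probability is below min_conf take the majority label of the confident
    points in the same window."""
    labels = [classes[max(range(len(p)), key=p.__getitem__)] for p in probs]
    for members in windows.values():
        confident = [labels[i] for i in members if max(probs[i]) >= min_conf]
        if not confident:
            continue  # nothing reliable to borrow from in this window
        majority = Counter(confident).most_common(1)[0][0]
        for i in members:
            if max(probs[i]) < min_conf:
                labels[i] = majority
    return labels


points = [(0.2, 0.1, 0.0), (0.4, 0.3, 0.2), (0.6, 0.2, 0.1), (1.5, 0.2, 0.1)]
classes = ["cylinder", "elbow", "valve"]  # illustrative class names
probs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]  # stand-in network output
windows = to_windows(points, size=1.0)
labels = label_points(probs, classes, windows)
print(windows)
print(labels)
```

Here the third point's uncertain "elbow" prediction is overridden by its window's confident "cylinder" neighbors, while the isolated fourth point keeps its own label.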
We validate our method on CLOI, the largest richly annotated dataset of the industrial shapes most important to model, and achieve 82% average per-point accuracy, 95.6% average AUC across all classes, and an estimated 70% saving in labor hours for class segmentation. This demonstrates that CLOI-NET is the first method to automatically segment industrial point cloud shapes with no prior knowledge at commercially viable performance, and it lays the foundation for efficient industrial shape modeling in cluttered point clouds.
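The two reported metrics can be illustrated on toy data. This sketch assumes that per-point accuracy is the fraction of correctly labeled points and that the class-averaged AUC is computed one-vs-rest and averaged over the classes present; the abstract does not state the exact averaging. The data below are made up and unrelated to the CLOI results.

```python
from typing import Sequence


def argmax(p: Sequence[float]) -> int:
    """Index of the highest-probability class (first one on ties)."""
    return max(range(len(p)), key=lambda c: p[c])


def per_point_accuracy(true: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of points whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def auc_one_vs_rest(true: Sequence[int], probs: Sequence[Sequence[float]], cls: int) -> float:
    """Probability that a random point of class `cls` gets a higher `cls` score
    than a random point of another class; ties count one half."""
    pos = [p[cls] for t, p in zip(true, probs) if t == cls]
    neg = [p[cls] for t, p in zip(true, probs) if t != cls]
    wins = sum(1.0 if s > r else 0.5 if s == r else 0.0 for s in pos for r in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(true: Sequence[int], probs: Sequence[Sequence[float]], num_classes: int) -> float:
    """Average one-vs-rest AUC over classes that have both positives and negatives."""
    present = [c for c in range(num_classes) if 0 < sum(t == c for t in true) < len(true)]
    return sum(auc_one_vs_rest(true, probs, c) for c in present) / len(present)


true = [0, 0, 1, 2]
probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.2, 0.2, 0.6]]
pred = [argmax(p) for p in probs]
print(per_point_accuracy(true, pred))
print(mean_auc(true, probs, num_classes=3))
```

AUC rewards a good ranking of scores even when the argmax label is wrong, which is why a class-averaged AUC can sit well above per-point accuracy, as in the figures reported above.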