Simple system to measure the Earth's magnetic field
Our aim in this proposal is to use Faraday's law of induction in a
simple lecture demonstration that measures the Earth's magnetic field (B). This
will also enable students to learn how electric power is generated
from rotational motion. The idea is not original, yet it may be
attractive in the sense that no sophisticated devices are used.
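As a rough illustration of the measurement principle, the sketch below assumes the demonstration uses a flat coil rotating at a constant rate about an axis perpendicular to the field; the abstract does not specify the apparatus. Under that assumption Faraday's law gives a peak EMF of N·A·B·ω, which can be inverted for B. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def peak_emf(field_t, turns, coil_area_m2, rotation_hz):
    """Peak EMF (volts) of a coil rotating in a uniform field.

    Assumes flux = N * B * A * cos(omega * t), so EMF_peak = N * B * A * omega.
    """
    omega = 2.0 * math.pi * rotation_hz  # angular speed in rad/s
    return turns * coil_area_m2 * field_t * omega


def field_from_peak_emf(peak_emf_v, turns, coil_area_m2, rotation_hz):
    """Invert EMF_peak = N * B * A * omega to estimate B (tesla)."""
    omega = 2.0 * math.pi * rotation_hz
    return peak_emf_v / (turns * coil_area_m2 * omega)


# Hypothetical classroom values: 500 turns, 0.05 m^2 coil, 5 rotations per second.
emf = peak_emf(50e-6, turns=500, coil_area_m2=0.05, rotation_hz=5.0)
estimate = field_from_peak_emf(emf, turns=500, coil_area_m2=0.05, rotation_hz=5.0)
```

Only the field component perpendicular to the rotation axis contributes, so the estimate depends on how the coil axis is oriented relative to the local field.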
Efficiently Clustering Very Large Attributed Graphs
Attributed graphs model real networks by enriching their nodes with
attributes accounting for properties. Several techniques have been proposed for
partitioning these graphs into clusters that are homogeneous with respect to
both semantic attributes and the structure of the graph. However, the time and
space complexities of state-of-the-art algorithms limit their scalability to
medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a
fast and scalable algorithm for partitioning large attributed graphs. The
approach is robust, being compatible both with categorical and with
quantitative attributes, and it is tailorable, allowing the user to weight the
semantic and topological components. Further, the approach does not require the
user to guess in advance the number of clusters. SToC relies on well-known
approximation techniques such as bottom-k sketches, traditional graph-theoretic
concepts, and a new perspective on the composition of heterogeneous distance
measures. Experimental results demonstrate its ability to efficiently compute
high-quality partitions of large-scale attributed graphs.
Comment: This work has been published in ASONAM 2017. This version includes an
appendix with validation of our attribute model and distance function,
omitted in the conference version for lack of space. Please refer to the
published version
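The abstract names bottom-k sketches as one of the approximation techniques SToC relies on. The following is a minimal, generic sketch of that technique for estimating Jaccard similarity between two sets (for example, node neighbourhoods); it is not SToC's implementation, and the function names and the choice of SHA-1 as the hash are assumptions made for illustration.

```python
import hashlib


def _hash(item):
    """Deterministic 64-bit hash of an item's string form (illustrative choice)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_k_sketch(items, k):
    """Keep the k smallest distinct hash values of the set."""
    return sorted({_hash(x) for x in items})[:k]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate |A & B| / |A | B| from two bottom-k sketches.

    The k smallest hashes of the merged sketches form a bottom-k sketch of
    the union; the fraction of them present in both sketches is the estimate.
    """
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:k]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)


# Hypothetical usage: two overlapping neighbourhoods, sketched with k = 64.
k = 64
sim = estimate_jaccard(bottom_k_sketch(range(0, 1500), k),
                       bottom_k_sketch(range(500, 2000), k), k)
```

Each sketch has a fixed size k regardless of the set size, which is what makes the technique attractive for large graphs; when both sets have fewer than k elements the estimate is exact.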
On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection
Humans are the final decision makers in critical tasks that involve ethical
and legal concerns, ranging from recidivism prediction, to medical diagnosis,
to fighting against fake news. Although machine learning models can sometimes
achieve impressive performance in these tasks, these tasks are not amenable to
full automation. To realize the potential of machine learning for improving
human decisions, it is important to understand how assistance from machine
learning models affects human performance and human agency.
In this paper, we use deception detection as a testbed and investigate how we
can harness explanations and predictions of machine learning models to improve
human performance while retaining human agency. We propose a spectrum between
full human agency and full automation, and develop varying levels of machine
assistance along the spectrum that gradually increase the influence of machine
predictions. We find that without showing predicted labels, explanations alone
slightly improve human performance in the end task. In comparison, human
performance is greatly improved by showing predicted labels (>20% relative
improvement) and can be further improved by explicitly suggesting strong
machine performance. Interestingly, when predicted labels are shown,
explanations of machine predictions induce a similar level of accuracy as an
explicit statement of strong machine performance. Our results demonstrate a
tradeoff between human performance and human agency and show that explanations
of machine predictions can moderate this tradeoff.
Comment: 17 pages, 19 figures, in Proceedings of ACM FAT* 2019, dataset & demo
available at https://deception.machineintheloop.co
Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has merely focused
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experimental results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Furthermore, the
proposed algorithms are not limited to the area of outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques.
Comment: 14 pages, 5 figures, journal paper
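The edge-ego-network defined in the abstract above (the two end nodes of an edge, their neighbouring nodes, and the edges linking these nodes) can be sketched as follows. The adjacency-dictionary representation and the function name are assumptions for illustration; the scoring against a random graph generation model is not reproduced here.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph: node -> set of neighbours


def edge_ego_network(adj: Graph, u: Hashable, v: Hashable
                     ) -> Tuple[Set[Hashable], Set[FrozenSet[Hashable]]]:
    """Return the nodes and edges of the subgraph induced by u, v and their neighbours."""
    if v not in adj.get(u, set()):
        raise ValueError("(u, v) is not an edge of the graph")
    nodes = {u, v} | adj[u] | adj[v]
    # Induced subgraph: keep every edge whose two endpoints are both selected.
    edges = {frozenset((a, b)) for a in nodes for b in adj[a] if b in nodes}
    return nodes, edges


# Hypothetical toy graph: a triangle 1-2-3 with a tail 3-4-5.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
ego_nodes, ego_edges = edge_ego_network(toy, 1, 2)
```

Because the subgraph only reaches one hop from the edge's endpoints, it can be extracted per edge without traversing the whole graph.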
Seismic risk in the city of Al Hoceima (north of Morocco) using the vulnerability index method, applied in Risk-UE project
The final publication is available at Springer via http://dx.doi.org/10.1007/s11069-016-2566-8
Al Hoceima is one of the most seismically active regions in the north of Morocco, as demonstrated by the large seismic episodes reported in seismic catalogs and research studies. Moreover, seismic risk is relatively high due to vulnerable buildings that are either old or do not respect seismic standards. Our aim is to present a study of seismic risk and seismic scenarios for the city of Al Hoceima. The seismic vulnerability of the existing residential buildings was evaluated using the vulnerability index method (Risk-UE), which was adapted and applied to Moroccan constructions because of its practicality and simple methodology. A visual inspection of 1102 buildings was carried out to assess the vulnerability factors. Seismic hazard was evaluated in terms of macroseismic intensity for two scenarios (one deterministic and one probabilistic). The maps of seismic risk are represented by direct damage to buildings, damage to population and economic cost. According to the results, the main vulnerability index of the city is equal to 0.49 and the seismic risk is estimated as Slight (main damage grade equal to 0.9 for the deterministic scenario and 0.7 for the probabilistic scenario). However, Moderate to heavy damage is expected in areas located in the newer extensions, in both the east and west of the city. Important economic losses and damage to the population are expected in these areas as well. The resulting maps can serve as a guide to decision making in the field of seismic risk prevention and mitigation strategies in Al Hoceima.
Peer Reviewed
Postprint (author's final draft)
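To make the link between vulnerability index, macroseismic intensity and damage grade concrete, the sketch below uses the relation commonly cited for the Risk-UE level-1 vulnerability index method, mean damage grade = 2.5 · [1 + tanh((I + 6.25·V − 13.1) / 2.3)]. The abstract does not state which relation or coefficients the authors applied, so this formula, the function name and the default parameter are assumptions.

```python
import math


def mean_damage_grade(intensity, vulnerability_index, ductility=2.3):
    """Mean damage grade on a 0-5 scale from macroseismic intensity and vulnerability index.

    Assumed relation (commonly cited for the Risk-UE level-1 method):
        mu_D = 2.5 * (1 + tanh((I + 6.25 * V - 13.1) / Q)), with Q = 2.3
    """
    x = (intensity + 6.25 * vulnerability_index - 13.1) / ductility
    return 2.5 * (1.0 + math.tanh(x))


# Hypothetical scan over intensities for the vulnerability index reported above (0.49).
grades = {i: mean_damage_grade(i, 0.49) for i in (6, 7, 8, 9, 10)}
```

The relation is monotonic in both arguments, so for a fixed intensity scenario the more vulnerable districts of a city map to higher expected damage grades.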
On defining rules for cancer data fabrication
Funding: This research is partially funded by the Data Lab, and the EU H2020 project Serums: Securing Medical Data in Smart Patient-Centric Healthcare Systems (grant 826278).
Data is essential for machine learning projects, and data accuracy is crucial for being able to trust the results obtained from the associated machine learning models. Previously, we have developed machine learning models for predicting the treatment outcome for breast cancer patients that have undergone chemotherapy, and developed a monitoring system for their treatment timeline showing interactively the options and associated predictions. Available cancer datasets, such as the one used earlier, are often too small to obtain significant results, and make it difficult to explore ways to improve the predictive capability of the models further. In this paper, we explore an alternative to enhance our datasets through synthetic data generation. From our original dataset, we extract rules to generate fabricated data that capture the different characteristics inherent in the dataset. Additional rules can be used to capture general medical knowledge. We show how to formulate rules for our cancer treatment data, and use the IBM solver to obtain a corresponding synthetic dataset. We discuss challenges for future work.
Postprint
An exposure-effect approach for evaluating ecosystem-wide risks from human activities
Ecosystem-based management (EBM) is promoted as the solution for sustainable use. An ecosystem-wide assessment methodology is therefore required. In this paper, we present an approach to assess the risk to ecosystem components from human activities common to marine and coastal ecosystems. We build on: (i) a linkage framework that describes how human activities can impact the ecosystem through pressures, and (ii) a qualitative expert judgement assessment of impact chains describing the exposure and sensitivity of ecological components to those activities. Using case study examples applied at European regional sea scale, we evaluate the risk of an adverse ecological impact from current human activities to a suite of ecological components and, once impacted, the time required for recovery to pre-impact conditions should those activities subside. Grouping impact chains by sectors, pressure type, or ecological components enabled impact risks and recovery times to be identified, supporting resource managers in their efforts to prioritize threats for management, identify most at-risk components, and generate time frames for ecosystem recovery
A combined ULBP2 and SEMA5A expression signature as a prognostic and predictive biomarker for colon cancer
Background: Prognostic biomarkers for cancer have the power to change the course of disease if they add value beyond known prognostic factors, if they can help shape treatment protocols, and if they are reliable. The aim of this study was to identify such biomarkers for colon cancer and to understand the molecular mechanisms leading to prognostic stratifications based on these biomarkers. Methods and Findings: We used an in-house R-based script (SSAT) for the in silico discovery of stage-independent prognostic biomarkers using two cohorts, GSE17536 and GSE17537, that include 177 and 55 colon cancer patients, respectively. This identified 2 genes, ULBP2 and SEMA5A, which, when used jointly, could distinguish patients with distinct prognoses. We validated our findings using a third cohort of 48 patients ex vivo. We find that in all cohorts, a combined ULBP2/SEMA5A classification (SU-GIB) can stratify distinct prognostic sub-groups with hazard ratios that range from 2.4 to 4.5 (p=0.01) when overall or cancer-specific survival is used as an end-measure, independent of confounding prognostic parameters. In addition, our preliminary analyses suggest SU-GIB is comparable to Oncotype DX colon® in predicting recurrence in two different cohorts (HR: 1.5-2; p=0.02). SU-GIB has potential as a companion diagnostic for several drugs including the PI3K/mTOR inhibitor BEZ235, which are suitable for the treatment of patients within the bad prognosis group. We show that tumors from patients with worse prognosis have low EGFR autophosphorylation rates, but high caspase 7 activity, and show upregulation of pro-inflammatory cytokines that relate to a relatively mesenchymal phenotype. Conclusions: We describe two novel genes that can be used to prognosticate colon cancer and suggest approaches by which such tumors can be treated. We also describe molecular characteristics of tumors stratified by the SU-GIB signature. © Ivyspring International Publisher