Eliminating Latent Discrimination: Train Then Mask
How can we control for latent discrimination in predictive models? How can we
provably remove it? Such questions are at the heart of algorithmic fairness and
its impacts on society. In this paper, we define a new operational fairness
criterion, inspired by the well-understood notion of omitted-variable bias in
statistics and econometrics. Our notion of fairness effectively controls for
sensitive features and provides diagnostics for deviations from fair decision
making. We then establish analytical and algorithmic results about the
existence of a fair classifier in the context of supervised learning. Our
results readily imply a simple, but rather counter-intuitive, strategy for
eliminating latent discrimination. In order to prevent other features proxying
for sensitive features, we need to include sensitive features in the training
phase, but exclude them in the test/evaluation phase while controlling for
their effects. We evaluate the performance of our algorithm on several
real-world datasets and show how fairness for these datasets can be improved
with a very small loss in accuracy.
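The train-then-mask strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic-regression model, the synthetic data, the helper names `fit_logistic` and `predict_masked`, and the choice to "mask" by setting the sensitive column to one constant for every individual are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_masked(X, w, b, sensitive_col, fill_value=0.0):
    """Score with the sensitive column set to the same constant for everyone."""
    X_masked = X.copy()
    X_masked[:, sensitive_col] = fill_value
    return 1.0 / (1.0 + np.exp(-(X_masked @ w + b)))

# Synthetic data (hypothetical): a legitimate feature, a proxy that is
# correlated with the sensitive attribute, and the sensitive attribute itself.
rng = np.random.default_rng(0)
n = 2000
s = rng.integers(0, 2, size=n).astype(float)
proxy = s + rng.normal(0.0, 0.5, size=n)
skill = rng.normal(0.0, 1.0, size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(1.5 * skill + 1.0 * s)))).astype(float)

# Train WITH the sensitive feature so the proxy does not absorb its effect,
# then mask the sensitive feature at evaluation time.
X = np.column_stack([skill, proxy, s])
w, b = fit_logistic(X, y)
scores = predict_masked(X, w, b, sensitive_col=2)
```

Because the sensitive column is present during training, its effect is attributed to its own coefficient rather than to the proxy; at evaluation time, two individuals who differ only in the sensitive attribute receive the same score.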
Network Alignment: Theory, Algorithms, and Applications
Networks are central in the modeling and analysis of many large-scale human and technical systems, and they have applications in diverse fields such as computer science, biology, social sciences, and economics. Recently, network mining has been an active area of research. In this thesis, we study several related network-mining problems, from three different perspectives: the modeling and theory perspective, the computational perspective, and the application perspective. In the bulk of this thesis, we focus on network alignment, where the data provides two (or more) partial views of the network, and where the node labels are sometimes ambiguous. Network alignment has applications in social-network reconciliation and de-anonymization, protein-network alignment in biology, and computer vision. In the first part of this thesis, we investigate the feasibility of network alignment with a random-graph model. This random-graph model generates two (or several) correlated networks, and allows the two networks to overlap only partially. For a particular alignment, we define a cost function for structural mismatch. We show that the minimization of the proposed cost function (assuming that we have access to infinite computational power), with high probability, results in an alignment that recovers the set of shared nodes between the two networks, and that also recovers the true matching between the shared nodes. The most scalable network-alignment approaches use ideas from percolation theory, where a matched node-couple infects its neighboring couples that are additional potential matches. In the second part of this thesis, we propose a new percolation-based network-alignment algorithm that can match large networks by using only the network structure and a handful of initially pre-matched node-couples called a seed set. We characterize a phase transition in matching performance as a function of the seed-set size.
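The basic percolation idea, in which a matched couple "infects" neighboring couples, can be sketched as follows. This is a generic illustration of seed-based percolation matching, not the algorithm proposed in the thesis; the function name `percolation_match`, the threshold parameter `r`, and the toy graphs are assumptions.

```python
from collections import defaultdict, deque

def percolation_match(g1, g2, seeds, r=2):
    """Grow a node matching from seed couples.

    g1, g2: adjacency lists as dicts {node: [neighbors]}.
    seeds:  list of (node_in_g1, node_in_g2) couples assumed to be correct.
    r:      number of marks a candidate couple needs before it is matched.
    """
    matched_1 = dict(seeds)                          # g1 node -> g2 node
    matched_2 = {v: u for u, v in matched_1.items()}  # g2 node -> g1 node
    marks = defaultdict(int)
    queue = deque(matched_1.items())
    while queue:
        u, v = queue.popleft()
        # The matched couple (u, v) adds one mark to every neighboring couple.
        for nu in g1[u]:
            if nu in matched_1:
                continue
            for nv in g2[v]:
                if nv in matched_2:
                    continue
                marks[(nu, nv)] += 1
                if marks[(nu, nv)] >= r:
                    matched_1[nu] = nv
                    matched_2[nv] = nu
                    queue.append((nu, nv))
                    break  # nu is now matched; stop scanning candidates for it
    return matched_1

# Two copies of the same small graph with different labels (toy example).
g1 = {0: [2, 3], 1: [2, 4], 2: [0, 1, 3], 3: [0, 2, 4], 4: [1, 3]}
g2 = {"a": ["c", "d"], "b": ["c", "e"], "c": ["a", "b", "d"],
      "d": ["a", "c", "e"], "e": ["b", "d"]}
result = percolation_match(g1, g2, seeds=[(0, "a"), (1, "b")], r=2)
# result == {0: "a", 1: "b", 2: "c", 3: "d", 4: "e"}
```

With too few seeds no candidate couple ever reaches `r` marks and the process stalls, which gives an informal picture of why matching performance depends sharply on the seed-set size.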
In the third part of this thesis, we consider two important application areas of network mining in biology and public health. The first application area is percolation-based network alignment of protein-protein interaction (PPI) networks in biology. The alignment of biological networks has many uses, such as the detection of conserved biological network motifs, the prediction of protein interactions, and the reconstruction of phylogenetic trees. Network alignment can be used to transfer biological knowledge between species. We introduce a new global pairwise-network alignment algorithm for PPI networks, called PROPER. The PROPER algorithm shows higher accuracy and speed compared to other global network-alignment methods. We also extend PROPER to the global multiple-network alignment problem. We introduce a new algorithm, called MPROPER, for matching multiple networks. Finally, we explore IsoRank, one of the first and most referenced global pairwise-network alignment algorithms. Our second application area is the control of epidemic processes. We develop and model strategies for mitigating an epidemic in a large-scale dynamic contact network. More precisely, we study epidemics of infectious diseases by (i) modeling the spread of epidemics on a network by using many pieces of information about the mobility and behavior of a population; and by (ii) designing personalized behavioral recommendations for individuals, in order to mitigate the effect of epidemics on that network.
Numerical Modelling of Turbulent Free Surface Flows over Rough and Porous Beds Using the Smoothed Particle Hydrodynamics Method
Understanding turbulent flow structure in open channel flows is an important issue for Civil Engineers who study the transport of water, sediments and contaminants in rivers. In the present study, turbulent flows over rough impermeable and porous beds are studied numerically using the Smoothed Particle Hydrodynamics (SPH) method.
A comprehensive review is carried out on the methods of turbulence modelling and treatment of bed boundary in open channel flows in order to identify the limitations of the existing particle models developed in this area. 2D macroscopic SPH models are developed for simulating turbulent free surface flows over rough impermeable and porous beds under various flow conditions. For the case of impermeable beds, a drag force model is proposed to take the effect of bed roughness into account, while for the case of porous beds, macroscopic governing equations are developed based on the SPH formulation, incorporating the effects of drag and porosity.
To simulate the effect of turbulence on the average flow field, a Macroscopic SPH-mixing-length (MSPH-ML) model is proposed based on the Large Eddy Simulation (LES) concept where the mixing-length approach is applied to estimate the eddy-viscosity rather than employing the standard Smagorinsky model. The difficulty in reproducing steady uniform free surface flow is tackled by introducing novel inflow/outflow techniques for the situations in which the flow quantities are unknown at the inflow and outflow boundaries.
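For orientation, the two eddy-viscosity closures contrasted above can be written in their standard textbook forms. The exact expressions and the mixing-length distribution used in the MSPH-ML model are not given here, so the symbols below (mixing length l_m, Smagorinsky constant C_s, filter width Δ, resolved strain rate S̄_ij) are assumptions for illustration only.

```latex
% Mixing-length closure: eddy viscosity from a prescribed length scale l_m
\nu_t = l_m^{2}\,\lvert \bar{S} \rvert

% Standard Smagorinsky closure: length scale tied to the filter width \Delta
\nu_t = (C_s \Delta)^{2}\,\lvert \bar{S} \rvert

% Both use the magnitude of the resolved strain-rate tensor
\lvert \bar{S} \rvert = \sqrt{2\,\bar{S}_{ij}\bar{S}_{ij}}, \qquad
\bar{S}_{ij} = \frac{1}{2}\left(\frac{\partial \bar{u}_i}{\partial x_j}
             + \frac{\partial \bar{u}_j}{\partial x_i}\right)
```

The two forms differ only in the length scale: the mixing-length approach prescribes l_m from the flow geometry, whereas the Smagorinsky model ties it to the computational resolution through C_s Δ.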
The performance of these models is tested by simulating different engineering problems with an insight developed into turbulence modelling and bed/interface boundary treatment. The accuracy of the models is tested by comparing the predicted quantities such as flow velocity, water surface elevation, and turbulent shear stress with existing experimental data.
The limitations of the models are mainly attributed to the macroscopic representation of the roughness layer and porous bed, difficulty in the determination of the values of the empirical coefficients in the closure terms, and limitations with the use of fine computational resolution. On the other hand, the main strength of the model is that it describes the complicated processes occurring at the bed using simple and practical computational treatments so that the momentum transfer is estimated accurately. It is shown that if the closure terms in the momentum equation which represent the effect of bed drag and flow turbulence are determined carefully based on the physical conditions of bed and flow, the model is capable of being employed for different civil engineering applications.