
    Tradeoffs of Diagonal Fisher Information Matrix Estimators

    The Fisher information matrix characterizes the local geometry in the parameter space of neural networks. It elucidates insightful theories and useful tools to understand and optimize neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two such estimators, whose accuracy and sample complexity depend on their associated variances. We derive bounds on the variances and instantiate them in regression and classification networks. We navigate trade-offs of both estimators based on analytical and numerical studies. We find that the variance quantities depend on the non-linearity with respect to different parameter groups and should not be neglected when estimating the Fisher information matrix.
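    As an illustration only (this is a generic Monte Carlo baseline, not the two estimators analysed in the paper), the sketch below estimates the diagonal Fisher information of a toy logistic-regression model by sampling labels from the model itself and averaging the squared entries of the score function.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_log_lik(w, x, y):
    # Gradient of log p(y | x, w) for a Bernoulli/logistic model: (y - p) * x.
    p = sigmoid(x @ w)
    return (y - p) * x

def diag_fisher_mc(w, X, n_samples=1000):
    # Diagonal Fisher ~ E_x E_{y ~ p(y|x,w)} [ (d log p / d w)^2 ],
    # estimated by Monte Carlo over data points and model-sampled labels.
    est = np.zeros_like(w)
    for _ in range(n_samples):
        x = X[rng.integers(len(X))]
        y = rng.binomial(1, sigmoid(x @ w))   # label drawn from the model
        g = grad_log_lik(w, x, y)
        est += g * g
    return est / n_samples

X = rng.normal(size=(500, 5))
w = rng.normal(size=5)
print(diag_fisher_mc(w, X))   # one noisy estimate of the 5 diagonal entries
```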

    Data Preprocessing to Mitigate Bias with Boosted Fair Mollifiers

    In a recent paper, Celis et al. (2020) introduced a new approach to fairness that corrects the data distribution itself. The approach is computationally appealing, but its approximation guarantees with respect to the target distribution can be quite loose, as they rely on a (typically limited) number of constraints on data-based aggregated statistics; the resulting fairness guarantee can also be data dependent. Our paper makes use of a mathematical object recently introduced in privacy -- mollifiers of distributions -- and a popular approach to machine learning -- boosting -- to obtain an approach in the same lineage as Celis et al. but without the same impediments, in particular with better guarantees in terms of accuracy and finer guarantees in terms of fairness. The approach involves learning the sufficient statistics of an exponential family. When the training data is tabular, the sufficient statistics can be defined by decision trees, whose interpretability can provide clues on the source of (un)fairness. Experiments display the quality of the results on simulated and real-world data.
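    For context, the generic exponential-family form below (an assumption about the general setup, not necessarily the paper's exact construction) shows how sufficient statistics T_j(x), in the tabular case the outputs of decision trees, reweight a base data distribution q:

```latex
% Generic exponential family with base measure q and tree-valued sufficient
% statistics T_j; A(\theta) is the log-normalizer.
p_\theta(x) \;=\; q(x)\,
  \exp\!\Big(\textstyle\sum_{j=1}^{J} \theta_j\, T_j(x) - A(\theta)\Big),
\qquad
A(\theta) \;=\; \log \int q(x)\,
  \exp\!\Big(\textstyle\sum_{j=1}^{J} \theta_j\, T_j(x)\Big)\, dx .
```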

    UNIPoint: Universally Approximating Point Processes Intensities

    Point processes are a useful mathematical tool for describing events over time, and so there are many recent approaches for representing and learning them. One notable open question is how to precisely describe the flexibility of point process models and whether there exists a general model that can represent all point processes. Our work bridges this gap. Focusing on the widely used event intensity function representation of point processes, we provide a proof that a class of learnable functions can universally approximate any valid intensity function. The proof connects the well-known Stone-Weierstrass Theorem for function approximation, the uniform density of non-negative continuous functions using a transfer function, the formulation of the parameters of a piece-wise continuous function as a dynamic system, and a recurrent neural network implementation for capturing the dynamics. Using these insights, we design and implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis functions upon each event. Evaluations on synthetic and real-world datasets show that this simpler representation performs better than Hawkes process variants and more complex neural network-based approaches. We expect this result will provide a practical basis for selecting and tuning models, as well as furthering theoretical work on representational complexity and learnability.
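    As a hedged, much-simplified sketch of the idea (a toy recurrent update and linear basis functions, not the paper's exact architecture), the code below maps a hidden state, updated at each event, to basis-function parameters and passes their sum through a softplus transfer to obtain a positive intensity:

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus, used as the positive transfer function.
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

class ToyUniPoint:
    def __init__(self, K=4, H=8, seed=0):
        rng = np.random.default_rng(seed)
        self.Wh = rng.normal(scale=0.3, size=(H, H))      # recurrent weights
        self.wx = rng.normal(scale=0.3, size=H)           # input weights
        self.Wo = rng.normal(scale=0.3, size=(2 * K, H))  # hidden -> basis params
        self.h = np.zeros(H)
        self.K = K

    def update(self, dt):
        # Recurrent update driven by the inter-event time.
        self.h = np.tanh(self.Wh @ self.h + self.wx * dt)

    def intensity(self, tau):
        # tau: time elapsed since the last event.
        params = self.Wo @ self.h
        a, b = params[:self.K], params[self.K:]
        return softplus(np.sum(a * tau + b))   # positive transfer of summed bases

model = ToyUniPoint()
last = 0.0
for t in [0.5, 1.3, 2.0]:          # toy event times
    model.update(t - last)
    last = t
print(model.intensity(0.4))        # intensity 0.4 time units after the last event
```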

    Interval-censored Hawkes processes

    Interval-censored data solely records the aggregated counts of events during specific time intervals - such as the number of patients admitted to the hospital or the volume of vehicles passing traffic loop detectors - and not the exact occurrence time of the events. It is currently not understood how to fit Hawkes point processes to this kind of data. The typical loss function (the point process log-likelihood) cannot be computed without exact event times, and the Hawkes process lacks the independent-increments property required to use the Poisson likelihood. This work builds a novel point process, a set of tools, and approximations for fitting Hawkes processes within interval-censored data scenarios. First, we define the Mean Behavior Poisson process (MBPP), a novel Poisson process with a direct parameter correspondence to the popular self-exciting Hawkes process. We fit MBPP in the interval-censored setting using an interval-censored Poisson log-likelihood (IC-LL). We use the parameter equivalence to uncover the parameters of the associated Hawkes process. Second, we introduce two novel exogenous functions to distinguish the exogenous from the endogenous events. We propose the multi-impulse exogenous function - for when the exogenous events are observed as event times - and the latent homogeneous Poisson process exogenous function - for when the exogenous events are presented as interval-censored volumes. Third, we provide several approximation methods to estimate the intensity and compensator function of MBPP when no analytical solution exists. Fourth and finally, we connect the interval-censored loss of MBPP to a broader class of Bregman divergence-based functions. Using this connection, we show that the popularity estimation algorithm Hawkes Intensity Process (HIP) is a particular case of the MBPP. We verify our models through empirical testing on synthetic and real-world data.
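    To make the interval-censored Poisson log-likelihood (IC-LL) concrete, here is a minimal generic sketch (not the paper's MBPP implementation): each interval contributes a Poisson term whose mean is the integrated intensity over that interval.

```python
import numpy as np
from math import lgamma
from scipy.integrate import quad

def ic_poisson_loglik(intensity, edges, counts):
    # counts[i] events observed in [edges[i], edges[i+1]);
    # lam = integral of the intensity over that interval.
    ll = 0.0
    for i, c in enumerate(counts):
        lam, _ = quad(intensity, edges[i], edges[i + 1])
        ll += c * np.log(lam) - lam - lgamma(c + 1)
    return ll

# Toy example: a decaying intensity observed in five unit-length intervals.
intensity = lambda t: 5.0 * np.exp(-0.3 * t) + 1.0
edges = np.arange(0.0, 6.0)
counts = np.array([6, 4, 3, 2, 2])
print(ic_poisson_loglik(intensity, edges, counts))
```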

    Fair Wrapping for Black-box Predictions

    We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias. Our technique builds on the recent analysis of improper loss functions whose optimization can correct any twist in prediction, unfairness being treated as a twist. In the post-processing, we learn a wrapper function, which we define as an α-tree, that modifies the prediction. We provide two generic boosting algorithms to learn α-trees. We show that our modification has appealing properties in terms of composition of α-trees, generalization, interpretability, and KL divergence between modified and original predictions. We exemplify the use of our technique in three fairness notions: conditional value-at-risk, equality of opportunity, and statistical parity; and provide experiments on several readily available datasets.
    Comment: Published in Advances in Neural Information Processing Systems 35 (NeurIPS 2022).
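    As a hypothetical, much simpler stand-in for the α-tree wrapper (the paper learns the partition and corrections by boosting), the sketch below applies a per-leaf additive correction to the black-box score in log-odds space, with the leaf index standing in for a tree's partition of the input space:

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def wrap_predictions(scores, leaves, corrections):
    # scores: black-box probabilities in (0, 1)
    # leaves: partition/leaf index assigned to each example
    # corrections: per-leaf additive shift in log-odds space
    z = logit(np.clip(scores, 1e-6, 1 - 1e-6)) + corrections[leaves]
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([0.2, 0.7, 0.9, 0.4])    # black-box outputs
leaves = np.array([0, 1, 1, 0])            # hypothetical leaf assignments
corrections = np.array([+0.5, -0.3])       # shifts chosen to narrow a group gap
print(wrap_predictions(scores, leaves, corrections))
```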

    3D NLTE Lithium abundances for late-type stars in GALAH DR3

    Lithium's susceptibility to burning in stellar interiors makes it an invaluable tracer for delineating the evolutionary pathways of stars, offering insights into the processes governing their development. Observationally, the complex Li production and depletion mechanisms in stars manifest themselves as Li plateaus, and as Li-enhanced and Li-depleted regions of the HR diagram. The Li-dip represents a narrow range in effective temperature close to the main-sequence turn-off, where stars have slightly super-solar masses and strongly depleted Li. To study the modification of Li through stellar evolution, we measure 3D non-local thermodynamic equilibrium (NLTE) Li abundances for 581 149 stars released in GALAH DR3. We describe a novel method that fits the observed spectra using a combination of 3D NLTE Li line profiles with blending metal line strengths that are optimized on a star-by-star basis. Furthermore, realistic errors are determined by a Monte Carlo nested sampling algorithm that samples the posterior distribution of the fitted spectral parameters. The method is validated by recovering parameters from a synthetic spectrum and by comparing to 26 stars in the Hypatia catalogue. We find 228 613 Li detections and 352 536 Li upper limits. Our abundance measurements are generally lower than GALAH DR3, with a mean difference of 0.23 dex. For the first time, we trace the evolution of Li-dip stars beyond the main-sequence turn-off and up the subgiant branch. This is the first 3D NLTE analysis of Li applied to a large spectroscopic survey, and it opens up a new era of precision analysis of abundances for large surveys.
    Comment: 20 pages, 17 figures, accepted for publication in MNRAS.
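    As a toy illustration of the per-star fitting idea only (the paper fits precomputed 3D NLTE profiles with nested sampling rather than least squares, and the line positions and widths here are stand-ins), the sketch below fits the strengths of a Li line and a single blending metal line to a noisy synthetic spectrum:

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def model(wave, li_strength, blend_strength):
    # Continuum-normalised flux: 1 minus a Li profile and a blending line,
    # each scaled by a free strength (stand-ins for 3D NLTE line profiles).
    li_profile = gauss(wave, 670.78, 0.02)
    blend = gauss(wave, 670.74, 0.02)
    return 1.0 - li_strength * li_profile - blend_strength * blend

wave = np.linspace(670.6, 670.95, 120)            # wavelength grid in nm
rng = np.random.default_rng(1)
obs = model(wave, 0.15, 0.05) + rng.normal(scale=0.005, size=wave.size)

popt, pcov = curve_fit(model, wave, obs, p0=[0.1, 0.01])
print("fitted strengths:", popt, "1-sigma errors:", np.sqrt(np.diag(pcov)))
```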