555 research outputs found

    Classification of protein interaction sentences via gaussian processes

    The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text, Gaussian processes (GPs). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework, make GPs an appealing alternative worthy of further adoption.

    Subgrid-scale stress modeling using an artificial neural network and its application to turbulent channel and backward-facing-step flows

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Mechanical and Aerospace Engineering (Multiscale Mechanical Design Major), 2021.8. Haecheon Choi.

    A fully connected neural network (NN) is used to develop a subgrid-scale (SGS) model which maps the relation between the SGS stress and filtered flow variables in turbulent channel (Part I) and backward-facing-step (Part II) flows. For turbulent channel flow, a DNS (direct numerical simulation) database at Re_τ = 178 is used to develop an NN-based SGS model, and a priori and a posteriori tests are performed to investigate its prediction performance. In the a priori test, an NN-based SGS model with the input of the filtered velocity gradient or strain rate tensor at multiple grid points provides high correlation coefficients between the true and predicted SGS stresses. However, this model provides an unstable solution in the a posteriori test, as it produces a non-negligible backscatter, which is known to induce numerical instability in large eddy simulation (LES). To ensure a stable LES solution with this model, a special treatment such as backscatter clipping is required. On the other hand, an NN-based SGS model with the input of the filtered strain rate tensor at a single grid point shows excellent prediction performance for the mean velocity and Reynolds shear stress in the a posteriori test, although it gives low correlation coefficients between the true and predicted SGS stresses in the a priori test. This NN-based SGS model trained at Re_τ = 178 is applied to a turbulent channel flow at Re_τ = 723 using the same grid resolution in wall units, providing fairly good agreement of the solutions with the filtered DNS data. When the grid resolution in wall units is different from that of the training data, this NN-based SGS model does not perform well. This is overcome by training an NN with datasets having two filters whose sizes are larger and smaller than the grid size in the large eddy simulation.

    For turbulent flow over a backward-facing step (BFS), an NN-based SGS model is developed with the filtered DNS data at Re_h = 5100. Two input variables are adopted, the filtered strain rate tensor and the filtered velocity gradient tensor at a single grid point, since the NN-based SGS models with these inputs provide a stable LES solution in the turbulent channel flow without any special treatment. In the LES at Re_h = 5100, those NN-based SGS models show similar performance and provide good predictions for the reattachment length and root-mean-square velocity fluctuations. Then, we assess the performance of the NN-based SGS model with the input of the filtered strain rate tensor for the LES at Re_h = 24000, and this model provides fairly good results compared to those from the LES with the dynamic Smagorinsky model (DSM). Finally, we apply this model to LES of controlled BFS flow with multiple taps installed at the step edge. LES with this NN-based SGS model predicts the amount of reduction in the reattachment length better than LES with DSM, showing that the NN-based model trained with uncontrolled BFS flow maintains its prediction performance in LES of controlled BFS flow.

    Table of contents:
    Part I Modeling of the subgrid-scale stress with a neural network: application to turbulent channel flow
      1 Introduction
      2 Numerical details
        2.1 Neural-network-based SGS model
        2.2 Details of DNS and input and output variables
      3 Results
        3.1 A priori test
        3.2 A posteriori test
        3.3 LES with a grid resolution different from that of trained data
      4 Conclusions
    Part II Modeling of the subgrid-scale stress with a neural network: application to turbulent flow over a backward-facing step
      1 Introduction
      2 Computational details
        2.1 Outline of the NN-based SGS model
        2.2 Details of DNS for training data
          2.2.1 Computational domain and grid spacing
          2.2.2 Boundary conditions and numerical methods
          2.2.3 Filtered DNS flow fields
        2.3 Training details and hyperparameter optimization
      3 LES of flow over a backward-facing step at Re_h = 5100
        3.1 Computational details
        3.2 Results and discussions
          3.2.1 LES51GR case
          3.2.2 LES51GC case
      4 LES of controlled backward-facing-step flow by multiple taps
        4.1 Computational details
        4.2 Results
      5 Concluding remarks
    References
    Appendix
      A Parametric study on the neural-network-based SGS model in turbulent channel flow
      B Normalization method based on a resolved-scale dissipation toward a universal NN-based SGS model
      C Computational details for DNS of a forced homogeneous isotropic turbulence
      D SGS stress from NN model in laminar shear flow
    Abstract (in Korean)
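
    The abstract above mentions backscatter clipping as a special treatment for stabilizing LES. A minimal sketch of one common form of that idea follows, assuming the usual definition of SGS dissipation, epsilon = -tau_ij * S_ij, where negative values indicate backscatter. The function names and the choice to zero the whole stress tensor at a backscatter point are illustrative assumptions, not the thesis's documented procedure.

```python
def sgs_dissipation(tau, strain):
    """SGS dissipation epsilon = -tau_ij * S_ij, summed over i and j.

    tau and strain are 3x3 nested lists holding the SGS stress and the
    filtered strain rate tensor at one grid point.
    """
    return -sum(tau[i][j] * strain[i][j] for i in range(3) for j in range(3))


def clip_backscatter(tau, strain):
    """Return a copy of tau, zeroed if it would transfer energy back to
    the resolved scales (epsilon < 0). This is an assumed clipping rule."""
    if sgs_dissipation(tau, strain) < 0.0:
        return [[0.0] * 3 for _ in range(3)]
    return [row[:] for row in tau]


# Hypothetical pure-shear example: S_12 = S_21 = 1.
strain = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Eddy-viscosity-like stress (tau = -2 * nu_t * S with nu_t = 0.1): forward scatter.
forward = [[0.0, -0.2, 0.0], [-0.2, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Same magnitude with the opposite sign: backscatter.
backward = [[0.0, 0.2, 0.0], [0.2, 0.0, 0.0], [0.0, 0.0, 0.0]]

clipped_forward = clip_backscatter(forward, strain)
clipped_backward = clip_backscatter(backward, strain)
```

    In this sketch the forward-scatter stress passes through unchanged, while the backscatter stress is replaced by zeros. A real LES code would apply the same check at every grid point and time step.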

    Learning design approaches for personalised and non-personalised e-learning systems

    Recognizing the powerful role that technology plays in the lives of people, researchers are increasingly focusing on the most effective uses of technology to support learning and teaching. Technology enhanced learning (TEL) has the potential to support and transform students’ learning and allows them to choose when, where and how to learn. This paper describes two different approaches for the design of personalised and non-personalised online learning environments, which have been developed to investigate whether personalised e-learning is more efficient than non-personalised e-learning, and discusses some of the students’ experiences and assessment test results based on experiments conducted so far.

    Effective Spoken Language Labeling with Deep Recurrent Neural Networks

    Understanding spoken language is a highly complex problem, which can be decomposed into several simpler tasks. In this paper, we focus on Spoken Language Understanding (SLU), the module of spoken dialog systems responsible for extracting a semantic interpretation from the user utterance. The task is treated as a labeling problem. In the past, SLU has been performed with a wide variety of probabilistic models. The rise of neural networks in the last couple of years has opened interesting new research directions in this domain. Recurrent Neural Networks (RNNs) in particular are able not only to represent several pieces of information as embeddings but also, thanks to their recurrent architecture, to encode relatively long contexts as embeddings. Such long contexts are in general out of reach for models previously used for SLU. In this paper we propose novel RNN architectures for SLU which outperform previous ones. Starting from a published idea as the base block, we design new deep RNNs achieving state-of-the-art results on two widely used corpora for SLU: ATIS (Air Traveling Information System), in English, and MEDIA (Hotel information and reservation in France), in French. Comment: 8 pages. Rejected from IJCAI 2017, good remarks overall, but slightly off-topic as from global meta-reviews. Recommendations: 8, 6, 6, 4. arXiv admin note: text overlap with arXiv:1706.0174

    An adaptive sampling method for global sensitivity analysis based on least-squares support vector regression

    In the field of engineering, surrogate models are commonly used for approximating the behavior of a physical phenomenon in order to reduce computational costs. Generally, a surrogate model is created based on a set of training data, where a typical method for the statistical design is Latin hypercube sampling (LHS). Even though a space-filling distribution of the training data is reached, the sampling process takes no information on the underlying behavior of the physical phenomenon into account, and new data cannot be sampled in the same distribution if the approximation quality is not sufficient. Therefore, in this study we present a novel adaptive sampling method based on a specific surrogate model, the least-squares support vector regression. The adaptive sampling method generates training data based on the uncertainty in the local prognosis capabilities of the surrogate model: areas of higher uncertainty require more sample data. The approach offers a cost-efficient calculation due to the properties of the least-squares support vector regression. The advantages of the adaptive sampling method are demonstrated in comparison with LHS on different analytical examples. Furthermore, the adaptive sampling method is applied to the calculation of global sensitivity values according to Sobol, where it shows faster convergence than the LHS method. The applications in this paper show that the presented adaptive sampling method improves the estimation of global sensitivity values, hence noticeably reducing the overall computational costs.
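
    For orientation, the Latin hypercube sampling baseline named in the abstract above can be sketched in a few lines. This is a minimal illustration of plain LHS on the unit hypercube only. It does not reproduce the paper's adaptive method or its least-squares support vector regression surrogate, and the function name and parameters are assumptions made for this example.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=None):
    """Return n_samples points in [0, 1)^n_dims such that, in every
    dimension, each of the n_samples equal-width strata holds one point."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Draw one random point inside each stratum [k/n, (k+1)/n).
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-dimension columns into a list of sample points.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


samples = latin_hypercube(5, 2, seed=0)
```

    Because every stratum is filled exactly once per dimension, the design is space-filling, but as the abstract notes it uses no information about the response being approximated.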

    M-learning in higher education in Bahrain: the educators' view

    Universities in the oil-rich Gulf Cooperation Council (GCC) countries have shown particular interest in m-learning, which is currently treated as a fashion but at the same time is considered by corporations and educational institutions to be very promising. This paper investigates the adoption of m-learning at universities in the Kingdom of Bahrain, and explores the educators' views and perceptions of m-learning, as well as its future potential in higher education. A survey questionnaire was distributed to instructors in four universities in the Kingdom of Bahrain, both private and public. This paper presents the pilot study, which includes the results of 45 responses. The findings suggest that although most of the educators understand the concept and use m-learning tools to some limited extent, there is a long way to go before full integration with the curriculum and the blended learning approach is reached. In addition, despite the fact that most educators understand the necessity and role of m-learning in content delivery, they do not seem to embrace its full potential, as it is mainly used for communication purposes and navigation. The paper proposes that m-learning provides opportunities for more creativity in designing and delivering the course with further enhancement of the student experience, but that it will be utilized to its full potential in the area within the next 5 years. This study provides guidance to instructors on the potential of m-learning and the need to change the teaching and learning culture to a student-oriented one for more effective and appropriate use of m-learning. It highlights the need for institutions to invest in faculty and staff training and in technology, and provides suggestions to other stakeholders on the need to incorporate m-learning in decision-making for further development in the region.

    IST Austria Thesis

    Because of the increasing popularity of machine learning methods, it is becoming important to understand the impact of learned components on automated decision-making systems and to guarantee that their consequences are beneficial to society. In other words, it is necessary to ensure that machine learning is sufficiently trustworthy to be used in real-world applications. This thesis studies two properties of machine learning models that are highly desirable for the sake of reliability: robustness and fairness. In the first part of the thesis we study the robustness of learning algorithms to training data corruption. Previous work has shown that machine learning models are vulnerable to a range of training set issues, varying from label noise through systematic biases to worst-case data manipulations. This is an especially relevant problem from a present perspective, since modern machine learning methods are particularly data hungry and therefore practitioners often have to rely on data collected from various external sources, e.g. from the Internet, from app users or via crowdsourcing. Naturally, such sources vary greatly in the quality and reliability of the data they provide. With these considerations in mind, we study the problem of designing machine learning algorithms that are robust to corruptions in data coming from multiple sources. We show that, in contrast to the case of a single dataset with outliers, successful learning within this model is possible both theoretically and practically, even under worst-case data corruptions. The second part of this thesis deals with fairness-aware machine learning. There are multiple areas where machine learning models have shown promising results, but where careful consideration is required in order to avoid discriminatory decisions taken by such learned components. Ensuring fairness can be particularly challenging, because real-world training datasets are expected to contain various forms of historical bias that may affect the learning process. In this thesis we show that data corruption can indeed render the problem of achieving fairness impossible, by tightly characterizing the theoretical limits of fair learning under worst-case data manipulations. However, assuming access to clean data, we also show how fairness-aware learning can be made practical in contexts beyond binary classification, in particular in the challenging learning-to-rank setting.