
    IST Austria Thesis

    Deep learning is best known for its empirical success across a wide range of applications spanning computer vision, natural language processing and speech. Of equal significance, though perhaps less known, are its ramifications for learning theory: deep networks have been observed to perform surprisingly well in the high-capacity regime, also known as the overfitting or underspecified regime. Classically, this regime on the far right of the bias-variance curve is associated with poor generalisation; however, recent experiments with deep networks challenge this view. This thesis is devoted to investigating various aspects of underspecification in deep learning. First, we argue that deep learning models are underspecified on two levels: a) any given training dataset can be fit by many different functions, and b) any given function can be expressed by many different parameter configurations. We refer to the second kind of underspecification as parameterisation redundancy, and we precisely characterise its extent. Second, we characterise the implicit criteria (the inductive bias) that guide learning in the underspecified regime. Specifically, we consider a nonlinear but tractable classification setting, and show that, given the choice, neural networks learn classifiers with a large margin. Third, we consider learning scenarios where the inductive bias is not by itself sufficient to deal with underspecification. We then study different ways of 'tightening the specification': i) In the setting of representation learning with variational autoencoders, we propose a hand-crafted regulariser based on mutual information. ii) In the setting of binary classification, we consider soft-label (real-valued) supervision. We derive a generalisation bound for linear networks supervised in this way and verify that soft labels facilitate fast learning. Finally, we explore an application of soft-label supervision to the training of multi-exit models.
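The first level of underspecification, many different functions fitting one dataset, can be illustrated with a toy example (a sketch for intuition only; the two candidate functions below are invented for illustration and are not from the thesis):

```python
import numpy as np

# Level (a) underspecification: the same training set is fit exactly
# by many different functions. Here x^2 and |x| both interpolate the
# three points below, yet disagree everywhere off the training set.
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.array([1.0, 0.0, 1.0])

f1 = lambda x: x ** 2
f2 = np.abs

assert np.allclose(f1(x_train), y_train)   # zero training error
assert np.allclose(f2(x_train), y_train)   # zero training error too
assert not np.isclose(f1(0.5), f2(0.5))    # 0.25 vs 0.5 at a test point
```

Which of the many zero-error fits a learner actually returns is decided by its inductive bias, the subject of the thesis's second part.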

    The impact of new drug launches on the loss of labor from disease and injury: evidence from German panel data

    This paper studies the evolution of early retirement due to disease and injury in the German labor force between 1988 and 2004. Using data from the German Federation of Public Pension Providers, the IMS Health Drug Launches database and the WHO Mortality Database, we show that new drug launches have substantially helped to reduce the loss of labor at the disease level over time. We employ a variety of econometric methods to exploit the pseudo-panel structure of our dataset and find that in Western Germany alone each new chemical entity has on average saved around 200 working years in every year of the observation period. Controlling for individual determinants of health-related retirement, such as a worker's age, sex and type of work, we also find evidence that the 2001 reform of pension laws has led to further reductions in the loss of labor from disease and injury.

    DIFFERENCES IN SLEEP PATTERNS AMONG HEALTHY SLEEPERS AND PATIENTS AFTER STROKE

    Sleep deprivation, whether caused by a disorder or by lifestyle, and whether acute or chronic, poses significant risks: impaired daytime cognitive performance, excessive somnolence, reduced attention and decreased motor abilities. Ischemic stroke resulting in cerebral lesions is a well-known acute disorder that leaves affected patients strongly vulnerable to sleep disturbances, which often lead to the above-mentioned cognitive and attentional impairments. In this paper, we analyzed and compared sleep patterns of healthy sleepers and patients after stroke. To overcome the well-known limits of standardized sleep scoring into several discrete sleep stages, we employed the recently proposed probabilistic sleep model, which represents the sleep process as a continuum in terms of a set of probability curves. The probability curves were considered a form of functional data, and their microstructure and time dynamics were studied using functional principal components analysis and clustering. Although our study represents a preliminary attempt to separate the two groups of subjects, we were able to identify several physiologically distinct sleep patterns, and we also identified sleep microstate patterns as a potential basis for discriminating healthy subjects from stroke patients.
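The functional-data pipeline described above can be sketched in a few lines: once each probability curve is sampled on a common time grid, functional PCA reduces to ordinary PCA on the matrix of discretized curves. The synthetic stand-in curves, grid size and subject count below are assumptions for illustration, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: one probability curve per subject, sampled at
# 100 common time points across the night.
t = np.linspace(0.0, 1.0, 100)
curves = np.stack([np.exp(-(t - rng.uniform(0.3, 0.7)) ** 2 / 0.02)
                   for _ in range(30)])

# On a common grid, functional PCA is ordinary PCA of the curve matrix.
mean_curve = curves.mean(axis=0)
U, S, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
scores = U * S                        # each subject's component coordinates
modes = Vt                            # principal modes of variation over time
explained = S ** 2 / np.sum(S ** 2)   # fraction of variance per mode

# The decomposition reconstructs the centered curves exactly:
assert np.allclose(scores @ modes + mean_curve, curves)
```

Clustering subjects on the leading columns of `scores` then groups them by their dominant modes of sleep-pattern variation, which is the kind of analysis the abstract refers to.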

    Towards understanding knowledge distillation

    Knowledge distillation, i.e. one classifier being trained on the outputs of another classifier, is an empirically very successful technique for knowledge transfer between classifiers. It has even been observed that classifiers learn much faster and more reliably if trained with the outputs of another classifier as soft labels, instead of from ground truth data. So far, however, there is no satisfactory theoretical explanation of this phenomenon. In this work, we provide the first insights into the working mechanisms of distillation by studying the special case of linear and deep linear classifiers. Specifically, we prove a generalization bound that establishes fast convergence of the expected risk of a distillation-trained linear classifier. From the bound and its proof we extract three key factors that determine the success of distillation: data geometry – geometric properties of the data distribution, in particular class separation, have an immediate influence on the convergence speed of the risk; optimization bias – gradient descent optimization finds a very favorable minimum of the distillation objective; and strong monotonicity – the expected risk of the student classifier always decreases when the size of the training set grows.
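A minimal numerical sketch of the phenomenon, with a logistic-regression student distilled from a linear teacher. The data, learning rate and step count are illustrative assumptions, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A linear "teacher" produces soft labels in (0, 1); hard labels
# threshold them and throw away the margin information.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_teacher = rng.normal(size=d)
soft_labels = sigmoid(X @ w_teacher)
hard_labels = (soft_labels > 0.5).astype(float)

def train_logreg(y, steps=500, lr=0.5):
    # Plain gradient descent on the logistic loss.
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / n
    return w

w_soft = train_logreg(soft_labels)
w_hard = train_logreg(hard_labels)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The soft-label student closely recovers the teacher's direction,
# since the teacher's weights are a stationary point of its objective.
assert cosine(w_soft, w_teacher) > 0.9
```

On separable data the hard-label student instead drifts towards a max-margin direction as training continues, which is one way the soft labels' extra information shows up.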

    The inductive bias of ReLU networks on orthogonally separable data

    We study the inductive bias of two-layer ReLU networks trained by gradient flow. We identify a class of easy-to-learn ('orthogonally separable') datasets, and characterise the solution that ReLU networks trained on such datasets converge to. Irrespective of network width, the solution turns out to be a combination of two max-margin classifiers: one corresponding to the positive data subset and one corresponding to the negative data subset. The proof is based on the recently introduced concept of extremal sectors, for which we prove a number of properties in the context of orthogonal separability. In particular, we prove stationarity of activation patterns from some time onwards, which enables a reduction of the ReLU network to an ensemble of linear subnetworks.
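As an illustration, here is a small checker for the orthogonal-separability condition: pairwise inner products are non-negative within a class and non-positive across classes. Both the exact inequalities (reconstructed from the abstract) and the toy points are assumptions made for this sketch:

```python
import numpy as np

def orthogonally_separable(X, y):
    """Check x_i . x_j >= 0 for same-label pairs and <= 0 for
    opposite-label pairs (condition reconstructed from the abstract)."""
    G = X @ X.T                    # Gram matrix of pairwise inner products
    same = np.equal.outer(y, y)    # mask of same-label pairs
    return bool(np.all(G[same] >= 0) and np.all(G[~same] <= 0))

# Toy example: the two classes live in opposite cones.
X = np.array([[1.0, 0.2], [0.5, 1.0], [-1.0, -0.1], [-0.3, -1.0]])
y = np.array([1, 1, -1, -1])
assert orthogonally_separable(X, y)

# Move one negative point into the positive cone and the property fails.
X_bad = X.copy()
X_bad[2] = [1.0, 1.0]
assert not orthogonally_separable(X_bad, y)
```

On such datasets the paper's result says gradient flow drives the network to behave as two max-margin classifiers, one per class, regardless of width.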

    Functional vs. parametric equivalence of ReLU networks

    We address the following question: How redundant is the parameterisation of ReLU networks? Specifically, we consider transformations of the weight space which leave the function implemented by the network intact. Two such transformations are known for feed-forward architectures: permutation of neurons within a layer, and positive scaling of all incoming weights of a neuron coupled with inverse scaling of its outgoing weights. In this work, we show for architectures with non-increasing widths that permutation and scaling are in fact the only function-preserving weight transformations. For any eligible architecture we give an explicit construction of a neural network such that any other network that implements the same function can be obtained from the original one by the application of permutations and rescaling. The proof relies on a geometric understanding of boundaries between linear regions of ReLU networks, and we hope the developed mathematical tools are of independent interest.
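The two known symmetries are easy to verify numerically. The sketch below checks both on a random two-layer ReLU network; the sizes and values are arbitrary illustrations:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def net(x, W1, W2):
    # Two-layer ReLU network: x -> W2 @ relu(W1 @ x)
    return W2 @ relu(W1 @ x)

rng = np.random.default_rng(2)
W1 = rng.normal(size=(5, 3))   # incoming weights of 5 hidden neurons
W2 = rng.normal(size=(2, 5))   # outgoing weights

# Scaling: multiply neuron k's incoming weights by c > 0 and its
# outgoing weights by 1/c. Since relu(c*z) = c*relu(z) for c > 0,
# the implemented function is unchanged.
c, k = 3.7, 2
W1s, W2s = W1.copy(), W2.copy()
W1s[k, :] *= c
W2s[:, k] /= c

# Permutation: reorder the hidden neurons (rows of W1, columns of W2).
perm = rng.permutation(5)
W1p, W2p = W1[perm, :], W2[:, perm]

x = rng.normal(size=3)
assert np.allclose(net(x, W1, W2), net(x, W1s, W2s))
assert np.allclose(net(x, W1, W2), net(x, W1p, W2p))
```

The paper's contribution is the converse: for non-increasing widths, compositions of these two transformations exhaust all function-preserving weight changes.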

    Distillation-based training for multi-exit architectures

    Multi-exit architectures, in which a stack of processing layers is interleaved with early output layers, allow the processing of a test example to stop early and thus save computation time and/or energy. In this work, we propose a new training procedure for multi-exit architectures based on the principle of knowledge distillation. The method encourages early exits to mimic later, more accurate exits by matching their output probabilities. Experiments on CIFAR100 and ImageNet show that distillation-based training significantly improves the accuracy of early exits while maintaining state-of-the-art accuracy for late ones. The method is particularly beneficial when training data is limited, and it allows a straightforward extension to semi-supervised learning, i.e. making use of unlabeled data at training time. Moreover, it takes only a few lines to implement and incurs almost no computational overhead at training time, and none at all at test time.
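One common way to implement the probability matching is a per-exit loss that combines cross-entropy on the labels with a temperature-softened KL term towards a later exit. This is a sketch of that general recipe; the weighting `alpha` and temperature `T` are hypothetical choices, not the paper's:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # stabilized softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def early_exit_loss(logits_early, logits_late, labels, alpha=0.5, T=2.0):
    """Cross-entropy on ground-truth labels plus a KL term pulling the
    early exit's softened distribution towards the later exit's."""
    n = len(labels)
    p = softmax(logits_early)
    ce = -np.mean(np.log(p[np.arange(n), labels] + 1e-12))
    q_late = softmax(logits_late / T)
    q_early = softmax(logits_early / T)
    kl = np.mean(np.sum(q_late * (np.log(q_late + 1e-12)
                                  - np.log(q_early + 1e-12)), axis=-1))
    return (1.0 - alpha) * ce + alpha * T ** 2 * kl
```

When an early exit already matches the late one, the KL term vanishes and only the label term remains. Dropping the cross-entropy term for unlabeled examples is what makes the semi-supervised extension mentioned above straightforward.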

    A STUDY OF SUSTAINABLE COLORATION OF LYOCELL FABRICS USING EXTRACTS OF TROPICAL ONION SKINS

    Lyocell is considered a new fiber that represents a milestone in the development of environmentally sustainable textiles. Lyocell is spun from wood pulp cellulose via a green chemical process with NMMO (N-methylmorpholine-N-oxide) solvent. Following the concept of lowering the environmental impact of fashion clothing, this study aims to determine suitable natural dye recipes using colorants extracted from tropical onion skins. Colorants were extracted by boiling crushed dried onion skins in water at 100 °C for 20-25 minutes. The liquor ratios for extraction and dyeing were 1:25 and 1:30, respectively. The optimal dyeing condition was found to be 80 °C for 45 minutes at 75% v/v. In addition, a variety of the most commonly used mordants, including potassium aluminum sulfate, copper (II) sulphate and iron (II) sulphate, were used for mordanting in order to compare the differently mordanted and unmordanted dyed fabrics via color strength (K/S) and CIE L*a*b* color values. It was found that mordant type had an effect on the color strength and color coordinates of fabric dyed with onion skin, which can supply a variety of color choices for the same colorant.

    ANTIBACTERIAL FINISHING ON COTTON 100% AND CVC FABRICS WITH TANNIN FROM PIPER BETLE EXTRACT

    Following the recent trend of using natural ingredients from bio-macromolecules, biomaterials and plant extracts in the textile chain, this research aims to develop an antibacterial textile finishing with tannin extracted from the piper betle plant. The extraction processes were carried out with different solvents: distilled water, ethanol 30%, ethanol 50% and ethanol 70%, for 60 minutes. Two important types of fabric, cotton and CVC (Cotton/Polyester), were padded with piper betle extracts, then dried at 60 °C for 5 minutes. The presence of tannin on the fabric after treatment was determined by an FeCl3 test and FT-IR spectrum. The antibacterial effect of the finished fabrics was evaluated according to the ASTM 2149-01 standard. The test was performed with Escherichia coli ATCC 25922 and Staphylococcus aureus ATCC 6538. The final results exhibited good antibacterial activity of 83.02% and 65.33% against E. coli, and 93.88% and 85.14% against S. aureus, on cotton and CVC fabrics respectively.

    TEACHER TALK IN EFL SPEAKING LESSONS AND IMPLICATIONS FOR LEARNER INVOLVEMENT: A CASE STUDY AT A FOREIGN ENGLISH CENTER IN THE MEKONG DELTA, VIETNAM

    In English teaching classrooms, teacher talk serves as a communicative tool that helps language learners communicate effectively. Studies (Walsh, 2002; Xiao Yan, 2006; Cullen, 1998) have investigated the essential role of teacher talk. This paper focuses on the purpose, frequency and amount of teacher talking time in English speaking lessons. The study aimed at investigating features of teacher talk that construct or obstruct learner speaking involvement. The results show that teacher talk plays an important role in language teaching, including giving instructions and managing the classroom. However, the use of both L1 and L2 should be taken into consideration in order to maximize learner speaking involvement.
