48 research outputs found

    Cross-Lingual Adaptation using Structural Correspondence Learning

    Cross-lingual adaptation, a special case of domain adaptation, refers to the transfer of classification knowledge between two languages. In this article we describe an extension of Structural Correspondence Learning (SCL), a recently proposed algorithm for domain adaptation, for cross-lingual adaptation. The proposed method uses unlabeled documents from both languages, along with a word translation oracle, to induce cross-lingual feature correspondences. From these correspondences a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language. The main advantages of this approach over other approaches are its resource efficiency and task specificity. We conduct experiments in the area of cross-language topic and sentiment classification involving English as source language and German, French, and Japanese as target languages. The results show a significant improvement of the proposed method over a machine translation baseline, reducing the relative error due to cross-lingual adaptation by an average of 30% (topic classification) and 59% (sentiment classification). We further report on empirical analyses that reveal insights into the use of unlabeled data, the sensitivity with respect to important hyperparameters, and the nature of the induced cross-lingual correspondences.

    API design for machine learning software: experiences from the scikit-learn project

    Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
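The idea of one interface shared by all learning and processing units, and the composition it allows, can be illustrated with a minimal sketch. This is not scikit-learn code: the classes `MeanCenterer`, `ThresholdClassifier`, and `SimplePipeline` are hypothetical and use only the Python standard library. Only the method names `fit`, `transform`, and `predict` follow the convention the library is known for.

```python
from statistics import mean


class MeanCenterer:
    """Hypothetical transformer: learns the mean in fit, subtracts it in transform."""

    def fit(self, X, y=None):
        self.mean_ = mean(X)
        return self  # returning self allows call chaining

    def transform(self, X):
        return [x - self.mean_ for x in X]


class ThresholdClassifier:
    """Hypothetical predictor: labels a value 1 if it exceeds a learned threshold."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        # Place the threshold halfway between the two class means.
        self.threshold_ = (mean(pos) + mean(neg)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]


class SimplePipeline:
    """Hypothetical composite: chains transformers, then a final predictor."""

    def __init__(self, transformers, predictor):
        self.transformers = transformers
        self.predictor = predictor

    def fit(self, X, y):
        # Each transformer is fitted on the output of the previous one.
        for t in self.transformers:
            X = t.fit(X, y).transform(X)
        self.predictor.fit(X, y)
        return self

    def predict(self, X):
        # Apply the already-fitted transformers, then predict.
        for t in self.transformers:
            X = t.transform(X)
        return self.predictor.predict(X)


X_train = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
y_train = [0, 0, 0, 1, 1, 1]
model = SimplePipeline([MeanCenterer()], ThresholdClassifier()).fit(X_train, y_train)
predictions = model.predict([2.5, 7.5])
print(predictions)
```

Because the composite exposes the same `fit` and `predict` methods as its parts, it can be used wherever a single predictor is expected, which is the reusability argument the abstract makes.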

    Scikit-learn: Machine Learning in Python

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net

    Cross Lingual Sentiment Analysis: A Clustering-Based Bee Colony Instance Selection and Target-Based Feature Weighting Approach

    The lack of sentiment resources in resource-poor languages poses challenges for sentiment analysis based on machine learning. Cross-lingual and semi-supervised learning approaches are the most common ways to overcome this issue. However, the performance of existing methods degrades due to the poor quality of translated resources, data sparseness and, more specifically, language divergence. An integrated learning model is proposed that combines a semi-supervised model and an ensemble model while using the available sentiment resources to tackle issues related to language divergence. Additionally, to reduce the impact of translation errors and handle the instance selection problem, we propose a clustering-based bee-colony sample selection method for the optimal selection of the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. Simulation results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performance. Furthermore, the statistical outcomes indicate the advantages of the proposed training data sampling and target-based feature selection in reducing the negative effect of translation errors. These results highlight that the proposed approach achieves a performance close to that of in-language supervised models.

    On decomposing a deep neural network into modules

    Deep learning is being incorporated in many modern software systems. Deep learning approaches train a deep neural network (DNN) model using training examples, and then use the DNN model for prediction. While the structure of a DNN model as layers is observable, the model is treated in its entirety as a monolithic component. To change the logic implemented by the model, e.g. to add/remove logic that recognizes inputs belonging to a certain class, or to replace the logic with an alternative, the training examples need to be changed and the DNN needs to be retrained using the new set of examples. We argue that decomposing a DNN into DNN modules, akin to decomposing monolithic software code into modules, can bring the benefits of modularity to deep learning. In this work, we develop a methodology for decomposing DNNs for multi-class problems into DNN modules. For four canonical problems, namely MNIST, EMNIST, FMNIST, and KMNIST, we demonstrate that such decomposition enables reuse of DNN modules to create different DNNs and enables replacement of one DNN module in a DNN with another without needing to retrain. The DNN models formed by composing DNN modules are at least as good as traditional monolithic DNNs in terms of test accuracy for our problems.

    Forecasting Daily Solar Energy Production Using Robust Regression Techniques

    We describe a novel approach to forecast daily solar energy production based on the output of a numerical weather prediction (NWP) model using non-parametric robust regression techniques. Our approach comprises two steps: First, we use a non-linear interpolation technique, Gaussian Process regression (also known as Kriging in Geostatistics), to interpolate the coarse NWP grid to the location of the solar energy production facilities. Second, we use Gradient Boosted Regression Trees, a non-parametric regression technique, to predict the daily solar energy output based on the interpolated NWP model and additional spatio-temporal features. Experimental evidence suggests that two aspects of our approach are crucial for its effectiveness: a) the ability of Gaussian Process regression to incorporate both input and output uncertainty, which we leverage by deriving input uncertainty from an ensemble of 11 NWP models and including confidence intervals alongside the interpolated point estimates, and b) the ability of Gradient Boosted Regression Trees to handle outliers in the outputs by using robust loss functions, a property that is very important due to the volatile nature of solar energy output. We evaluated the approach on a dataset of daily solar energy measurements from 98 stations in Oklahoma. The results show a relative improvement of 17.17% and 46.19% over the baselines, Spline Interpolation and Gaussian Mixture Models, respectively.
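To illustrate why a robust loss limits the influence of outliers, the sketch below compares squared error with the Huber loss, one common robust loss. The abstract does not say which robust loss the authors used, so the choice of Huber, the threshold `delta`, and the example residuals are all assumptions made for illustration.

```python
def squared_loss(residual):
    """Squared error: grows quadratically, so outliers dominate the total."""
    return 0.5 * residual ** 2


def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for small residuals, linear beyond delta (assumed 1.0)."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual ** 2
    return delta * (magnitude - 0.5 * delta)


# Two ordinary residuals and one outlier (illustrative values only).
residuals = [0.5, -1.0, 10.0]
squared = [squared_loss(r) for r in residuals]
huber = [huber_loss(r) for r in residuals]
print(squared)
print(huber)
```

The two losses agree on the small residuals and differ only on the outlier, whose contribution grows linearly rather than quadratically under the Huber loss.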

    Ernsthaft humorvoll erziehen : die Bedeutung von Humor und Lachen in der Erziehung [Seriously humorous parenting: the importance of humor and laughter in upbringing]

    Humor develops throughout childhood and is closely related to a child's emotional, cognitive, linguistic, and social processes. Upbringing is a decisive factor in this development. This thesis examines how parents and guardians use humor in everyday family life. It draws on 159 questionnaires completed by parents and guardians of children between two and six years of age. Questions about laughter, the use of humor in upbringing, and the humor behavior of parents and children were evaluated. The humor-behavior questions were based on an adaptation of the Humor Styles Questionnaire by Martin et al. (2002). The results show that humor plays a fairly important to very important role in everyday upbringing for 85 % of respondents. Affiliative humor is used above all. Conscious engagement with humor positively influences humor behavior: adaptive humor styles are predominantly used when respondents have already reflected on humor. The same applies to the frequency of shared laughter. Correlations emerged between the humor behavior of parents and that of their children: affiliative, self-enhancing, and self-defeating humor are used similarly by both. The children's age and gender played no role in the results, and the perspectives of mothers and fathers showed no gender-specific differences in the use of humor.

    Submitted by Julia Prettenhofer. Abstracts in German and English. Title differs according to the author's translation. Karl-Franzens-Universität Graz, Master's thesis, 2019. (VLID)361536

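    All-In-Verträge nach den Neuerungen durch das ARÄG 2015 sowie das LSD-BG 2016 : bessere Verhinderung von Lohndumping? [All-in contracts after the changes introduced by the ARÄG 2015 and the LSD-BG 2016: better prevention of wage dumping?]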

    All-in contracts are increasingly common in the Austrian labor market. They were originally concluded only with executive employees as a kind of flat-rate overtime arrangement. Employers mainly hope for simpler payroll accounting and cost certainty, because an all-in salary covers all services rendered and the employee therefore has no further claims. The advantage for employees is that they receive the agreed flat-rate pay even if they work less than the all-in salary covers. Nevertheless, these agreements are frequently criticized, which prompted the legislature to regulate flat-rate agreements by law. The main points of criticism were the lack of transparency, the indirect pressure to work overtime, and the failure to compensate work exceeding the agreement at the end of the calculation period. In this thesis, I first discuss all-in agreements in general and their possible remuneration components. I then address the relevant new provisions of the Labour Law Amendment Act 2015 (Arbeitsrechts-Änderungsgesetz, ARÄG 2015), the problems they raise, and the legal consequences of a violation. Finally, I turn to the Act Against Wage and Social Dumping 2016 (Lohn- und Sozialdumping-Bekämpfungsgesetz, LSD-BG 2016) and try to clarify whether all-in contracts actually lead to insufficient compensation for services rendered, and thus to the underpayment of employees, and whether the new provisions of the ARÄG 2015 can better prevent this.

    By Monika Prettenhofer. Title differs according to the author's translation. Karl-Franzens-Universität Graz, diploma thesis, 2017. (VLID)183515