101 research outputs found

    When Moneyball Meets the Beautiful Game: A Predictive Analytics Approach to Exploring Key Drivers for Soccer Player Valuation

    Measuring the market value of a professional soccer (i.e., association football) player is of great interest to soccer clubs. Several gaps emerge from the existing soccer transfer market research. The economics literature only tests hypotheses relating a player’s market value or wage to a few economic factors. The finance literature provides largely theoretical pricing frameworks. The sports science literature uncovers numerous pertinent attributes and skills but offers limited insight into valuation practice. The overarching research question of this work is: what are the key drivers of player valuation in the soccer transfer market? To lay the theoretical foundations of player valuation, this work synthesizes the literature on market efficiency and equilibrium conditions, pricing theories and risk premium, and sports science. Predictive analytics is the primary methodology, used in conjunction with open-source data and exploratory analysis. Several machine learning algorithms are evaluated based on the trade-offs between predictive accuracy and model interpretability. XGBoost, the best model for player valuation, yields the lowest RMSE and the highest adjusted R². SHAP values identify the most important features in the best model, both at a collective level and at an individual level. This work shows that a handful of fundamental economic and risk factors have a more substantial effect on player valuation than a large number of sports science factors. Within the sports science factors, general physiological and psychological attributes appear to be more important than soccer-specific skills. Theoretically, this work proposes a conceptual framework for soccer player valuation that unifies sports business research and sports science research. Empirically, the predictive analytics methodology deepens our understanding of the value drivers of soccer players. Practically, this work enhances transparency and interpretability in the valuation process and could be extended into a player recommender framework for talent scouting. In summary, this work demonstrates that the application of analytics can improve decision-making efficiency in player acquisition and the profitability of soccer clubs.
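    The modelling pipeline described above can be illustrated with a brief sketch: gradient-boosted tree regression of market value followed by SHAP attributions. This is a hedged illustration only; the file name, feature columns, and hyperparameters are hypothetical placeholders, not the study's actual setup.

```python
# Sketch: XGBoost regression of player market value + SHAP interpretability.
# Dataset path and column names are hypothetical.
import pandas as pd
import shap
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("players.csv")                       # hypothetical open-source player data
X = df.drop(columns=["market_value_eur"])             # economic, risk, and sports-science features
y = df["market_value_eur"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBRegressor(n_estimators=500, max_depth=6, learning_rate=0.05)
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:,.0f}")

# SHAP values give collective-level (summary) and individual-level attributions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te)
```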

    Three Essays on Trust Mining in Online Social Networks

    This dissertation research consists of three essays on studying trust in online social networks. Trust plays a critical role in online social relationships because of the high levels of risk and uncertainty involved. Guided by relevant social science and computational graph theories, I develop conceptual and predictive models to gain insights into trusting behaviors in online social relationships. In the first essay, I propose a conceptual model of trust formation in online social networks. This is the first study that integrates the existing graph-based view of trust formation in social networks with socio-psychological theories of trust to provide a richer understanding of trusting behaviors in online social networks. I introduce new behavioral antecedents of trusting behaviors and redefine and integrate existing graph-based concepts to develop the proposed conceptual model. The empirical findings indicate that both socio-psychological and graph-based trust-related factors should be considered in studying trust formation in online social networks. In the second essay, I propose a theory-based predictive model to predict trust and distrust links in online social networks. Previous trust prediction models used limited network structural data to predict future trust/distrust relationships, ignoring the underlying behavioral trust-inducing factors. I identify a comprehensive set of behavioral and structural predictors of trust/distrust links based on related theories, and then build multiple supervised classification models to predict trust/distrust links in online social networks. The empirical results confirm the superior fit and predictive performance of the proposed model over the baselines. In the third essay, I propose a lexicon-based text mining model to mine trust-related user-generated content (UGC). This is the first theory-based text mining model to examine important factors in online trusting decisions from UGC. I build domain-specific trustworthiness lexicons for online social networks based on related behavioral foundations and text mining techniques. Next, I propose a lexicon-based text mining model that automatically extracts and classifies trustworthiness characteristics from trust reviews. The empirical evaluations show the superior performance of the proposed text mining system over the baselines.
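    As a hedged illustration of the second essay's setup, the sketch below trains a supervised classifier on combined behavioral and structural predictors of trust/distrust links. The feature names, data file, and model choice are assumptions for illustration, not the dissertation's exact specification.

```python
# Sketch: supervised trust/distrust link classification from behavioral and
# structural features. All column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

edges = pd.read_csv("signed_edges.csv")            # one row per (truster, trustee) pair
features = ["common_neighbors", "truster_out_degree", "trustee_in_degree",
            "trustee_reputation", "interaction_count"]
X, y = edges[features], edges["is_trust_link"]     # 1 = trust link, 0 = distrust link

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```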

    Recipe popularity prediction in Finnish social media by machine learning models

    In recent times, the internet has emerged as a primary source of cooking inspiration, eating experiences, and food-related social gathering, with a majority of individuals turning to online recipes rather than traditional cookbooks. However, there is a growing concern about the healthiness of online recipes. This thesis focuses on unraveling the determinants of online recipe popularity by analyzing a dataset comprising more than 5000 recipes from Valio, one of Finland’s leading corporations. Valio’s website serves as a representation of diverse cooking preferences among users in Finland. Through examination of recipe attributes such as nutritional content (energy, fat, salt, etc.), food preparation complexity (cooking time, number of steps, required ingredients, etc.), and user engagement (the number of comments, ratings, sentiment of comments, etc.), we aim to pinpoint the critical elements influencing the popularity of online recipes. Our predictive model, logistic regression (classification accuracy 0.93 and F1 score 0.90), substantiates the existence of pertinent recipe characteristics that significantly influence their ratings. The dataset we employ is notably influenced by user engagement features, particularly the number of received ratings and comments. In other words, recipes that garner more attention in terms of comments and ratings tend to have higher rating values (i.e., to be more popular). Additionally, our findings reveal that a substantial portion of Valio’s recipes falls within the medium-health Food Standards Agency (FSA) score range, and intriguingly, recipes deemed less healthy tend to receive higher average ratings from users. This study advances our comprehension of the factors contributing to the popularity of online recipes, providing valuable insights into contemporary cooking preferences in Finland as well as guiding future dietary policy shifts.
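    A minimal sketch of the classification setup described above is given below: a scaled logistic regression predicting a binary popularity label from nutritional, complexity, and engagement features. The file and column names are hypothetical, and the popularity threshold is an assumption.

```python
# Sketch: logistic regression for recipe popularity; column names hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

recipes = pd.read_csv("valio_recipes.csv")   # hypothetical export of the recipe dataset
features = ["energy_kcal", "fat_g", "salt_g", "cooking_time_min", "n_steps",
            "n_ingredients", "n_comments", "n_ratings", "comment_sentiment"]
X, y = recipes[features], recipes["is_popular"]      # e.g. average rating above a chosen cut-off

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", round(accuracy_score(y_te, pred), 2), "F1:", round(f1_score(y_te, pred), 2))
```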

    Are English football players overvalued?

    It is often suggested by fans, the media, and football commentators that English players are overvalued and receive higher salaries than comparable players from other countries. This study examines whether the suggestion can be empirically substantiated. Using a unique database covering all of the European elite leagues, we show that there is an English player value premium of around 40% and a wage premium of 25%. Exploring the reasons for this phenomenon in a regression setting, we find that the excess valuation and salary differential can be partly justified by several factors: first, and most importantly, higher values are attributed, and higher wages paid, to players in the English Premier League (EPL); second, English players are more likely to play as strikers than in other positions in EPL clubs; third, their performance in some positions is somewhat better than the average of players from other nations; fourth, there are fewer of them in the top European leagues, leading to a shortage in supply. There is, however, evidence that the higher valuation of English attackers and the higher salaries of English attackers and English midfielders evade explanation by any of these groups of variables.
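    The kind of regression behind the reported premia can be sketched as a log-linear value equation with an English-nationality dummy and controls: with no controls, the dummy's coefficient captures the raw premium, and adding league, position, and performance controls shows how much of it is explained. The variable names below are placeholders, not the study's actual specification.

```python
# Sketch: hedonic regression of log market value on an English dummy plus controls.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

players = pd.read_csv("elite_league_players.csv")    # hypothetical European elite-league panel
players["log_value"] = np.log(players["market_value_eur"])

# exp(coefficient on is_english) - 1 approximates the residual English premium
# after conditioning on league, position, age and playing time.
model = smf.ols("log_value ~ is_english + C(league) + C(position) + age + minutes_played",
                data=players).fit()
print(model.summary())
```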

    FORETELL: Aggregating Distributed, Heterogeneous Information from Diverse Sources Using Market-based Techniques

    Predicting the outcome of uncertain future events is a task that humans frequently undertake while making critical decisions. The process underlying this prediction and decision making is called information aggregation, which deals with collating the opinions of different people, over time, about the future event’s possible outcome. The information aggregation problem is non-trivial because the information related to future events is distributed spatially and temporally, the information changes dynamically as related events happen, and, finally, people’s opinions about events’ outcomes depend on the information they have access to and the mechanism they use to form opinions from that information. This thesis addresses the problem of distributed information aggregation by building computational models and algorithms for different aspects of information aggregation so that the most likely outcome of future events can be predicted with the utmost accuracy. We have employed a commonly used market-based framework called a prediction market to formally analyze the process of information aggregation. The behavior of humans performing information aggregation within a prediction market is implemented using software agents which employ sophisticated algorithms to perform complex calculations on behalf of the humans, to aggregate information efficiently. We have considered five different yet crucial problems related to information aggregation: (i) the effect of variations in the parameters of the information being aggregated, such as its reliability, availability, and accessibility, on the predicted outcome of the event; (ii) improving the prediction accuracy by having each human (software agent) build a more accurate model of other humans’ behavior in the prediction market; (iii) identifying how various market parameters affect its dynamics and accuracy; (iv) applying information aggregation to the domain of distributed sensor information fusion; and (v) aggregating information on an event while considering dissimilar, but closely related, events in different prediction markets. We have verified all of our proposed techniques through analytical results and experiments, using commercially available data from real prediction markets within a simulated, multi-agent prediction market. Our results show that our proposed techniques for information aggregation perform more efficiently than, or comparably to, existing techniques for information aggregation using prediction markets.
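    The abstract does not spell out the market mechanism, so as a hedged illustration the sketch below uses Hanson's logarithmic market scoring rule (LMSR), a common automated market maker in simulated prediction markets, to show how agents' trades move the market price toward an aggregated probability estimate.

```python
# Sketch: LMSR market maker for a binary event (an assumed mechanism, chosen
# for illustration; not necessarily the one used in FORETELL).
import math

class LMSRMarket:
    def __init__(self, liquidity: float = 100.0):
        self.b = liquidity        # larger b => prices react more slowly to trades
        self.q = [0.0, 0.0]       # outstanding shares for outcomes [event, no-event]

    def cost(self) -> float:
        return self.b * math.log(sum(math.exp(qi / self.b) for qi in self.q))

    def price(self, outcome: int) -> float:
        exps = [math.exp(qi / self.b) for qi in self.q]
        return exps[outcome] / sum(exps)      # current implied probability

    def buy(self, outcome: int, shares: float) -> float:
        """An agent buys shares in an outcome; returns the amount charged."""
        before = self.cost()
        self.q[outcome] += shares
        return self.cost() - before

market = LMSRMarket()
market.buy(0, 30)                            # an informed agent backs the event
print(round(market.price(0), 3))             # aggregated estimate rises above 0.5
```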

    Efficient Data Driven Multi Source Fusion

    Data/information fusion is an integral component of many existing and emerging applications; e.g., remote sensing, smart cars, the Internet of Things (IoT), and Big Data, to name a few. While fusion aims to achieve better results than any one individual input can provide, the challenge is often to determine the underlying mathematics of aggregation suitable for an application. In this dissertation, I focus on the following three aspects of aggregation: (i) efficient data-driven learning and optimization, (ii) extensions and new aggregation methods, and (iii) feature- and decision-level fusion for machine learning with applications to signal and image processing. The Choquet integral (ChI), a powerful nonlinear aggregation operator, is a parametric way (with respect to the fuzzy measure (FM)) to generate a wealth of aggregation operators. The FM has 2^N variables and N(2^(N−1) − 1) monotonicity constraints for N inputs. As a result, learning the ChI parameters from data quickly becomes impractical for most applications. Herein, I propose a scalable learning procedure (which is linear with respect to training sample size) for the ChI that identifies and optimizes only data-supported variables. As such, the computational complexity of the learning algorithm is proportional to the complexity of the solver used. This method also includes an imputation framework to obtain scalar values for data-unsupported (aka missing) variables and a compression algorithm (lossy or lossless) for the learned variables. I also propose a genetic algorithm (GA) to optimize the ChI for non-convex, multi-modal, and/or analytical objective functions. This algorithm introduces two operators that automatically preserve the constraints; therefore, there is no need to explicitly enforce the constraints as is required by traditional GA algorithms. In addition, this algorithm provides an efficient representation of the search space with a minimal set of vertices. Furthermore, I study different strategies for extending the fuzzy integral to missing data, and I propose a goal programming framework to aggregate inputs from heterogeneous sources for ChI learning. Last, my work in remote sensing involves visual-clustering-based band group selection and Lp-norm multiple kernel learning for feature-level fusion in hyperspectral image processing to enhance pixel-level classification.
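    The discrete Choquet integral itself is compact enough to state in a few lines: sort the N inputs, then weight each increment by the fuzzy measure of the set of inputs at or above that level. The toy fuzzy measure below is a hand-picked monotone example, not one learned by the data-driven procedure described in the abstract.

```python
# Sketch: discrete Choquet integral of inputs x with respect to fuzzy measure mu.
def choquet_integral(x, mu):
    """x: list of N inputs; mu: dict mapping frozenset of input indices to [0, 1]."""
    order = sorted(range(len(x)), key=lambda i: x[i])   # indices in ascending input order
    total, prev = 0.0, 0.0
    for k, idx in enumerate(order):
        subset = frozenset(order[k:])                   # inputs still at or above this level
        total += (x[idx] - prev) * mu[subset]
        prev = x[idx]
    return total

# Toy monotone fuzzy measure on 3 inputs: mu(empty) = 0, mu(all) = 1.
mu = {frozenset(): 0.0,
      frozenset({0}): 0.3, frozenset({1}): 0.4, frozenset({2}): 0.2,
      frozenset({0, 1}): 0.8, frozenset({0, 2}): 0.5, frozenset({1, 2}): 0.6,
      frozenset({0, 1, 2}): 1.0}
print(choquet_integral([0.7, 0.2, 0.9], mu))            # 0.49 for this toy example
```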

    Integrating expert-based objectivist and nonexpert-based subjectivist paradigms in landscape assessment

    This thesis explores the integration of objective and subjective measures of landscape aesthetics, particularly focusing on crowdsourced geo-information. It addresses the increasing importance of considering public perceptions in national landscape governance, in line with the European Landscape Convention's emphasis on public involvement. Despite this, national landscape assessments often remain expert-centric and top-down, facing challenges in resource constraints and limited public engagement. The thesis leverages Web 2.0 technologies and crowdsourced geographic information, examining correlations between expert-based metrics of landscape quality and public perceptions. The Scenic-Or-Not initiative for Great Britain, GIS-based Wildness spatial layers, and the LANDMAP dataset for Wales serve as key datasets for the analysis. The research investigates the relationships between objective measures of landscape wildness quality and subjective measures of aesthetics. Multiscale geographically weighted regression (MGWR) reveals significant correlations, with different wildness components exhibiting varying degrees of association. The study suggests the feasibility of incorporating wildness and scenicness measures into formal landscape aesthetic assessments. Comparing expert and public perceptions, the research identifies preferences for water-related landforms and variations in upland and lowland typologies. The study emphasizes the agreement between experts and non-experts on extreme scenic perceptions but notes discrepancies in mid-spectrum landscapes. To overcome limitations in systematic landscape evaluations, an integrative approach is proposed. Utilizing XGBoost models, the research predicts spatial patterns of landscape aesthetics across Great Britain, based on the Scenic-Or-Not initiative, Wildness spatial layers, and LANDMAP data. The models achieve accuracy comparable to traditional statistical models, offering insights for Landscape Character Assessment practices and policy decisions. While acknowledging data limitations and biases in crowdsourcing, the thesis discusses the necessity of an aggregation strategy to manage computational challenges. Methodological considerations include addressing the modifiable areal unit problem (MAUP) associated with aggregating point-based observations. The thesis comprises three studies published or submitted for publication, each contributing to the understanding of the relationship between objective and subjective measures of landscape aesthetics. The concluding chapter discusses the limitations of the data and methods, providing a comprehensive overview of the research.
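    As a much-reduced illustration of relating an expert-based metric to crowdsourced perception, the sketch below computes a single rank correlation between a wildness score and mean scenicness per spatial unit; the thesis itself uses multiscale geographically weighted regression (MGWR), which allows the relationship to vary over space. File and column names are hypothetical.

```python
# Sketch: global rank correlation between expert wildness and crowd scenicness.
import pandas as pd
from scipy.stats import spearmanr

cells = pd.read_csv("gb_grid_cells.csv")    # hypothetical per-cell aggregation of both measures
rho, p = spearmanr(cells["wildness_score"], cells["mean_scenicness"])
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")
```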

    4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022)

    Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 4th International Conference on Advanced Research Methods and Analytics (CARMA) is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences, as well as to discuss current and future challenges. Due to the COVID-19 pandemic, CARMA 2022 is planned as a simultaneously virtual and face-to-face conference.
    Doménech I De Soria, J.; Vicente Cuervo, MR. (2022). 4th International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. https://doi.org/10.4995/CARMA2022.2022.1595

    Using data mining to repurpose German language corpora. An evaluation of data-driven analysis methods for corpus linguistics

    A growing number of studies report interesting insights gained from existing data resources. Among those are analyses of textual data, giving reason to consider such methods for linguistics as well. However, the field of corpus linguistics usually works with purposefully collected, representative language samples that aim to answer only a limited set of research questions. This thesis aims to shed some light on the potential of data-driven analysis based on machine learning and predictive modelling for corpus linguistic studies, investigating the possibility of repurposing existing German language corpora for linguistic inquiry by using methodologies developed for data science and computational linguistics. The study focuses on predictive modelling and machine-learning-based data mining and gives a detailed overview and evaluation of currently popular strategies and methods for analysing corpora with computational methods. After the thesis introduces strategies and methods that have already been used on language data, discusses how they can assist corpus linguistic analysis, and refers to available toolkits and software as well as to state-of-the-art research and further references, the introduced methodological toolset is applied in two differently shaped corpus studies that utilize readily available corpora for German. The first study explores linguistic correlates of holistic text quality ratings on student essays, while the second deals with age-related language features in computer-mediated communication and interprets age prediction models to answer a set of research questions based on previous research in the field. While both studies give linguistic insights that integrate into the current understanding of the investigated phenomena in the German language, they systematically test the methodological toolset introduced beforehand, allowing a detailed discussion of the added value and remaining challenges of machine-learning-based data mining methods in corpus linguistics at the end of the thesis.
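    The second corpus study's age-prediction setup can be sketched as a standard text classification pipeline whose fitted model is then inspected for age-related features. The corpus file, labels, and feature choices below are hypothetical stand-ins, not the thesis's actual configuration.

```python
# Sketch: predicting author age group from computer-mediated communication text.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

posts = pd.read_csv("cmc_posts.csv")        # hypothetical corpus: columns "text", "age_group"
pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=5),
                     LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, posts["text"], posts["age_group"], cv=5, scoring="f1_macro")
print("macro-F1 per fold:", scores.round(3))
```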

    Reliable statistical modeling of weakly structured information

    The statistical analysis of "real-world" data is often confronted with the fact that most standard statistical methods were developed under some kind of idealization of the data that is often not adequate in practical situations. This concerns, among others, i) the potentially deficient quality of the data, which can arise for example due to measurement error, non-response in surveys, or data processing errors, and ii) the scale quality of the data, which is idealized as "the data have some clear scale of measurement that can be uniquely located within the scale hierarchy of Stevens (or that of Narens and Luce or Orth)". Modern statistical methods, such as correction techniques for measurement error or robust methods, cope with issue i). In the context of missing or coarsened data, imputation techniques and methods that explicitly model the missing/coarsening process are nowadays well-established tools of refined data analysis. Concerning ii), the typical statistical viewpoint is a more pragmatic one: in case of doubt, one simply presumes the strongest scale of measurement that is clearly "justified". In more complex situations, as for example in the analysis of ranking data, statisticians often do not worry too much about purely measurement-theoretic reservations, but instead embed the data structure in an appropriate, easy-to-handle space, such as a metric space, and then use all statistical tools available for this space. Against this background, the present cumulative dissertation tries to contribute from different perspectives to the appropriate handling of data that challenge the above-mentioned idealizations. The focus is, on the one hand, on the analysis of interval-valued and set-valued data within the methodology of partial identification and, on the other hand, on the analysis of data with values in a partially ordered set (poset-valued data). Further tools of statistical modeling treated in the dissertation are necessity measures in the context of possibility theory and concepts of stochastic dominance for poset-valued data. The present dissertation consists of 8 contributions, which are discussed in detail in the following sections: Contribution 1 analyzes different identification regions for partially identified linear models under interval-valued responses and develops a further kind of identification region (as well as a corresponding estimator). Estimates for the identification regions are compared to each other and also to classical statistical approaches for a data set on wine quality. Contribution 2 deals with logistic regression under coarsened responses, analyzes point-identifying assumptions, and develops likelihood-based estimators for the identified set. The methods are illustrated with data from a wave of the panel study "Labor Market and Social Security" (PASS). Contribution 3 analyzes the combinatorial structure of the extreme points and the edges of a polytope (called credal set or core in the literature) that plays a crucial role in imprecise probability theory. Furthermore, an efficient algorithm for enumerating all extreme points is given and compared to existing standard methods. Contribution 4 develops a quantile concept for data or random variables with values in a complete lattice, which is applied in Contribution 5 to ranking data in the context of a data set on the wisdom-of-the-crowd phenomenon.
    In Contribution 6, a framework for evaluating the quality of different aggregation functions of Social Choice Theory is developed, which enables an analysis of quality in dependence on group-specific homogeneity. In a simulation study, selected aggregation functions, including an aggregation function based on the concepts of Contributions 4 and 5, are analyzed. Contribution 7 supplies a linear program that allows for detecting stochastic dominance for poset-valued random variables, gives proposals for inference and regularization, and generalizes the approach to the general task of optimizing a linear function on a closure system. The generality of the developed methods is illustrated with data examples in the context of multivariate inequality analysis, item impact and differential item functioning in the context of item response theory, analyzing distributional differences in spatial statistics, and guided regularization in the context of cognitive diagnosis models. Contribution 8 uses concepts of stochastic dominance to establish a descriptive approach for a relational analysis of person ability and item difficulty in the context of multidimensional item response theory. All developed methods have been implemented in the language R ([R Development Core Team, 2014]) and are available from the author upon request. The application examples corroborate the usefulness of the weak types of statistical modeling examined in this thesis, which, beyond their flexibility to deal with many kinds of data deficiency, can still lead to informative subject-matter conclusions that are then all the more reliable due to the weak modeling.
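    Contribution 7's dominance check concerns poset-valued random variables and is formulated as a linear program; for intuition only, the sketch below shows the far simpler real-valued special case, where first-order stochastic dominance reduces to a pointwise comparison of empirical distribution functions.

```python
# Sketch: first-order stochastic dominance for real-valued samples (a heavily
# simplified special case of the poset-valued setting treated in the thesis).
import numpy as np

def fsd_dominates(x, y):
    """True if the empirical CDF of x lies below that of y everywhere, i.e. x dominates y."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    grid = np.union1d(x, y)
    F_x = np.searchsorted(x, grid, side="right") / x.size
    F_y = np.searchsorted(y, grid, side="right") / y.size
    return bool(np.all(F_x <= F_y))

print(fsd_dominates([2, 3, 5, 7], [1, 2, 4, 6]))   # True: first sample is pointwise shifted up
```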
