
    Portfolio of Infrastructure Investments: Analysis of European Infrastructure

    Infrastructure has received much attention in recent years. Investment in infrastructure is particularly effective and has been recommended to institutional investors such as pension funds because of the characteristics of infrastructure assets. However, robust analytical and empirical analyses in support of these investments are limited, mainly because empirical data are scant. In this work, the authors collect relevant data sets on infrastructure and address two objectives. First, they examine the significance of listed infrastructure sectors and subsectors by assessing the investment characteristics and performance of different infrastructure indexes in Europe. The aim here is to demonstrate how an effective and successful infrastructure portfolio should be constructed. Second, they evaluate the strategy of infrastructure investors, in other words, they seek to establish empirically whether an investor should hold a portfolio spanning different infrastructure sectors or whether diversification benefits can still be obtained by investing in a single infrastructure sector.

    A review of statistical techniques in the analysis of linguistic data

    Chapter One. The task "Digitization and provision of access in the Digital Repository of the University of Łódź to the collection of scientific journals published by the University of Łódź", no. 885/P-DUN/2014, was co-financed from MNiSW funds as part of science dissemination activities.

    Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs

    Statistics in ranked lists is important in analyzing molecular biology measurement data, such as ChIP-seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists. More flexible models such as position weight matrix (PWM) motifs are not addressed in this context. To assess the enrichment of a PWM motif in a ranked list, we use a PWM-induced second ranking on the same set of elements. Possible orders of one ranked list relative to the other are modeled by permutations. Due to sample space complexity, it is difficult to characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top of two uniformly and independently drawn permutations and demonstrate advantages of this approach using our software implementation, mmHG-Finder, to study PWMs in several datasets. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).
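
    The quantity this abstract refers to, the size of the intersection of the tops of two independently drawn permutations, follows a hypergeometric distribution for any fixed pair of cutoffs. The sketch below is an illustration under that assumption and is not the mmHG-Finder implementation: the function names are hypothetical, and minimizing the tail over all cutoff pairs yields a raw statistic, not a p-value (the bounds developed in the paper are what correct for that minimization).

```python
from math import comb


def hypergeom_tail(N, n1, n2, b):
    """P(X >= b), where X is the overlap between the top n1 of one
    uniform random permutation of N items and the top n2 of another."""
    total = comb(N, n2)
    upper = min(n1, n2)
    hits = sum(comb(n1, k) * comb(N - n1, n2 - k) for k in range(b, upper + 1))
    return hits / total


def top_overlap(rank_a, rank_b, n1, n2):
    """Number of items shared by the top n1 of rank_a and top n2 of rank_b."""
    return len(set(rank_a[:n1]) & set(rank_b[:n2]))


def min_tail_over_cutoffs(rank_a, rank_b):
    """Smallest hypergeometric tail over all cutoff pairs (n1, n2).
    Brute force and uncorrected: for illustration only."""
    N = len(rank_a)
    best = 1.0
    for n1 in range(1, N):
        for n2 in range(1, N):
            b = top_overlap(rank_a, rank_b, n1, n2)
            best = min(best, hypergeom_tail(N, n1, n2, b))
    return best
```

    Two identical rankings give a small minimum tail, while unrelated rankings give values close to 1; the brute-force double loop is only practical for short lists.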

    On testing the significance of sets of genes

    This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis (GSEA) procedure of Subramanian et al. [Proc. Natl. Acad. Sci. USA 102 (2005) 15545--15550]. We study the problem in some generality and propose two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences. We discuss a variety of examples and extensions, including the use of gene-set scores for class predictions. We also describe a new R language package GSA that implements our ideas. Comment: Published at http://dx.doi.org/10.1214/07-AOAS101 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
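
    As a rough illustration of the maxmean statistic named above, the sketch below assumes the usual description: average the positive parts and the negative parts of the per-gene scores over the whole set, then keep whichever average is larger in magnitude. It is written in Python, not taken from the GSA package; the function name and the convention of returning a negative value when the negative part dominates are assumptions made here.

```python
def maxmean(scores):
    """Maxmean summary of a gene set's per-gene scores (illustrative sketch).

    Both averages divide by the full set size, so a few large scores in one
    direction are not diluted by many small scores in the other direction.
    """
    n = len(scores)
    pos = sum(max(z, 0.0) for z in scores) / n   # mean positive part
    neg = sum(max(-z, 0.0) for z in scores) / n  # mean negative part (magnitude)
    return pos if pos >= neg else -neg
```

    For example, the scores 2, -1, 0 and 3 have a mean positive part of 1.25 and a mean negative part of 0.25, so the summary is 1.25.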

    Halting indigenous biodiversity decline: ambiguity, equity, and outcomes in RMA assessment of significance

    In New Zealand, assessment of ‘significance’ is undertaken to give effect to a legal requirement for local authorities to provide for protection of significant sites under the Resource Management Act (1991). The ambiguity of the statute enables different interests to define significance according to their goals: vested interests (developers), local authorities, and non-vested interests in pursuit of protection of environmental public goods may advance different definitions. We examine two sets of criteria used for assessment of significance for biological diversity under the Act. Criteria adapted from the 1980s Protected Natural Areas Programme are inadequate to achieve the maintenance of biological diversity if ranking is used to identify only highest priority sites. Norton and Roper-Lindsay (2004) propose a narrow definition of significance and criteria that identify only a few high-quality sites as significant. Both sets are likely to serve the interests of developers and local authorities, but place the penalty of uncertainty on non-vested interests seeking to maintain biological diversity, and are likely to exacerbate the decline of biological diversity and the loss of landscape-scale processes required for its persistence. When adopting criteria for assessment of significance, we suggest local authorities should consider whose interests are served by different criteria sets, and who will bear the penalty of uncertainty regarding biological diversity outcomes. They should also ask whether significance criteria are adequate, and sufficiently robust to the uncertainty inherent in the assessment of natural values, to halt the decline of indigenous biological diversity.

    Multiple Hypothesis Testing in Pattern Discovery

    The problem of multiple hypothesis testing arises when more than one hypothesis is to be tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, simultaneously assessing the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I error). Our contribution in this paper is to extend the multiple hypothesis framework to be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test. Comment: 28 pages
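
    To make the notion of FWER control concrete, the sketch below shows Holm's step-down procedure, a standard method that controls the FWER in the strong sense. It is a stand-in chosen for brevity and is not the method proposed in this paper; the function name and the example p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns one reject/keep flag per hypothesis.

    The smallest p-value is compared with alpha/m, the next with
    alpha/(m-1), and so on; testing stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject
```

    With the hypothetical p-values 0.01, 0.04, 0.03 and 0.005 at alpha = 0.05, the thresholds are 0.0125, about 0.0167 and 0.025, so only the hypotheses with p-values 0.005 and 0.01 are rejected.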

    The Impact of Crowds on News Engagement: A Reddit Case Study

    Today, users are reading the news through social platforms. These platforms are built to facilitate crowd engagement, but not necessarily to disseminate useful news that informs the masses. Hence, the news that is most engaged with may not be the news that best informs. While predicting news popularity has been well studied, it has not been studied in the context of crowd manipulations. In this paper, we provide some preliminary results from a longer-term project on crowd and platform manipulations of news and news popularity. In particular, we study known features for predicting news popularity and how those features may change on reddit.com, a social platform commonly used for news aggregation. Along with this, we explore ways in which users can alter the perception of news by changing the title of an article. We find that the popularity of news on reddit is predictable using previously studied sentiment and content features and that posts with titles changed by reddit users tend to be more popular than posts with the original article title. Comment: Published at The 2nd International Workshop on News and Public Opinion at ICWSM 201