    Assessing similarity of feature selection techniques in high-dimensional domains

    Recent research efforts attempt to combine multiple feature selection techniques instead of using a single one. However, this combination is often made on an "ad hoc" basis, depending on the specific problem at hand, without considering the degree of diversity/similarity of the methods involved. Moreover, although it is recognized that different techniques may return quite dissimilar outputs, especially in high-dimensional/small-sample-size domains, few direct comparisons exist that quantify these differences and their implications for classification performance. This paper aims to contribute in this direction by proposing a general methodology for assessing the similarity between the outputs of different feature selection methods in high-dimensional classification problems. Using the genomics domain as a benchmark, an empirical study has been conducted to compare some of the most popular feature selection methods, and useful insight has been gained into their pattern of agreement.
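    The abstract does not say which similarity measure is used. A common way to compare the outputs of two selection methods is the Jaccard similarity of their top-k feature subsets. The sketch below assumes that measure. The method names and rankings are hypothetical examples, not data from the paper.

```python
from itertools import combinations


def top_k(ranking, k):
    """Return the set of the k highest-ranked features (ranking is best-first)."""
    return set(ranking[:k])


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two feature subsets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(rankings, k):
    """Jaccard similarity of the top-k subsets for every pair of methods."""
    subsets = {name: top_k(r, k) for name, r in rankings.items()}
    return {
        (m1, m2): jaccard(subsets[m1], subsets[m2])
        for m1, m2 in combinations(sorted(subsets), 2)
    }


# Hypothetical best-first rankings of gene indices from three selection methods.
rankings = {
    "chi2": [3, 7, 1, 9, 4, 2],
    "relieff": [7, 3, 5, 1, 8, 0],
    "svm_rfe": [2, 6, 3, 0, 7, 5],
}
result = pairwise_similarity(rankings, k=4)
for pair, sim in result.items():
    print(pair, round(sim, 3))
```

    Repeating the comparison over several values of k shows how agreement changes as more of each ranking is taken into account.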

    Enhancing random forests performance in microarray data classification

    Random forests are receiving increasing attention for the classification of microarray datasets. We evaluate the effects of a feature selection process on the performance of a random forest classifier, as well as on the choice of two critical parameters, i.e., the forest size and the number of features chosen at each split when growing trees. Our experimental results suggest that parameter values lower than the popular defaults can lead to effective and more parsimonious classification models. Growing a few trees on small subsets of selected features, while randomly choosing a single variable at each split, yields classification performance that compares well with state-of-the-art studies.

    Data mining for detecting Bitcoin Ponzi schemes

    Soon after its introduction in 2009, Bitcoin was adopted by cybercriminals, who rely on its pseudonymity to implement virtually untraceable scams. One typical scam that operates on Bitcoin is the so-called Ponzi scheme. This is a fraudulent investment that repays users with the funds invested by new users who join the scheme, and that implodes when no new investments can be found. Despite being illegal in many countries, Ponzi schemes are now proliferating on Bitcoin and keep luring new victims, who are plundered of millions of dollars. We apply data mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our starting point is a dataset of features of real-world Ponzi schemes, which we construct by analysing, on the Bitcoin blockchain, the transactions used to perform the scams. We use this dataset to experiment with various machine learning algorithms, and we assess their effectiveness through standard validation protocols and performance metrics. The best of the classifiers we experimented with can identify most of the Ponzi schemes in the dataset, with a low number of false positives.

    Microcosmos d'arquetipus [Microcosm of archetypes]

    Catalunya a l'aldea global [Catalonia in the global village]

    Small societies must also adapt to an increasingly integrated world, economically, culturally and politically. Globalisation is a multifaceted phenomenon that is transforming the world into a global village. Three examples of globalisation are:
    - the rapidly spreading financial crisis, with the threat of global recession;
    - the arrest of Pinochet in London on a Spanish warrant, marking the birth of a global public opinion; and
    - the acceptance of differences as a way to resolve conflicts, both in Northern Ireland and in the creation of the European Union.
    Globalisation is forcing societies to adapt their cultural, institutional and political references. The two biggest transformations of the 20th century have been continued economic growth and the political preeminence of democracy, which are interrelated. Catalan society has made a fundamental contribution to both in Spain. This is no guarantee for the future under globalisation. The complexity of modern society is based on the 17th-century principle of tolerance. Decentralisation promotes the natural process of self-government, but globalisation makes universal problems difficult to solve within a limited territory. The political union of Europe is the alternative. Prosperity is not the result of natural advantages but of values that favour productivity. These values respect individual freedom. A business culture does not arise from business schools but from these values of democracy and tolerance. The role of the public sector is to guarantee social cohesion, efficient education and adequate infrastructure, including electronic communications.

    BioCloud Search EnGene: Surfing Biological Data on the Cloud

    The massive production and spread of biomedical data across the web introduces new challenges in identifying computational approaches that provide quality search and browsing of web resources. This paper presents BioCloud Search EnGene (BSE), a cloud application that facilitates searching and integrating the many layers of biological information offered by public large-scale genomic repositories. Grounded in the concept of a dataspace, BSE is built on top of a cloud platform that greatly reduces scalability and performance issues. Like popular online gene portals, BSE adopts a gene-centric approach: researchers can find the information they need through a simple "Google-like" query interface that accepts standard gene identifiers as keywords. We present the BSE architecture and functionality and discuss how our strategies help to tackle big data problems in querying gene-based web resources. BSE is publicly available at: http://biocloud-unica.appspot.com/

    Fostering innovation in library management and leadership: The University of Hong Kong libraries leadership institute

    Purpose - The purpose of this paper is to discuss experiences gained from the introduction of a library leadership institute for Asian academic librarians.

    Design/methodology/approach - The success of the institute is measured through the evaluations of all participants. Most recently, this has included an attempt to identify the challenges faced by academic library leaders and potential leaders, and to assess how well the institute addresses those challenges.

    Findings - While evaluations of the institute are highly positive, there appears to be potential for expanding the institute into two streams: one strictly on leadership and the other drawing mainly on management issues.

    Research limitations/implications - While analysis of institute evaluations and comments demonstrates a great deal of satisfaction, further research should be undertaken to identify long-term benefits gained by participants.

    Practical implications - The volatile world of information places many challenges on library leaders in the Asia region. The need for strong leadership is apparent, as librarians must draw on a range of skills that are not traditionally taught in library schools and are often difficult to develop in the workplace. The benefits of leadership institutes, while limited, do at least plant a seed for new ideas and ways of thinking.

    Originality/value - The paper provides a thorough analysis of the only Asian academic library leadership institute. It is useful for others considering establishing a similar institute, or for those concerned with library professional development in Asia.

    A comparative analysis of biomarker selection techniques

    Feature selection has become an essential step in biomarker discovery from high-dimensional genomics data. It is recognized that different feature selection techniques may yield different sets of biomarkers, i.e., different groups of genes highly correlated with a given pathological condition, but few direct comparisons exist that quantify these differences systematically. In this paper, we propose a general methodology for comparing the outcomes of different selection techniques in the context of biomarker discovery. The comparison is carried out along two dimensions: (i) measuring the similarity/dissimilarity of the selected gene sets, and (ii) evaluating the implications of these differences in terms of both predictive performance and stability of the selected gene sets. As a case study, we considered three benchmarks derived from DNA microarray experiments and conducted a comparative analysis of eight selection methods, representative of different classes of feature selection techniques. Our results show that the proposed approach can provide useful insight into the pattern of agreement of biomarker discovery techniques.
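    The abstract evaluates the "stability of selected gene sets" but does not name a stability measure. One widely used option for equal-size subsets is the Kuncheva consistency index. With r shared features between two subsets of size k drawn from n features, it is (r·n − k²) / (k·(n − k)), averaged over all pairs of subsets obtained from repeated runs, e.g. on resampled data. The sketch below assumes this measure. The example subsets and n = 10 are hypothetical.

```python
from itertools import combinations


def kuncheva_index(a, b, n):
    """Kuncheva consistency index for two equal-size feature subsets out of n features.

    Equals 1 for identical subsets and is close to 0 for chance-level overlap.
    """
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if k == 0 or k == n:
        raise ValueError("index is undefined for k = 0 or k = n")
    r = len(a & b)  # number of features selected in both runs
    return (r * n - k * k) / (k * (n - k))


def stability(subsets, n):
    """Average pairwise Kuncheva index over subsets selected in repeated runs."""
    pairs = list(combinations(subsets, 2))
    return sum(kuncheva_index(a, b, n) for a, b in pairs) / len(pairs)


# Hypothetical gene subsets selected by one method on three resampled training sets.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(round(stability(runs, n=10), 4))
```

    Unlike plain Jaccard overlap, this index corrects for the overlap expected by chance. That correction matters when k is small relative to the thousands of genes on a microarray.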

    Revenus et niveaux de bilinguisme écrit et oral : les hommes québécois en 1971 [Earnings and levels of written and oral bilingualism: Quebec men in 1971]

    This paper examines the roles of second-language oral and written skills in determining the earnings of a sample of Quebec men in 1971. A log-linear earnings equation is used, with education and experience as additional independent variables. The main results are that it is preferable to measure language skills as precisely as possible, and that both oral and written skills play a role in the earnings determination equation, with oral skills playing the larger role.