185 research outputs found
Entropy of English text: Experiments with humans and a machine learning system based on rough sets
The goal of this paper is to show the dependency of the entropy of English text on the subject of the experiment, the type of English text, and the methodology used to estimate the entropy. Claude Shannon first described the technique for estimating the entropy of English text by a human subject guessing the next letter after viewing a string of characters taken from actual text. We show how this result is affected by using different humans in the experiment (Shannon used only his wife) and by using different types of text material (Shannon used only a single book). We also show how the results are affected when we replace the human subjects with a machine learning system based on rough sets. Automating the play of the guessing game with this system, called LERS, gives rise to a lossless data compression scheme. (C) Elsevier Science Inc. 1998
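The guessing-game estimate described above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes each trial is recorded as the number of guesses needed to hit the correct letter, uses a 27-symbol alphabet (26 letters plus space), and applies the upper and lower bounds from Shannon's 1951 analysis. The function name and the sample data are hypothetical.

```python
import math
from collections import Counter


def shannon_guessing_bounds(guess_counts, alphabet_size=27):
    """Return (lower, upper) entropy bounds in bits per letter.

    guess_counts[k] is the number of guesses the subject needed on trial k
    (1 = correct on the first guess). Hypothetical helper for illustration.
    """
    if not guess_counts:
        raise ValueError("need at least one trial")
    if any(g < 1 or g > alphabet_size for g in guess_counts):
        raise ValueError("guess counts must lie in 1..alphabet_size")

    n = len(guess_counts)
    freq = Counter(guess_counts)
    # q[i - 1] is the fraction of trials solved on exactly the i-th guess.
    q = [freq.get(i, 0) / n for i in range(1, alphabet_size + 1)]

    # Upper bound: entropy of the guess-number distribution.
    upper = -sum(p * math.log2(p) for p in q if p > 0)

    # Lower bound: sum over i of i * (q_i - q_{i+1}) * log2(i), with q_{N+1} = 0.
    lower = 0.0
    for i in range(1, alphabet_size + 1):
        q_next = q[i] if i < alphabet_size else 0.0
        lower += i * (q[i - 1] - q_next) * math.log2(i)
    return lower, upper


if __name__ == "__main__":
    # Made-up trial data: most letters guessed at once, a few need more tries.
    trials = [1, 1, 1, 2, 1, 3, 1, 1, 2, 1]
    low, high = shannon_guessing_bounds(trials)
    print(f"entropy between {low:.3f} and {high:.3f} bits per letter")
```

The lower bound is derived for an ideal predictor whose guess frequencies are non-increasing in the guess number; with small empirical samples that may not hold, so the lower value is best read as indicative. Swapping the human subject for an automated predictor only changes how `guess_counts` is produced.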
Dealing with Missing Data and Uncertainty in the Context of Data Mining
Missing data is an issue in many real-world datasets, yet robust methods for dealing with missing data appropriately still need development. In this paper we investigate how some methods for handling missing data perform when the uncertainty increases. Using benchmark datasets from the UCI Machine Learning repository, we generate datasets for our experimentation with increasing amounts of data Missing Completely At Random (MCAR), both at the attribute level and at the record level. We then apply four classification algorithms: C4.5, Random Forest, Naïve Bayes and Support Vector Machines (SVMs). We measure the performance of each classifier under complete case analysis and simple imputation, and then study the performance of the algorithms that can handle missing data. We find that complete case analysis has a detrimental effect because it renders many datasets infeasible when missing data increases, particularly for high dimensional data. We find that increasing missing data does have a negative effect on the performance of all the algorithms tested, but the different algorithms, whether using preprocessing in the form of simple imputation or handling the missing data themselves, do not show a significant difference in performance.
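The experimental setup above can be illustrated with a small sketch. It is not the authors' procedure: it assumes cell-level MCAR (each value is dropped independently with the same probability), numeric attributes, and mean imputation as the "simple imputation" step. All function names are hypothetical.

```python
import random


def inject_mcar(rows, rate, seed=0):
    """Replace each cell with None independently with probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Fill each missing cell with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def expected_complete_fraction(rate, n_cols):
    """Under cell-level MCAR, a row survives with probability (1 - rate) ** n_cols."""
    return (1.0 - rate) ** n_cols


if __name__ == "__main__":
    data = [[float(i + j) for j in range(50)] for i in range(1000)]
    damaged = inject_mcar(data, rate=0.10, seed=42)
    print("rows left after complete case analysis:", len(complete_cases(damaged)))
    print("expected fraction:", expected_complete_fraction(0.10, 50))
    print("rows after mean imputation:", len(mean_impute(damaged)))
```

The last helper shows why complete case analysis breaks down for high dimensional data: with 10% of cells missing and 50 attributes, only about 0.5% of rows are expected to remain complete, whereas imputation keeps every row.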
Around the Hosszú-Gluskin theorem for n-ary groups
We survey results related to the important Hosszú-Gluskin Theorem on
n-ary groups, adding several new results and comments. The aim of this
paper is to present all such results in a uniform and concise form. Therefore
some proofs of new results are only sketched, or omitted where completing them
should not be too difficult for readers. In particular, we show how the
Hosszú-Gluskin Theorem can be used to count how many different n-ary
groups (up to isomorphism) exist on some small sets. Moreover, we sketch how the
mentioned theorem can also be used to investigate
Q-independent subsets of semiabelian n-ary groups for some
special families Q of mappings.
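For orientation, the theorem this survey is built around is commonly stated as follows (a standard formulation, not quoted from the paper; the symbols f, φ and b are the usual textbook notation). An n-ary groupoid (G, f) is an n-ary group if and only if there exist a binary group (G, ·), an automorphism φ of (G, ·) and an element b of G such that:

```latex
\[
  f(x_1, x_2, \ldots, x_n)
    = x_1 \cdot \varphi(x_2) \cdot \varphi^{2}(x_3) \cdots \varphi^{n-1}(x_n) \cdot b,
\]
\[
  \varphi(b) = b,
  \qquad
  \varphi^{n-1}(x) = b \cdot x \cdot b^{-1} \quad \text{for all } x \in G.
\]
```

Counting n-ary groups on a small set then reduces to counting admissible triples (binary group, φ, b) up to the appropriate equivalence.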
Co-optation & Clientelism: Nested Distributive Politics in China’s Single-Party Dictatorship
What explains the persistent growth of public employment in reform-era
China despite repeated and forceful downsizing campaigns? Why do some provinces
retain more public employees and experience higher rates of bureaucratic expansion
than others? Among electoral regimes, the creation and distribution of public jobs is
typically attributed to the politics of vote buying and multi-party competition. Electoral
factors, however, cannot explain the patterns observed in China’s single-party dictatorship. This study highlights two nested factors that influence public employment in
China: party co-optation and personal clientelism. As a collective body, the ruling party
seeks to co-opt restive ethnic minorities by expanding cadre recruitment in hinterland
provinces. Within the party, individual elites seek to expand their own networks of
power by appointing clients to office. The central government’s professed objective of
streamlining bureaucracy is in conflict with the party’s co-optation goal and individual
elites’ clientelist interest. As a result, the size of public employment has inflated during
the reform period despite top-down mandates to downsize bureaucracy. Peer Reviewed.
http://deepblue.lib.umich.edu/bitstream/2027.42/116599/1/Ang, Cooptation & Clientelism, posted 2016-01.pdf
The historical origins of corruption in the developing world: a comparative analysis of East Asia
A new approach has emerged in the literature on corruption in the developing world that breaks with the assumption that corruption is driven by individualistic self-interest and, instead, conceptualizes corruption as an informal system of norms and practices. While this emerging neo-institutionalist approach has done much to further our understanding of corruption in the developing world, one key question has received relatively little attention: how do we explain differences in the institutionalization of corruption between developing countries? This paper addresses the question through a systematic comparison of seven developing and newly industrialized countries in East Asia. The argument that emerges from this analysis is that historical sequencing mattered: countries in which the "political marketplace" had gone through a process of concentration before universal suffrage was introduced are now marked by less harmful types of corruption than countries where mass voting rights were rolled out in a context of fragmented political marketplaces. The paper concludes by demonstrating that this argument can be generalized to the developing world as a whole.
Citizenship Norms in Eastern Europe
Research on Eastern Europe stresses the weakness of its civil society and the lack of political and social involvement, neglecting the question: What do people themselves think it means to be a good citizen? This study looks at citizens’ definitions of good citizenship in Poland, Slovenia, the Czech Republic and Hungary, using 2002 European Social Survey data. We investigate mean levels of civic mindedness in these countries and perform regression analyses to investigate whether factors traditionally associated with civic and political participation are also correlated with citizenship norms across Eastern Europe. We show that mean levels of civic mindedness differ significantly across the four Eastern European countries. We find some support for theories on civic and political participation when explaining norms of citizenship, but also demonstrate that individual-level characteristics are related differently to citizenship norms across the countries of our study. Hence, our findings show that Eastern Europe is not a monolithic and homogeneous bloc, underscoring the importance of taking the specificities of countries into account.
State owned enterprises as bribe payers: the role of institutional environment
Our paper draws attention to a neglected channel of corruption: bribe payments by state-owned enterprises (SOEs). This is an important phenomenon, as bribe payments by SOEs waste national resources, compromising public welfare and national prosperity. Using a large dataset of 30,249 firms from 50 countries, we show that, in general, SOEs are less likely to pay bribes for achieving organizational objectives owing to their political connectivity. However, in deteriorated institutional environments, SOEs may be subject to managerial rent-seeking behaviors, which disproportionately increase SOE bribe propensity relative to privately owned enterprises. Specifically, our findings highlight the importance of fostering democracy and rule of law, reducing the prevalence of corruption, and reducing power distance in lowering the incidence of SOE bribery.