
    Democracy and Economic Development: a Fuzzy Classification Approach

    The aim of this work is to (1) analyse whether countries differ on political indicators (democracy, rule of law, government effectiveness and corruption) and (2) study whether countries with different political profiles are associated with different levels of economic, human development and gender-related development indicators. Using a fuzzy classification approach (the fuzzy k-means algorithm), we propose a typology of 124 countries based on 10 political variables. Six segments are identified; these political groups are associated with access to different levels of economic and human development. The study finds evidence of a positive but not perfect relationship between democracy and economic and human development, offering new insight into the heterogeneity of behaviour with respect to political indicators.
    Keywords: Democracy, Economic Development, Fuzzy k-means
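
The fuzzy k-means procedure behind such a typology can be sketched in a few lines. The toy data and parameter choices below are illustrative assumptions, not the authors' 124-country dataset:

```python
import numpy as np

def fuzzy_kmeans(X, k, m=2.0, iters=100, seed=0):
    """Minimal fuzzy k-means: returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                                # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # distances of every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))            # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Two well-separated blobs stand in for distinct political profiles.
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(3, 0.1, (20, 2))])
centers, U = fuzzy_kmeans(X, k=2)
```

Unlike hard k-means, each country receives a graded membership in every segment, which is what allows the typology to express partial profiles.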

    A framework for clustering and adaptive topic tracking on evolving text and social media data streams.

    Recent advances in and widespread usage of online web services and social media platforms, coupled with ubiquitous low-cost devices, mobile technologies, and the increasing capacity of low-cost storage, have led to a proliferation of Big Data, ranging from news, e-commerce clickstreams, and online business transactions to continuous event logs and social media expressions. These large amounts of online data, often referred to as data streams because they are generated at extremely high throughput or velocity, can make conventional and classical data analytics methodologies obsolete. For these reasons, the management and analysis of data streams have been researched extensively in recent years. The special case of social media Big Data brings additional challenges, particularly because of the unstructured nature of the data, specifically free text. One classical approach to mining text data has been topic modeling. Topic models are statistical models that can be used to discover the abstract "topics" that may occur in a corpus of documents. They have emerged as a powerful technique in machine learning and data science, providing a good balance between simplicity and complexity, and offering sophisticated insight without the need for real natural language understanding. However, they were not designed to cope with the type of text data that is abundant on social media platforms, but rather for traditional medium-sized corpora consisting of longer documents, adhering to a specific language and typically spanning a stable set of topics. Unlike traditional document corpora, social media messages tend to be very short, sparse, and noisy, and do not adhere to a standard vocabulary, linguistic patterns, or stable topic distributions.
They are also generated at a high velocity that imposes high demands on topic modeling, and their evolving, dynamic nature makes any set of topic modeling results quickly become stale in the face of changes in the textual content and topics discussed within social media streams. In this dissertation, we propose an integrated topic modeling framework built on top of an existing stream-clustering framework called Stream-Dashboard, which can extract, isolate, and track topics over any given time period. In this new framework, Stream-Dashboard first clusters the data stream points into homogeneous groups. Data from each group is then ushered to the topic modeling framework, which extracts finer topics from the group. The proposed framework tracks the evolution of the clusters over time to detect milestones corresponding to changes in topic evolution, and to trigger an adaptation of the learned groups and topics at each milestone. The proposed approach differs from generic topic modeling because it works in a compartmentalized fashion: the input document stream is split into distinct compartments, and topic modeling is applied to each compartment separately. Furthermore, we propose extensions to existing topic modeling and stream clustering methods, including: an adaptive query reformulation approach to help focus topic discovery over time; a topic modeling extension with adaptive hyper-parameters and an infinite vocabulary; and an adaptive stream clustering algorithm incorporating automated estimation of dynamic, cluster-specific temporal scales for adaptive forgetting, to facilitate clustering in a fast-evolving data stream.
Our experimental results show that the proposed adaptive-forgetting clustering algorithm mines better-quality clusters; that the proposed compartmentalized framework mines topics of better quality than competitive baselines; and that the framework can automatically adapt to focus on changing topics using the proposed query reformulation strategy.
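
The compartmentalized idea (cluster first, then extract topics per group) can be illustrated with a deliberately tiny sketch. The term-frequency "topics" below stand in for a real topic model, and the hard-coded cluster assignments stand in for Stream-Dashboard's output; both are invented for illustration:

```python
from collections import Counter, defaultdict

def compartmentalized_topics(docs, assignments, top_n=3):
    """Split a document stream into cluster 'compartments', then extract
    the dominant terms of each compartment separately."""
    groups = defaultdict(list)
    for doc, cid in zip(docs, assignments):
        groups[cid].extend(doc.lower().split())
    return {cid: [w for w, _ in Counter(words).most_common(top_n)]
            for cid, words in groups.items()}

stream = ["stock market rally", "market prices rise stock",
          "rain storm flood", "storm warning rain coast"]
clusters = [0, 0, 1, 1]      # stand-in for a stream-clustering step
topics = compartmentalized_topics(stream, clusters)
```

Because each compartment is modeled in isolation, the finance vocabulary never contaminates the weather topic, which is the intuition the framework exploits at scale.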

    On Utilizing Association and Interaction Concepts for Enhancing Microaggregation in Secure Statistical Databases

    This paper presents a possibly pioneering endeavor to tackle microaggregation techniques (MATs) in secure statistical databases by resorting to the principles of associative neural networks (NNs). The prior art has improved the available solutions to the MAT by incorporating proximity information, recursively reducing the size of the data set by excluding points that are farthest from the centroid and points that are closest to these farthest points. Thus, although the method is extremely effective, arguably, it uses only the proximity information while ignoring the mutual interaction between the records. In this paper, we argue that inter-record relationships can be quantified in terms of the following two entities: 1) their "association" and 2) their "interaction". This means that records that are not necessarily close to each other may still be "grouped", because their mutual interaction, which is quantified by invoking transitive-closure-like operations on the latter entity, could be significant, as suggested by the theoretically sound principles of NNs. By repeatedly invoking the inter-record associations and interactions, the records are grouped into sets of cardinality "k", where k is the security parameter in the algorithm. Our experimental results, conducted on artificial data and benchmark real-life data sets, demonstrate that the newly proposed method is superior to the state of the art, not only from the information loss (IL) perspective but also with respect to a criterion that combines the IL and the disclosure risk (DR).
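
The paper's association/interaction machinery is more sophisticated than any short snippet can show, but the basic contract of microaggregation (groups of at least k records, each record replaced by its group centroid) can be sketched with a naive univariate baseline on invented data:

```python
def microaggregate(values, k):
    """Baseline univariate microaggregation: sort, form groups of at least k
    records, and replace each value by its group centroid (lowering DR at
    the cost of some IL)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        if len(group) < k and start >= k:
            group = order[start - k:]   # fold a short tail into the previous group
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = centroid
    return out

# Invented toy data with three natural pairs.
data = [1.0, 1.2, 5.0, 5.1, 9.0, 9.3]
anonymised = microaggregate(data, k=2)
```

Every published value is now shared by at least k records, which is exactly the property the security parameter k enforces; information loss is the within-group spread sacrificed to the centroids.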

    Feature Grouping-based Feature Selection


    Conceptual Based Hidden Data Analytics and Reduction Method for System Interface Enhancement Through Handheld devices

    With the increasing demand placed on online systems by users, many organizations and companies are seeking to enhance their online interfaces to facilitate the search process over their hidden databases. Usually, users issue queries to a hidden database using the search template provided by the system. In this thesis, a new approach, based mainly on hidden database reduction that preserves functional dependencies, is developed for enhancing the online system interface on small-screen handheld devices. The developed approach is applied to online market systems such as eBay. Offline hidden data analysis is used to discover attributes, their domains, and the various functional dependencies. A comparative study between several methods for mining functional dependencies shows the advantage of conceptual methods for data reduction. In addition, by applying consecutive online reductions to search results, we adopted a method of displaying results in order of decreasing relevance. Validation of the proposed methods demonstrates their generality and suitability for system interfacing through continuous data reductions.
    NPRP-07-794-1-145 grant from the Qatar National Research Fund (a member of Qatar Foundation).
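
A building block of any functional-dependency-preserving reduction is testing whether a dependency actually holds in the data. A minimal check over an eBay-style listing table (the attribute names are invented for illustration):

```python
def holds_fd(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in a table
    given as a list of dicts: equal LHS values must imply equal RHS values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical hidden-database sample.
listings = [
    {"item": "phone", "brand": "A", "category": "electronics"},
    {"item": "phone", "brand": "B", "category": "electronics"},
    {"item": "sofa",  "brand": "C", "category": "furniture"},
]
```

Here `item -> category` holds while `item -> brand` does not, so a reduction could safely drop `category` from the small-screen template and reconstruct it from `item`.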

    What are we learning from the life satisfaction literature?

    The recent availability of cross-sectional and longitudinal survey data on life satisfaction in a large number of countries gives us the opportunity to verify empirically (and not just to assume) what matters to individuals and what economists and policymakers should take into account when trying to promote personal and societal wellbeing. The wide array of econometric findings in this booming literature displays evidence, generally robust across different cultural backgrounds, on the effects of some important happiness drivers (income, unemployment, marital status), which can be considered “quasi stylized facts” of happiness. Even if economic policies, for many obvious reasons, cannot maximize self-declared life satisfaction as such, we are nonetheless learning a lot from these contributions. In particular, results on the relevance of relational goods and the risk of their crowding out, on the revisited inflation/unemployment trade-off and, more generally, on the measurement of the shadow value of non-market goods obtained with life satisfaction estimates, convey relevant information about individual preferences and what lies behind utility functions. Such findings suggest moving beyond anthropological reductionism toward behavioral complexity, and refocusing the target indicators of economic policies in order to minimize the distance between economic development and human wellbeing.
    Keywords: life satisfaction, shadow value of non-market goods, unemployment/inflation

    Relaxed Functional Dependencies - A Survey of Approaches

    Recently, there has been renewed interest in functional dependencies due to the possibility of employing them in several advanced database operations, such as data cleaning, query relaxation, record matching, and so forth. In particular, the constraints defined for canonical functional dependencies have been relaxed to capture inconsistencies in real data, patterns of semantically related data, or semantic relationships in complex data types. In this paper, we survey 35 such relaxed functional dependencies, providing classification criteria, motivating examples, and a systematic analysis of them.
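
One common relaxation in this family replaces the exact constraint with a confidence measure: the dependency is allowed to hold on most, rather than all, tuples. A sketch of that single relaxation (not the survey's full taxonomy), on invented data:

```python
from collections import Counter, defaultdict

def fd_confidence(rows, lhs, rhs):
    """Confidence of lhs -> rhs: the largest fraction of tuples that can be
    kept so the dependency holds exactly (1.0 means a canonical FD)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[a] for a in lhs)][tuple(row[a] for a in rhs)] += 1
    # Within each LHS group, keep the most frequent RHS value.
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return kept / len(rows)

# Dirty data: one inconsistent spelling violates zip -> city.
people = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "90210", "city": "Beverly Hills"},
]
conf = fd_confidence(people, ["zip"], ["city"])   # 3 of 4 tuples consistent
```

A data-cleaning tool could flag `zip -> city` as an approximate dependency worth repairing precisely because its confidence is high but below 1.0.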

    Uncertainty and Interpretability Studies in Soft Computing with an Application to Complex Manufacturing Systems

    In systems modelling and control theory, the benefits of applying neural networks have been extensively studied, particularly in manufacturing processes such as the prediction of the mechanical properties of heat-treated steels. However, modern industrial processes usually involve large amounts of data and a range of non-linear effects and interactions that may hinder model interpretation. For example, in steel manufacturing it is vital to understand the complex mechanisms by which the heat treatment process generates the mechanical properties. This knowledge is not available via numerical models, so an experienced metallurgist estimates the model parameters to obtain the required properties. This human knowledge and perception can be imprecise, leading to cognitive uncertainty such as vagueness and ambiguity when making decisions. In system classification, this may translate into a system deficiency - for example, small changes in input attributes may result in a sudden and inappropriate change of class assignment. To address this issue, practitioners and researchers have developed systems that are functionally equivalent to fuzzy systems and neural networks. Such systems provide a morphology that mimics the human ability to reason via the qualitative aspects of fuzzy information rather than by its quantitative analysis. Furthermore, these models are able to learn from data sets and to describe the associated interactions and non-linearities in the data. However, like neural networks, a neural fuzzy system may suffer from a loss of interpretability and transparency when making decisions, mainly due to the application of adaptive approaches for its parameter identification.
Since the RBF-NN can be treated as a fuzzy inference engine, this thesis presents several methodologies that quantify different types of uncertainty and their influence on the interpretability and transparency of the RBF-NN during its parameter identification. In particular, three kinds of uncertainty source in relation to the RBF-NN are studied, namely entropy, fuzziness and ambiguity. First, a methodology based on Granular Computing (GrC), neutrosophic sets and the RBF-NN is presented. Its objective is to quantify the hesitation produced during granular compression at the low level of interpretability of the RBF-NN via the use of neutrosophic sets. This study also aims to enhance the distinguishability, and hence the transparency, of the initial fuzzy partition. The effectiveness of the proposed methodology is tested on a real case study for the prediction of the properties of heat-treated steels. Secondly, a new Interval Type-2 Radial Basis Function Neural Network (IT2-RBF-NN) is introduced as a modelling framework. The IT2-RBF-NN takes advantage of the functional equivalence between type-1 FLSs and the RBF-NN to construct an Interval Type-2 Fuzzy Logic System (IT2-FLS) that is able to deal with linguistic uncertainty and perceptions in the RBF-NN rule base. This gives rise to different combinations when optimising the IT2-RBF-NN parameters. Finally, a twofold study of uncertainty assessment at the high level of interpretability of the RBF-NN is provided. The first study proposes a new methodology to quantify (a) the fuzziness and (b) the ambiguity at each RU during the formation of the rule base, via the use of neutrosophic set theory. The aim is to calculate the fuzziness associated with each rule, and then the ambiguity related to each normalised consequence of the fuzzy rules, which arise from overlapping and from one-to-many decision choices, respectively.
The second study proposes a new methodology to quantify the entropy and the fuzziness that arise from the redundancy phenomenon during parameter identification. To conclude, the experimental results obtained by applying the proposed methodologies to the modelling of two well-known benchmark data sets and to the prediction of the mechanical properties of heat-treated steels led to the publication of three articles in two peer-reviewed journals and one international conference.
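
The kinds of quantities being measured can be illustrated with two textbook uncertainty measures over a vector of fuzzy memberships (these are standard definitions, not the thesis's exact formulations):

```python
import math

def fuzzy_entropy(mu):
    """De Luca-Termini entropy of a fuzzy set: zero for crisp memberships
    (0 or 1), maximal when every membership equals 0.5."""
    h = 0.0
    for m in mu:
        for p in (m, 1 - m):
            if p > 0:
                h -= p * math.log(p)
    return h / len(mu)

def fuzziness(mu):
    """Linear index of fuzziness: normalised distance to the nearest crisp set."""
    return sum(min(m, 1 - m) for m in mu) * 2 / len(mu)

crisp = [0.0, 1.0, 1.0]   # a fully confident (crisp) rule activation
vague = [0.5, 0.5, 0.5]   # maximal hesitation
```

Both measures vanish for the crisp vector and peak at memberships of 0.5, capturing the intuition that a rule whose activations hover near 0.5 contributes the most vagueness to the rule base.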

    Techniques for improving efficiency and scalability for the integration of information retrieval and databases

    This thesis is on the topic of the integration of Information Retrieval (IR) and Databases (DB), with a particular focus on improving the efficiency and scalability of integrated IR and DB technology (IR+DB). The main purpose of this study is to develop efficient and scalable techniques for supporting integrated IR and DB technology, which is a popular approach today for handling complex queries over text and structured data. Our specific interest is how to efficiently handle queries over large-scale text and structured data. The work is based on a technology that integrates probability theory and relational algebra, where retrievals over text and data are expressed in probabilistic logical programs such as probabilistic relational algebra (PRA) or probabilistic Datalog. To support efficient processing of probabilistic logical programs, we propose three optimization techniques covering the logical and physical layers: scoring-driven query optimization using scoring expressions, query processing with a top-k-incorporated pipeline, and indexing with a relational inverted index. Specifically, scoring expressions are proposed for expressing the scoring or probabilistic semantics of the scoring functions implied by PRA expressions, so that efficient query execution plans can be generated by a rule-based scoring-driven optimizer. Secondly, to balance efficiency and effectiveness and so improve query response time, we study methods for incorporating top-k algorithms into the pipelined query execution engine of IR+DB systems. Thirdly, the proposed relational inverted index integrates an IR-style inverted index and a DB-style tuple-based index, and can be used to support efficient probability estimation and aggregation as well as conventional relational operations. Experiments were carried out to investigate the performance of the proposed techniques.
Experimental results show that the efficiency and scalability of an IR+DB prototype have been improved, and that the system can handle queries efficiently on considerably large data sets for a number of IR tasks.
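
The prototype's relational inverted index and pipelined top-k engine are specific to the thesis, but the generic idea (postings lists scored term-by-term, then pruned to the k best results) can be sketched as follows, with invented documents and a plain term-frequency score:

```python
import heapq
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: term -> postings list of (doc_id, term_frequency)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index

def top_k(index, query, k):
    """Accumulate per-document scores over the query's postings lists and
    keep only the k best with a heap, in the spirit of top-k pipelining."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf
    return heapq.nlargest(k, scores.items(), key=lambda x: x[1])

docs = ["database query processing", "probabilistic database retrieval",
        "text retrieval with inverted index"]
index = build_index(docs)
results = top_k(index, "database retrieval", k=2)
```

A real engine would additionally prune postings traversal early once the k-th score bound is known; this sketch only shows the data layout and the final heap selection.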

    Exploring the Boundary Region of Tolerance Rough Sets for Feature Selection

    Of all the challenges facing the effective application of computational intelligence technologies for pattern recognition, dataset dimensionality is undoubtedly one of the primary impediments. For pattern classifiers to be efficient, a dimensionality reduction stage is usually performed prior to classification. Much use has been made of Rough Set Theory for this purpose, as it is completely data-driven and requires no additional information, whereas most other methods do. However, traditional rough set-based methods in the literature are restricted by the requirement that all data must be discrete, so real-valued or noisy data cannot be considered directly. This is usually addressed by employing a discretisation method, which can result in information loss. This paper proposes a new approach based on the tolerance rough set model, which has the ability to deal with real-valued data whilst simultaneously retaining dataset semantics. More significantly, this paper describes the underlying mechanism by which this new approach utilises the information contained within the boundary region, or region of uncertainty. The use of this information can result in the discovery of more compact feature subsets and improved classification accuracy. These results are supported by an experimental evaluation comparing the proposed approach with a number of existing feature selection techniques.
    Keywords: feature selection, attribute reduction, rough sets, classification
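
A minimal sketch of how a tolerance relation induces lower approximations, and hence a boundary region, for real-valued data; the tolerance threshold, data, and labels below are invented:

```python
def tolerant(a, b, tau=0.1):
    """Two real-valued feature vectors are tolerably similar when every
    attribute differs by at most tau (one common tolerance relation)."""
    return all(abs(x - y) <= tau for x, y in zip(a, b))

def lower_approximation(X, labels, tau=0.1):
    """Indices of objects whose entire tolerance class shares their label:
    the certain region. Everything else falls in the boundary region."""
    lower = []
    for i, xi in enumerate(X):
        tol_class = [j for j, xj in enumerate(X) if tolerant(xi, xj, tau)]
        if all(labels[j] == labels[i] for j in tol_class):
            lower.append(i)
    return lower

X = [[0.00], [0.05], [0.50], [0.55], [0.52]]
y = [0, 0, 1, 1, 0]
certain = lower_approximation(X, y, tau=0.1)
```

Objects 2-4 are mutually tolerant yet carry mixed labels, so they land in the boundary region; the paper's contribution is precisely to exploit such boundary objects rather than discard them when scoring feature subsets.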