
    Semantic Information G Theory and Logical Bayesian Inference for Machine Learning

    An important problem in machine learning is that when the number of labels n > 2, it is very difficult to construct and optimize a group of learning functions, and we want the optimized learning functions to remain useful when the prior distribution P(x) (where x is an instance) changes. To resolve this problem, the semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms together form a systematic solution. A semantic channel in the G theory consists of a group of truth functions or membership functions. In comparison with the likelihood functions, Bayesian posteriors, and logistic functions used by popular methods, membership functions can be more conveniently used as learning functions without the above problem. In LBI, every label's learning is independent. For multilabel learning, we can directly obtain a group of optimized membership functions from a sufficiently large labeled sample, without preparing different samples for different labels. A group of CM algorithms is developed for machine learning. For the Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions on a two-dimensional feature space, 2-3 iterations can make the mutual information between three classes and three labels surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maximization (EM) algorithm is improved to become the CM-EM algorithm, which can outperform the EM algorithm when mixture ratios are imbalanced or local convergence exists. The CM iteration algorithm needs to be combined with neural networks for MMI classification on high-dimensional feature spaces. LBI needs further study for the unification of statistics and logic.
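
    The abstract above measures progress by the mutual information between classes and labels. As a minimal sketch of that quantity only (not the CM algorithms themselves), the standard Shannon mutual information can be computed from a joint probability table. The function name and the table layout are assumptions made for illustration.

```python
import math


def mutual_information(joint):
    """Shannon mutual information I(X;Y) in bits.

    `joint[i][j]` is assumed to hold P(class i, label j); the entries
    are assumed to be non-negative and to sum to 1 (hypothetical layout).
    """
    # Marginals: P(class i) sums each row, P(label j) sums each column.
    p_class = [sum(row) for row in joint]
    p_label = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing to the sum
                mi += p * math.log2(p / (p_class[i] * p_label[j]))
    return mi
```

    For three equiprobable classes, a table with all mass on the diagonal gives log2(3) bits, the largest value possible for three classes, while a table in which classes and labels are independent gives zero.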

    A model for structured document retrieval : empirical investigations

    Documents often display a structure, e.g., several sections, each with several subsections, and so on. Taking the structure of a document into account allows the retrieval process to focus on those parts of the document that are most relevant to an information need. In previous work, we developed a model for the representation and retrieval of structured documents. This paper reports the first experimental study of the effectiveness and applicability of the model.

    Integrating and Ranking Uncertain Scientific Data

    Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve the prediction of well-known functions). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
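
    To illustrate what ranking query results by probability can look like, here is a minimal sketch under stated assumptions. It is not the BioRank formalism, which the abstract does not specify. It assumes that each candidate result is reached through one or more join paths, that the links along a path are independent (so their probabilities multiply), and that distinct paths are independent (so they combine by noisy-or). All names and the input layout are hypothetical.

```python
from math import prod


def path_probability(link_probs):
    """Probability that every link on one join path holds (independence assumed)."""
    return prod(link_probs)


def combine_paths(path_probs):
    """Noisy-or: probability that at least one path holds (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in path_probs)


def rank_results(candidates):
    """Rank candidates by combined probability, highest first.

    `candidates` maps a result name to a list of paths; each path is a
    list of link probabilities (hypothetical input layout).
    """
    scores = {
        name: combine_paths([path_probability(path) for path in paths])
        for name, paths in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

    Under these assumptions, a result supported by one strong path can outrank a result supported by several weak ones, and small changes to the link probabilities shift the scores only slightly.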

    Crime and Social media

    Purpose: The study complements the scant macroeconomic literature on the development outcomes of social media by examining the relationship between Facebook penetration and violent crime levels in a cross-section of 148 countries for the year 2012. Design/methodology/approach: The empirical evidence is based on Ordinary Least Squares (OLS), Tobit and Quantile regressions. In order to respond to policy concerns about the limited evidence on the consequences of social media in developing countries, the dataset is disaggregated into regions and income levels. The decomposition by income levels includes: low income, lower middle income, upper middle income and high income. The corresponding regions include: Europe and Central Asia, East Asia and the Pacific, Middle East and North Africa, Sub-Saharan Africa and Latin America. Findings: From OLS and Tobit regressions, there is a negative relationship between Facebook penetration and crime. However, Quantile regressions reveal that the established negative relationship is noticeable exclusively in the 90th crime decile. Further, when the dataset is decomposed into regions and income levels, the negative relationship is evident in the Middle East and North Africa (MENA), while a positive relationship is confirmed for Sub-Saharan Africa. Policy implications are discussed. Originality/value: Studies on the development outcomes of social media are sparse because of a lack of reliable macroeconomic data on social media. This study primarily complements three existing studies that have leveraged a newly available dataset on Facebook.