
    Learning Tuple Probabilities

    Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning and has been studied intensively in the subfield of Statistical Relational Learning (SRL), but it remains an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from labeled lineage formulas. The resulting learning problem can be viewed as the inverse of confidence computation in PDBs: given a set of labeled query answers, learn the probability values of the base tuples such that the marginal probabilities of the query answers again yield the assigned probability labels. We analyze the learning problem from a theoretical perspective, cast it as an optimization problem, and provide an algorithm based on stochastic gradient descent. Finally, we conclude with an experimental evaluation on three real-world datasets and one synthetic dataset, comparing our approach to various techniques from SRL, reasoning in information extraction, and optimization.
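
    The inverse problem the abstract describes can be made concrete with a small sketch. The following Python code is not the paper's implementation: the tuple-based formula encoding, the squared-error objective, the learning rate, and the read-once assumption are ours for illustration. It learns base-tuple probabilities by stochastic gradient descent so that the marginals of labeled lineage formulas approach their labels.

        import random
        from math import prod

        # A lineage formula is a nested tuple:
        #   ('var', i)            -- base tuple i
        #   ('and', f1, f2, ...)  -- conjunction
        #   ('or',  f1, f2, ...)  -- disjunction
        # Read-once assumption: each base tuple occurs at most once per
        # formula, so marginals factorize over independent base tuples.

        def eval_with_grad(f, p):
            """Return (marginal probability, {tuple_id: d prob / d p_id})."""
            if f[0] == 'var':
                return p[f[1]], {f[1]: 1.0}
            parts = [eval_with_grad(c, p) for c in f[1:]]
            vals = [v for v, _ in parts]
            if f[0] == 'and':
                total = prod(vals)
            else:  # 'or' over independent children: 1 - prod(1 - v)
                total = 1.0 - prod(1.0 - v for v in vals)
            grad = {}
            for v, g in parts:
                # Partial derivative of `total` w.r.t. this child's value;
                # safe because all values stay strictly inside (0, 1).
                outer = total / v if f[0] == 'and' else (1.0 - total) / (1.0 - v)
                for i, gi in g.items():
                    grad[i] = outer * gi
            return total, grad

        def learn(examples, n_tuples, lr=0.5, epochs=500, eps=1e-3):
            """SGD on the squared error between marginals and labels."""
            p = [random.uniform(0.2, 0.8) for _ in range(n_tuples)]
            for _ in range(epochs):
                random.shuffle(examples)
                for formula, label in examples:
                    pred, grad = eval_with_grad(formula, p)
                    err = 2.0 * (pred - label)
                    for i, gi in grad.items():
                        p[i] = min(1.0 - eps, max(eps, p[i] - lr * err * gi))
            return p

        # Two labeled query answers over three base tuples.
        examples = [(('or', ('var', 0), ('and', ('var', 1), ('var', 2))), 0.7),
                    (('and', ('var', 0), ('var', 1)), 0.3)]
        print(learn(examples, n_tuples=3))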

    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities, have intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.
    Comment: 96 pages, 14 figures, 333 references
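
    The 1959-era probability models on graphs the survey alludes to include the Gilbert/Erdős–Rényi G(n, p) model. A minimal Python sketch (the parameter values and seed are illustrative) samples from it by including each possible edge independently with probability p:

        import itertools
        import random

        def gnp(n, p, seed=None):
            """Sample a G(n, p) random graph: each of the n*(n-1)/2
            possible edges is included independently with probability p."""
            rng = random.Random(seed)
            return [(u, v) for u, v in itertools.combinations(range(n), 2)
                    if rng.random() < p]

        edges = gnp(100, 0.05, seed=1)
        print(len(edges))  # expected around p * n*(n-1)/2 = 247.5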

    MULTIDIMENSIONAL DATABASES IN INFORMATION SYSTEMS OF UNIVERSITIES

    The article describes the multidimensional database, an effective data-storage method that supports high-quality analysis in a short time. It discusses the capabilities of multidimensional databases, in particular multidimensional OLAP (On-Line Analytical Processing) cubes, for analyzing large amounts of data. It provides an overview of the features of a multidimensional database and outlines the steps needed to understand the structure and capabilities of an OLAP cube. To build a knowledge base, it describes how to create and populate a multidimensional database with data collected from various sources and then prepare a report using OLAP analysis. Data-processing technologies for information systems, such as OLTP and OLAP, are compared, and the process of storing data for analysis purposes is examined. A model of a multidimensional database in the form of a three-dimensional cube is presented, together with examples of analysis and of ways to retrieve information from the data cube. The use of a multidimensional database in higher education institutions is considered as a simple and effective data-storage method, with illustrations of the structure of a higher educational institution that convey the volume of information and the kinds of databases such an institution operates.
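
    To make the cube operations concrete, here is a minimal sketch in which pandas stands in for a real OLAP engine; the fact table, column names, and figures are invented for illustration. It builds a small enrolment cube, rolls a measure up over two dimensions, and slices on a third:

        import pandas as pd

        # Toy fact table: one row per (faculty, year, degree) enrolment count.
        facts = pd.DataFrame({
            'faculty':  ['CS', 'CS', 'Math', 'Math', 'CS', 'Math'],
            'year':     [2022, 2023, 2022, 2023, 2023, 2023],
            'degree':   ['BSc', 'BSc', 'BSc', 'MSc', 'MSc', 'BSc'],
            'students': [120, 135, 80, 25, 40, 85],
        })

        # Roll-up: aggregate the measure over the faculty x year plane.
        cube = facts.pivot_table(values='students', index='faculty',
                                 columns='year', aggfunc='sum', fill_value=0)
        print(cube)

        # Slice: fix one dimension (year == 2023), then roll up by faculty.
        print(facts[facts.year == 2023].groupby('faculty')['students'].sum())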

    Quorumpeps database: chemical space, microbial origin and functionality of quorum sensing peptides

    Quorum-sensing (QS) peptides are biologically attractive molecules with a wide diversity of structures, and they are prone to modifications that alter or introduce new functionalities. The Quorumpeps database (http://quorumpeps.ugent.be) was therefore developed to give a structured overview of the QS oligopeptides, describing their microbial origin (species), functionality (method, result and receptor), peptide links and chemical characteristics (3D-structure-derived physicochemical properties). The chemical diversity observed within this group of QS signalling molecules can be used to develop new synthetic bioactive compounds.

    XLDM: an XLink-based multidimensional metamodel

    The growth of data available on the Internet, and of the means to handle it, is an important issue when designing a data model. In this context, XML provides the formalism needed to establish a standard for representing and exchanging data. Since data-warehouse technologies are often used for data analysis, a cube data model for XML must be defined. However, representing data in XML may introduce syntactic, semantic and structural heterogeneity problems across XML documents, which related approaches do not consider. Solving these problems requires the definition of a data schema. This paper proposes a metamodel for specifying XML document cubes, based on relationships between elements and XML documents. The approach resolves XML data heterogeneity by taking advantage of a data schema definition and of relationships defined with XLink. The methodology provides formal rules to define the proposed concepts; this formalism is then instantiated using XML Schema and XLink. The paper also presents a case study in the medical field and a comparison with XBRL Dimensions and with a financial, multidimensional data model that uses XLink.
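
    A minimal sketch of the XLink mechanism the metamodel builds on: a hypothetical fact document references a dimension document through a simple-type XLink, and Python's standard library resolves the namespaced attribute. The element and attribute names below are illustrative, not XLDM's actual vocabulary.

        import xml.etree.ElementTree as ET

        XLINK = 'http://www.w3.org/1999/xlink'

        # Hypothetical fragment: a fact element pointing at a dimension
        # document through a simple XLink.
        doc = """
        <cube xmlns:xlink="http://www.w3.org/1999/xlink">
          <fact measure="admissions" value="42">
            <dimension xlink:type="simple"
                       xlink:href="diagnosis.xml#icd10-J45"/>
          </fact>
        </cube>
        """

        root = ET.fromstring(doc)
        for dim in root.iter('dimension'):
            # Attribute lookup uses Clark notation: {namespace}localname.
            print(dim.get(f'{{{XLINK}}}href'))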

    Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems

    Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up: as knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer; in this scheme the memory becomes the processor (a "smart memory"). This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree machine), and the Butterfly (a coarse-grained MIMD butterfly-switch machine).
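
    The smart-memory idea, one instruction applied simultaneously to every memory cell, can be caricatured in software. The sketch below is an analogy only: NumPy vectorization stands in for the Connection Machine's hardware broadcast, and the cell count is illustrative. The same update is expressed once element by element and once as a single data-parallel operation:

        import numpy as np

        # 65,536 "processing elements", one value each, Connection-Machine style.
        cells = np.random.rand(65536)

        # Sequential (von Neumann) view: each value crosses the
        # memory/processor bottleneck one at a time.
        out_seq = np.empty_like(cells)
        for i in range(cells.size):
            out_seq[i] = cells[i] * 2.0 + 1.0

        # SIMD view: one broadcast instruction updates every cell;
        # NumPy's vectorized kernel stands in for the hardware.
        out_simd = cells * 2.0 + 1.0

        assert np.allclose(out_seq, out_simd)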

    The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

    Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines, such as applied mathematics, statistics and machine learning, and the resulting methods have been adopted by other sciences, such as the space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other.
    Comment: 24 pages, 8 figures, 1 table. Accepted for publication in Advances in Astronomy, special issue "Robotic Astronomy"

    Endogenous space in the Net era

    Libre Software communities are among the most interesting and advanced socio-economic laboratories on the Net. In terms of directions for Regional Science research, this paper addresses a simple question: "Is the socio-economics of digital nets out of scope for Regional Science, or might the latter expand into a cybergeography of digitally enhanced territories?" As with most simple questions, the answers are neither obvious nor easy. The authors start drafting one in a positive sense, focussing upon a fil rouge running through the paper: the endogenous spaces woven by socio-economic processes. The drafted answer unfolds into an Evolutionary Location Theory formulation, together with two computational modelling views.
    Keywords: Complex networks, Computational modelling, Economics of Internet, Endogenous spaces, Evolutionary location theory, Free or Libre Software, Path dependence, Positionality.