134 research outputs found

    DFKI publications : the first four years ; 1990 - 1993

    Get PDF

    Machine Learning in Automated Text Categorization

    Full text link
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey

    Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

    Get PDF
    El actual diluvio de datos está inundando la web con grandes volúmenes de datos representados en RDF, dando lugar a la denominada 'Web de Datos'. En esta tesis proponemos, en primer lugar, un estudio profundo de aquellos textos que nos permitan abordar un conocimiento global de la estructura real de los conjuntos de datos RDF, HDT, que afronta la representación eficiente de grandes volúmenes de datos RDF a través de estructuras optimizadas para su almacenamiento y transmisión en red. HDT representa efizcamente un conjunto de datos RDF a través de su división en tres componentes: la cabecera (Header), el diccionario (Dictionary) y la estructura de sentencias RDF (Triples). A continuación, nos centramos en proveer estructuras eficientes de dichos componentes, ocupando un espacio comprimido al tiempo que se permite el acceso directo a cualquier dat

    Neurocognitive Informatics Manifesto.

    Get PDF
    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper examples of neurocognitive inspirations and promising directions in this area are given

    Natural language semantics and compiler technology

    Get PDF
    This paper recommends an approach to the implementation of semantic representation languages (SRLs) which exploits a parallelism between SRLs and programming languages (PLs). The design requirements of SRLs for natural language are similar to those of PLs in their goals. First, in both cases we seek modules in which both the surface representation (print form) and the underlying data structures are important. This requirement highlights the need for general tools allowing the printing and reading of expressions (data structures). Second, these modules need to cooperate with foreign modules, so that the importance of interface technology (compilation) is paramount; and third, both compilers and semantic modules need "inferential" facilities for transforming (simplifying) complex expressions in order to ease subsequent processing. But the most important parallel is the need in both fields for tools which are useful in combination with a variety of concrete languages -- general purpose parsers, printers, simplifiers (transformation facilities) and compilers. This arises in PL technology from (among other things) the need for experimentation in language design, which is again parallel to the case of SRLs. Using a compiler-based approach, we have implemented NLL, a public domain software package for computational natural language semantics. Several interfaces exist both for grammar modules and for applications, using a variety of interface technologies, including especially compilation. We review here a variety of NLL, applications, focusing on COSMA, an NL interface to a distributed appointment manager

    DFKI publications : the first four years ; 1990 - 1993

    Get PDF

    An Introduction to Database Systems

    Get PDF
    This textbook introduces the basic concepts of database systems. These concepts are presented through numerous examples in modeling and design. The material in this book is geared to an introductory course in database systems offered at the junior or senior level of Computer Science. It could also be used in a first year graduate course in database systems, focusing on a selection of the advanced topics in the latter chapters

    Semantic Similarity of Spatial Scenes

    Get PDF
    The formalization of similarity in spatial information systems can unleash their functionality and contribute technology not only useful, but also desirable by broad groups of users. As a paradigm for information retrieval, similarity supersedes tedious querying techniques and unveils novel ways for user-system interaction by naturally supporting modalities such as speech and sketching. As a tool within the scope of a broader objective, it can facilitate such diverse tasks as data integration, landmark determination, and prediction making. This potential motivated the development of several similarity models within the geospatial and computer science communities. Despite the merit of these studies, their cognitive plausibility can be limited due to neglect of well-established psychological principles about properties and behaviors of similarity. Moreover, such approaches are typically guided by experience, intuition, and observation, thereby often relying on more narrow perspectives or restrictive assumptions that produce inflexible and incompatible measures. This thesis consolidates such fragmentary efforts and integrates them along with novel formalisms into a scalable, comprehensive, and cognitively-sensitive framework for similarity queries in spatial information systems. Three conceptually different similarity queries at the levels of attributes, objects, and scenes are distinguished. An analysis of the relationship between similarity and change provides a unifying basis for the approach and a theoretical foundation for measures satisfying important similarity properties such as asymmetry and context dependence. The classification of attributes into categories with common structural and cognitive characteristics drives the implementation of a small core of generic functions, able to perform any type of attribute value assessment. Appropriate techniques combine such atomic assessments to compute similarities at the object level and to handle more complex inquiries with multiple constraints. These techniques, along with a solid graph-theoretical methodology adapted to the particularities of the geospatial domain, provide the foundation for reasoning about scene similarity queries. Provisions are made so that all methods comply with major psychological findings about people’s perceptions of similarity. An experimental evaluation supplies the main result of this thesis, which separates psychological findings with a major impact on the results from those that can be safely incorporated into the framework through computationally simpler alternatives

    A journey through computability, topology and analysis

    Get PDF
    This thesis is devoted to the exploration of the complexity of some mathematical problems using the framework of computable analysis and descriptive set theory. We will especially focus on Weihrauch reducibility, as a means to compare the uniform computational strength of problems. After a short introduction of the relevant background notions, we investigate the uniform computational content of the open and clopen Ramsey theorems. In particular, since there is not a canonical way to phrase these theorems as multi-valued functions, we identify 8 different multi-valued functions (5 corresponding to the open Ramsey theorem and 3 corresponding to the clopen Ramsey theorem) and study their degree from the point of view of Weihrauch, strong Weihrauch and arithmetic Weihrauch reducibility. We then discuss some new operators on multi-valued functions and study their algebraic properties and the relations with other previously studied operators on problems. These notions turn out to be extremely relevant when exploring the Weihrauch degree of the problem DS of computing descending sequences in ill-founded linear orders. They allow us to show that DS, and the Weihrauch equivalent problem BS of finding bad sequences through non-well quasi-orders, while being very "hard" to solve, are rather weak in terms of uniform computational strength. We then generalize DS and BS by considering Gamma-presented orders, where Gamma is a Borel pointclass or Delta11, Sigma11, Pi11. We study the obtained DS-hierarchy and BS-hierarchy of problems in comparison with the (effective) Baire hierarchy and show that they do not collapse at any finite level. Finally, we focus on the characterization, from the point of view of descriptive set theory, of some conditions involving the notions of Hausdorff/Fourier dimension and of Salem sets. We first work in the hyperspace K([0,1]) of compact subsets of [0,1] and show that the closed Salem sets form a Pi03-complete family. This is done by characterizing the complexity of the family of sets having sufficiently large Hausdorff or Fourier dimension. We also show that the complexity does not change if we increase the dimension of the ambient space and work in K([0,1]^d). We also generalize the results by relaxing the compactness of the ambient space, and show that the closed Salem sets are still Pi03-complete when we endow K(R^d) with the Fell topology. A similar result holds also for the Vietoris topology. We conclude by showing how these results can be used to characterize the Weihrauch degree of the functions computing the Hausdorff and Fourier dimensions
    • …
    corecore