77 research outputs found

    Author index volume 145 (1995)

    Get PDF

    Reordering Rows for Better Compression: Beyond the Lexicographic Order

    Get PDF
    Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row-ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades off compression for a major running-time speedup, is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%: these gains are on top of the gains due to lexicographically sorting the table. We prove that the new row reordering is optimal (within 10%) at minimizing the runs of identical values within columns, in a few cases.Comment: to appear in ACM TOD

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

    THREE TEMPORAL PERSPECTIVES ON DECENTRALIZED LOCATION-AWARE COMPUTING: PAST, PRESENT, FUTURE

    Get PDF
    Durant les quatre derniĂšres dĂ©cennies, la miniaturisation a permis la diffusion Ă  large Ă©chelle des ordinateurs, les rendant omniprĂ©sents. Aujourd’hui, le nombre d’objets connectĂ©s Ă  Internet ne cesse de croitre et cette tendance n’a pas l’air de ralentir. Ces objets, qui peuvent ĂȘtre des tĂ©lĂ©phones mobiles, des vĂ©hicules ou des senseurs, gĂ©nĂšrent de trĂšs grands volumes de donnĂ©es qui sont presque toujours associĂ©s Ă  un contexte spatiotemporel. Le volume de ces donnĂ©es est souvent si grand que leur traitement requiert la crĂ©ation de systĂšme distribuĂ©s qui impliquent la coopĂ©ration de plusieurs ordinateurs. La capacitĂ© de traiter ces donnĂ©es revĂȘt une importance sociĂ©tale. Par exemple: les donnĂ©es collectĂ©es lors de trajets en voiture permettent aujourd’hui d’éviter les em-bouteillages ou de partager son vĂ©hicule. Un autre exemple: dans un avenir proche, les donnĂ©es collectĂ©es Ă  l’aide de gyroscopes capables de dĂ©tecter les trous dans la chaussĂ©e permettront de mieux planifier les interventions de maintenance Ă  effectuer sur le rĂ©seau routier. Les domaines d’applications sont par consĂ©quent nombreux, de mĂȘme que les problĂšmes qui y sont associĂ©s. Les articles qui composent cette thĂšse traitent de systĂšmes qui partagent deux caractĂ©ristiques clĂ©s: un contexte spatiotemporel et une architecture dĂ©centralisĂ©e. De plus, les systĂšmes dĂ©crits dans ces articles s’articulent autours de trois axes temporels: le prĂ©sent, le passĂ©, et le futur. Les systĂšmes axĂ©s sur le prĂ©sent permettent Ă  un trĂšs grand nombre d’objets connectĂ©s de communiquer en fonction d’un contexte spatial avec des temps de rĂ©ponses proche du temps rĂ©el. Nos contributions dans ce domaine permettent Ă  ce type de systĂšme dĂ©centralisĂ© de s’adapter au volume de donnĂ©e Ă  traiter en s’étendant sur du matĂ©riel bon marchĂ©. Les systĂšmes axĂ©s sur le passĂ© ont pour but de faciliter l’accĂšs a de trĂšs grands volumes donnĂ©es spatiotemporelles collectĂ©es par des objets connectĂ©s. En d’autres termes, il s’agit d’indexer des trajectoires et d’exploiter ces indexes. Nos contributions dans ce domaine permettent de traiter des jeux de trajectoires particuliĂšrement denses, ce qui n’avait pas Ă©tĂ© fait auparavant. Enfin, les systĂšmes axĂ©s sur le futur utilisent les trajectoires passĂ©es pour prĂ©dire les trajectoires que des objets connectĂ©s suivront dans l’avenir. Nos contributions permettent de prĂ©dire les trajectoires suivies par des objets connectĂ©s avec une granularitĂ© jusque lĂ  inĂ©galĂ©e. Bien qu’impliquant des domaines diffĂ©rents, ces contributions s’articulent autour de dĂ©nominateurs communs des systĂšmes sous-jacents, ouvrant la possibilitĂ© de pouvoir traiter ces problĂšmes avec plus de gĂ©nĂ©ricitĂ© dans un avenir proche. -- During the past four decades, due to miniaturization computing devices have become ubiquitous and pervasive. Today, the number of objects connected to the Internet is in- creasing at a rapid pace and this trend does not seem to be slowing down. These objects, which can be smartphones, vehicles, or any kind of sensors, generate large amounts of data that are almost always associated with a spatio-temporal context. The amount of this data is often so large that their processing requires the creation of a distributed system, which involves the cooperation of several computers. The ability to process these data is important for society. For example: the data collected during car journeys already makes it possible to avoid traffic jams or to know about the need to organize a carpool. Another example: in the near future, the maintenance interventions to be carried out on the road network will be planned with data collected using gyroscopes that detect potholes. The application domains are therefore numerous, as are the prob- lems associated with them. The articles that make up this thesis deal with systems that share two key characteristics: a spatio-temporal context and a decentralized architec- ture. In addition, the systems described in these articles revolve around three temporal perspectives: the present, the past, and the future. Systems associated with the present perspective enable a very large number of connected objects to communicate in near real-time, according to a spatial context. Our contributions in this area enable this type of decentralized system to be scaled-out on commodity hardware, i.e., to adapt as the volume of data that arrives in the system increases. Systems associated with the past perspective, often referred to as trajectory indexes, are intended for the access to the large volume of spatio-temporal data collected by connected objects. Our contributions in this area makes it possible to handle particularly dense trajectory datasets, a problem that has not been addressed previously. Finally, systems associated with the future per- spective rely on past trajectories to predict the trajectories that the connected objects will follow. Our contributions predict the trajectories followed by connected objects with a previously unmet granularity. Although involving different domains, these con- tributions are structured around the common denominators of the underlying systems, which opens the possibility of being able to deal with these problems more generically in the near future

    Cross-repository aggregation of educational resources

    Get PDF
    The proliferation of educational resource repositories promoted the development of aggregators to facilitate interoperability, that is, a unified access that would allow users to fetch a given resource independently of its origin. The CROERA system is a repository aggregator that provides access to educational resources independently of the classification taxonomy utilized in the hosting repository. For that, an automated classification algorithm is trained using the information extracted from the metadata of a collection of educational resources hosted in different repositories, which in turn depends on the classification taxonomy used in each case. Then, every resource will be automatically classified on demand independently of the original classification scheme. As a consequence, resources can be retrieved independently of the original taxonomy utilized using any taxonomy supported by the aggregator, and exploratory searches can be made without a previous taxonomy mapping. This approach overcomes one of the recurring problems in taxonomy mapping, namely the one-to-none matching situation. To evaluate the performance of this proposal two methods were applied. Resource classification in categories existing in all repositories was automatically evaluated, obtaining maximum performance values of 84% (F1 score), 87.8% (area under the receiver operator characteristic curve), 86% (area under the precision-recall curve) and 75.1% (Cohen's Îș). In the case of resources not belonging to one of the common categories, human inspection was used as a reference to compute classification performance. In this case, maximum performance values obtained were respectively 69.8%, 73.8%, 75% and 54.3%. These results demonstrate the potential of this approach as a tool to facilitate resource classification, for example to provide a preliminary classification that would require just minor corrections from human classifiers.Xunta de Galicia | Ref. R2014/034 (RedPlir)Xunta de Galicia | Ref. R2014/029 (TELGalicia

    Applications

    Get PDF
    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases in electronics, steel production and milling for quality control during manufacturing processes in traffic, logistics for smart cities and for mobile communications
    • 

    corecore