1,553 research outputs found

    A grid-based infrastructure for distributed retrieval

    In large-scale distributed retrieval, challenges of latency, heterogeneity, and dynamicity emphasise the importance of infrastructural support in reducing the development costs of state-of-the-art solutions. We present a service-based infrastructure for distributed retrieval which blends middleware facilities and a design framework to ‘lift’ the resource-sharing approach and the computational services of a European Grid platform into the domain of e-Science applications. In this paper, we give an overview of the DILIGENT Search Framework and illustrate its exploitation in the field of Earth Science.

    Peer to Peer Information Retrieval: An Overview

    Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction, and freedom.

    Using Simulation To Evaluate Web-Based Bidding in Construction

    The Internet is changing the whole business model by allowing companies to communicate instantly with suppliers, partners, and customers on a worldwide scale. To enjoy real-time data exchange and higher transaction efficiencies, companies need to use information technology (IT) solutions and change how they distribute goods and how they collaborate, both within the company and with contractors and suppliers. While the Internet is the channel that allows instant interaction between all components of a company, IT provides the ability to streamline the structure and to influence and control the flow of information. In this paper, we evaluate the impact that using the Internet can have on the procurement aspect of the construction industry. Specifically, we describe how the traditional service procurement process in construction is affected by the use of a web-based bidding tool (WBBT). We use a simulation model on a case study to evaluate how the WBBT affected service procurement in a large pharmaceutical company. The paper describes the potential impact of IT solutions in the construction industry, and on the procurement aspect in particular, before discussing the case study in detail.
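    A rough way to picture what such a simulation measures: sample the duration of one bid cycle under traditional versus web-based invitation, then compare means over many runs. The sketch below is a minimal Monte Carlo illustration with hypothetical stage durations; it is not the paper's WBBT model.

        import random

        def bid_cycle_days(web_based, n_bidders=5):
            """Sample one procurement cycle: invitation, bid preparation, evaluation."""
            invite = 0.5 if web_based else random.uniform(2, 5)           # instant posting vs mail/fax rounds
            prepare = max(random.gauss(10, 2) for _ in range(n_bidders))  # slowest bidder gates the cycle
            evaluate = random.uniform(1, 2) if web_based else random.uniform(3, 7)
            return invite + prepare + evaluate

        def mean_cycle(web_based, runs=10_000):
            return sum(bid_cycle_days(web_based) for _ in range(runs)) / runs

        print(f"traditional: {mean_cycle(False):.1f} days, web-based: {mean_cycle(True):.1f} days")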

    Peer clustering and firework query model in peer-to-peer networks.

    Ng, Cheuk Hang. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 89-95). Abstracts in English and Chinese. Contents:
    1. Introduction: Problem Definition; Main Contributions; Thesis Organization
    2. Background: Background of Peer-to-Peer; Background of Content-Based Image Retrieval Systems; Literature Review of Peer-to-Peer Applications; Literature Review of Discovery Mechanisms for Peer-to-Peer Applications (Centralized Search; Distributed Search - Flooding; Distributed Search - Distributed Hash Table)
    3. Peer Clustering and Firework Query Model: Peer Clustering (Simplified Version; Single Cluster Version; Single Cluster, Multiple Layers of Connection Version; Multiple Clusters Version); Firework Query Model Over Clustered Network
    4. Experiments and Results: Simulation Model of Peer-to-Peer Network; Performance Metrics; Experiment Results (performance under different numbers of peers, different TTL values of the query packet, synthetic and real data sets, and different numbers of local clusters per peer); Evaluation of Different Clustering Algorithms
    5. Distributed COntent-based Visual Information Retrieval (DISCOVIR): Architecture of DISCOVIR and Functionality of DISCOVIR Components; Flow of Operations (Preprocessing; Connection Establishment; Query Message Routing; Query Result Display); Gnutella Message Modification; DISCOVIR Everywhere (Design Goal; Architecture and System Components; Flow of Operations; Advantages over Prevalent Web-based Search Engines)
    6. Conclusion; Bibliography
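    The firework metaphor in the title can be summarised as follows: a query wanders across ordinary links until it reaches a peer whose content signature is similar enough, at which point it "explodes" and floods the surrounding content cluster. The sketch below is an illustrative rendering of that routing rule, assuming cosine similarity over feature-vector signatures and the Peer fields shown; it is not the thesis's exact formulation.

        import math
        from dataclasses import dataclass, field

        @dataclass
        class Peer:
            id: int
            signature: list                                     # content signature, e.g. a mean image-feature vector
            cluster_links: list = field(default_factory=list)   # attractive links within the same content cluster
            random_links: list = field(default_factory=list)    # ordinary Gnutella-style connections

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        def route_query(peer, query_vec, ttl, threshold=0.8, visited=None):
            """Forward a query; flood ('explode') once a matching cluster is reached."""
            if visited is None:
                visited = set()
            if ttl <= 0 or peer.id in visited:
                return
            visited.add(peer.id)
            if cosine(peer.signature, query_vec) >= threshold:
                for neighbour in peer.cluster_links:            # explosion: flood the cluster
                    route_query(neighbour, query_vec, ttl - 1, threshold, visited)
            elif peer.random_links:
                best = max(peer.random_links,                   # otherwise follow the most attractive link
                           key=lambda n: cosine(n.signature, query_vec))
                route_query(best, query_vec, ttl - 1, threshold, visited)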

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we survey the impact and legal consequences of these technical advances and point out future directions of research.

    Crowdsourcing-Based Fingerprinting for Indoor Location in Multi-Storey Buildings

    The number of available indoor location solutions has been growing, yet they suffer from insufficient precision, high implementation costs, or scalability limitations. Because fingerprinting-based methods rely on information that is ubiquitous in buildings, they discard the need for additional infrastructure. Still, the time-consuming manual process of acquiring fingerprints limits their applicability in most scenarios. This paper proposes an algorithm for the automatic construction of environmental fingerprints in multi-storey buildings, leveraging the information sources available in each scenario. It relies on unlabelled crowdsourced data from users’ smartphones. With only the floor plans as input, a requirement for most applications, we apply a multimodal approach that joins inertial data, the local magnetic field, and Wi-Fi signals to construct highly accurate fingerprints. Precise movement estimation is achieved regardless of smartphone usage through Deep Neural Networks, and the transition between floors is detected from barometric data. Users’ trajectories obtained with Pedestrian Dead Reckoning techniques are partitioned into clusters using Wi-Fi measurements. Straight sections from the same cluster are then compared with subsequence Dynamic Time Warping to search for similarities. From the identified overlapping sections, a particle filter fits each trajectory into the building’s floor plans. From all successfully mapped routes, fingerprints labelled with physical locations are finally obtained. Experimental results from an office and a university building show that this solution constructs fingerprints comparable to those acquired manually, thus providing a useful tool for the automatic setup of fingerprinting-based solutions.
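    The trajectory-matching step hinges on subsequence Dynamic Time Warping: the cost of aligning a short straight section anywhere inside a longer signal. Below is a minimal sketch, assuming 1-D inputs (e.g. magnetic-field magnitude sampled along a path) and plain-Python lists; the authors' actual implementation is not shown here.

        def subsequence_dtw_cost(query, reference):
            """Best alignment cost of `query` against any subsequence of
            `reference` (start and end are free on the reference axis)."""
            inf = float("inf")
            n, m = len(query), len(reference)
            prev = [0.0] * (m + 1)          # row i=0 is zero: the match may start anywhere
            for i in range(1, n + 1):
                curr = [inf] * (m + 1)      # column j=0 is infinite: the query must be consumed
                for j in range(1, m + 1):
                    d = abs(query[i - 1] - reference[j - 1])
                    curr[j] = d + min(prev[j], curr[j - 1], prev[j - 1])
                prev = curr
            return min(prev[1:])            # the match may end anywhere

        # Example: a section sampled from the middle of a longer walk matches well.
        walk = [0.1, 0.3, 0.9, 1.4, 1.2, 0.8, 0.4, 0.2]
        section = [0.9, 1.4, 1.2]
        print(subsequence_dtw_cost(section, walk))  # 0.0 for an exact overlap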

    Blockchain Adoption and Investment Efficiency

    This study empirically examines the relation between blockchain adoption and investment efficiency. Using a difference-in-differences research design with a sample of U.S. listed firms that indicated adoption of blockchain in business processes in 8-K filings from 2014 to 2019, we find that, relative to non-adopters, blockchain adopters exhibit an increase in investment efficiency after the implementation of blockchain technology. Our findings suggest that blockchain adoption improves information quality, which in turn improves firms’ price informativeness and information environments, and thereby enhances investment efficiency. Our study provides the first empirical evidence on the real effects of blockchain adoption. The findings are relevant to business communities, given that improved efficiency is one of the main goals many companies seek to achieve by adopting blockchain.
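    The difference-in-differences estimate described above is the coefficient on the interaction of the adopter indicator with the post-adoption indicator. Here is a minimal sketch with toy data and assumed column names (requires pandas and statsmodels); the study's actual controls and fixed effects are omitted.

        import pandas as pd
        import statsmodels.formula.api as smf

        # Toy panel: adopters vs non-adopters, before and after implementation.
        df = pd.DataFrame({
            "inv_efficiency": [0.52, 0.55, 0.50, 0.54, 0.51, 0.66, 0.49, 0.53],
            "adopter":        [1, 1, 0, 0, 1, 1, 0, 0],   # indicated blockchain adoption?
            "post":           [0, 0, 0, 0, 1, 1, 1, 1],   # after implementation?
        })

        # "adopter * post" expands to both main effects plus their interaction.
        model = smf.ols("inv_efficiency ~ adopter * post", data=df).fit()
        print(model.params["adopter:post"])  # the difference-in-differences effect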

    Trade-off among timeliness, messages and accuracy for large-scale information management

    The increasing amount of data and the number of nodes in large-scale environments require new techniques for information management. Examples of such environments are the decentralized infrastructures of Computational Grid and Computational Cloud applications. These large-scale applications need different kinds of aggregated information, such as resource monitoring, resource discovery, or economic information. The challenge of providing timely and accurate information in large-scale environments arises from the distribution of the information. Reasons for delays in distributed information systems are long information transmission times due to the distribution, churn, and failures. A problem of large applications such as peer-to-peer (P2P) systems is the increasing retrieval time of the information due to the decentralization of the data and the failure proneness. However, many applications need timely information provision. Another problem is the increasing network consumption when the application scales to millions of users and data items. Using approximation techniques allows reducing the retrieval time and the network consumption; however, it decreases the accuracy of the results. Thus, the remaining problem is to offer a trade-off that resolves the conflicting requirements of fast information retrieval, accurate results, and low messaging cost. Our goal is a self-adaptive decision mechanism that offers a trade-off among the retrieval time, the network consumption, and the accuracy of the result. Self-adaptation enables distributed software to modify its behavior based on changes in the operating environment. In large-scale information systems that use hierarchical data aggregation, we apply self-adaptation to control the approximation used for information retrieval and thereby reduce the network consumption and the retrieval time. The hypothesis of the thesis is that approximation techniques can reduce the retrieval time and the network consumption while guaranteeing the accuracy of the results, taking the user's defined priorities into account. First, this research addresses the trade-off among timely information retrieval, accurate results, and low messaging cost by proposing a summarization algorithm for resource discovery in P2P content networks. After identifying how summarization can improve the discovery process, we propose an algorithm which uses a precision-recall metric to compare the accuracy and to offer a user-driven trade-off. Second, we propose an algorithm that applies self-adaptive decision making on each node. The decision is whether to prune the query and return the result instead of continuing the query. Pruning reduces the retrieval time and the network consumption, at the cost of lower accuracy compared to continuing the query. The algorithm uses an analytic hierarchy process to assess the user's priorities and to propose a trade-off that satisfies the accuracy requirements with a low message cost and a short delay. A quantitative analysis evaluates the presented algorithms with a simulator, which is fed with real data of a network topology and the nodes' attributes. Using a simulator instead of the prototype allows evaluation at a large scale of several thousand nodes. The algorithm for content summarization is evaluated with half a million resources and with different query types. The self-adaptive algorithm is evaluated with a simulator of several thousand nodes created from real data. A qualitative analysis addresses the integration of the simulator's components in existing market frameworks for Computational Grid and Cloud applications; the implemented functionality of the simulator (such as the aggregation process and the query language) is validated through prototype integration. The proposed content summarization algorithm reduces the information retrieval time from a logarithmic increase to a constant factor, and the message size is reduced significantly by applying the summarization technique. For the user, the precision-recall metric allows defining the relation between the retrieval time and the accuracy. The self-adaptive algorithm reduces the number of messages needed from an exponential increase to a constant factor. At the same time, the retrieval time is reduced to a constant factor under an increasing number of nodes. Finally, the algorithm delivers the data with the required accuracy, adjusting the depth of the query according to the network conditions. The introduced algorithms are promising candidates for information aggregation in future large-scale information management systems.
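    The per-node pruning decision can be pictured as a weighted comparison between the accuracy gained by continuing the query and the time and message cost of doing so, with weights derived from the user's priorities (e.g. via the analytic hierarchy process). The following sketch is an assumed scoring model for illustration, not the thesis's algorithm.

        from dataclasses import dataclass

        @dataclass
        class Estimates:
            accuracy_gain: float   # expected accuracy gained by going one level deeper (0..1)
            extra_time: float      # expected added latency, normalised to 0..1
            extra_messages: float  # expected added message cost, normalised to 0..1

        def should_prune(est, w_acc, w_time, w_msg):
            """Prune (return the current aggregate) when the weighted cost of
            continuing the query outweighs the weighted accuracy benefit."""
            benefit = w_acc * est.accuracy_gain
            cost = w_time * est.extra_time + w_msg * est.extra_messages
            return cost >= benefit

        # A latency-sensitive user prunes early; an accuracy-focused user continues.
        e = Estimates(accuracy_gain=0.15, extra_time=0.6, extra_messages=0.4)
        print(should_prune(e, w_acc=0.2, w_time=0.6, w_msg=0.2))  # True: prune
        print(should_prune(e, w_acc=0.8, w_time=0.1, w_msg=0.1))  # False: continue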

    Massive Datasets in Astronomy

    Astronomy has a long history of acquiring, systematizing, and interpreting large quantities of data. Starting from the earliest sky atlases through the first major photographic sky surveys of the 20th century, this tradition is continuing today, and at an ever increasing rate. Like many other fields, astronomy has become a very data-rich science, driven by the advances in telescope, detector, and computer technology. Numerous large digital sky surveys and archives already exist, with information content measured in multiple Terabytes, and even larger, multi-Petabyte data sets are on the horizon. Systematic observations of the sky, over a range of wavelengths, are becoming the primary source of astronomical data. Numerical simulations are also producing comparable volumes of information. Data mining promises both to make the scientific utilization of these data sets more effective and complete, and to open completely new avenues of astronomical research. Technological problems range from the issues of database design and federation, to data mining and advanced visualization, leading to a new toolkit for astronomical research. This is similar to challenges encountered in other data-intensive fields today. These advances are now being organized through the concept of Virtual Observatories, federations of data archives and services representing a new information infrastructure for astronomy of the 21st century. In this article, we provide an overview of some of the major datasets in astronomy, discuss different techniques used for archiving data, and conclude with a discussion of the future of massive datasets in astronomy.
    Comment: 46 pages, 21 figures. Invited review for the Handbook of Massive Datasets, editors J. Abello, P. Pardalos, and M. Resende. Due to space limitations this version has low-resolution figures. For the full-resolution review see http://www.astro.caltech.edu/~rb/publications/hmds.ps.g