353 research outputs found

    Information retrieval in the Web: beyond current search engines

    In this paper we briefly explore the challenges of expanding information retrieval (IR) on the Web, in particular to other types of data, Web mining, and issues related to crawling. We also mention the main relations between IR and soft computing and how these techniques address these challenges.

    A feasibility study of an in-the-wild experimental public access WiFi network

    Universal Internet access has become critical to modern life, leading to many explorations of approaches to increase its availability. In this paper we report on a study of one such approach, PAWS, that seeks to understand the technical and social constraints of providing Internet access, free at the point of use, by sharing existing broadband subscribers' connections. We elaborate the technical and social context of our deployment, a deprived neighbourhood in a medium-sized British city, and discuss the constraints on and resulting architecture of this system, including the authentication and security mechanisms necessary for a service of this kind. We then report on the use of our deployment over a period of seven months from July 2013 to February 2014, including analyses of the performance and usage of the network. Our data show that PAWS is socially and technically feasible and has the potential to provide Internet access economically to many who are currently digitally disenfranchised. However, doing so requires overcoming numerous challenges, both technical and social.

    Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings

    We introduce a novel suite of state-of-the-art bilingual text embedding models that are designed to support English and another target language. These models are capable of processing lengthy text inputs of up to 8192 tokens, making them highly versatile for a range of natural language processing tasks such as text retrieval, clustering, and semantic textual similarity (STS) calculations. By focusing on bilingual models and introducing a unique multi-task learning objective, we have significantly improved model performance on STS tasks, outperforming existing multilingual models in both target language understanding and cross-lingual evaluation tasks. Moreover, our bilingual models are more efficient, requiring fewer parameters and less memory due to their smaller vocabulary needs. Furthermore, we have expanded the Massive Text Embedding Benchmark (MTEB) to include benchmarks for German and Spanish embedding models. This integration aims to stimulate further research and advancement in text embedding technologies for these languages.

    Database server workload characterization in an e-commerce environment

    A typical E-commerce system deployed on the Internet has multiple layers that include Web users, Web servers, application servers, and a database server. As system use and user request frequency increase, Web/application servers can be scaled up by replication. A load balancing proxy can be used to route user requests to individual machines that perform the same functionality. To address the increasing workload while avoiding replicating the database server, various dynamic caching policies have been proposed to reduce the database workload in E-commerce systems. However, the nature of the changes seen by the database server as a result of dynamic caching remains unknown. A good understanding of this change is fundamental for tuning a database server to get better performance. In this study, the TPC-W (a transactional Web E-commerce benchmark) workloads on a database server are characterized under two different dynamic caching mechanisms, which are generalized and implemented as query-result cache and table cache. The characterization focuses on response time, CPU computation, buffer pool references, disk I/O references, and workload classification. This thesis combines a variety of analysis techniques: simulation, real-time measurement, and data mining. The experimental results in this thesis reveal some interesting effects that dynamic caching has on the database server workload characteristics. The main observations include: (a) dynamic cache can considerably reduce the CPU usage of the database server and the number of database page references when it is heavily loaded; (b) dynamic cache can also reduce the database reference locality, but to a smaller degree than that reported in file servers. The data classification results in this thesis show that with dynamic cache, the database server sees TPC-W profiles more like on-line transaction processing workloads.
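To make the query-result cache mentioned in the abstract above concrete, here is a minimal sketch in Python. It is not the thesis's implementation: the class name `QueryResultCache`, the use of SQLite, and the coarse "clear everything on any write" invalidation policy are all assumptions chosen for brevity.

```python
import sqlite3


class QueryResultCache:
    """Hypothetical front-end cache that stores results keyed by query text and parameters."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}   # (sql, params) -> list of result rows
        self.hits = 0
        self.misses = 0

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self.cache:
            # Served from the cache: the database server never sees this request.
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.cache[key] = rows
        return rows

    def write(self, sql, params=()):
        # Assumed policy: any write invalidates every cached result.
        self.conn.execute(sql, params)
        self.conn.commit()
        self.cache.clear()
```

With a cache like this in front of it, the database only sees the misses and the writes, which is one way to picture why its observed workload shifts once caching is enabled.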

    A decentralized service discovery approach on peer-to-peer network

    Service-Oriented Computing (SOC) is emerging as a paradigm for developing distributed applications. A critical issue in utilizing SOC is to have a scalable, reliable, and robust service discovery mechanism. However, traditional service discovery methods using centralized registries can easily suffer from problems such as performance bottlenecks and vulnerability to failures in large-scale service networks, thus functioning abnormally. To address these problems, this paper proposes a peer-to-peer-based decentralized service discovery approach named Chord4S. Chord4S utilizes the data distribution and lookup capabilities of the popular Chord to distribute and discover services in a decentralized manner. Data availability is further improved by distributing published descriptions of functionally equivalent services to different successor nodes that are organized into virtual segments in the Chord4S circle. Based on the service publication approach, Chord4S supports QoS-aware service discovery. Chord4S also supports service discovery with wildcard(s). In addition, the Chord routing protocol is extended to support efficient discovery of multiple services with a single query. This enables late negotiation of Service Level Agreements (SLAs) between service consumers and multiple candidate service providers. The experimental evaluation shows that Chord4S achieves higher data availability and provides efficient querying with reasonable overhead.
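The Chord lookup that the abstract above builds on can be sketched as follows. This shows only plain Chord-style key placement (a key is stored on the first node whose identifier is equal to or follows the key's identifier on the ring), computed with a global view of the ring. It does not implement finger-table routing or the virtual segments of Chord4S. The names `chord_id` and `Ring`, the node names, and the 32-bit identifier space are assumptions for illustration.

```python
import bisect
import hashlib

M = 32  # identifier bits; an assumed value, chosen small for readability


def chord_id(key: str, m: int = M) -> int:
    """Hash a node name or service key onto the identifier circle [0, 2**m)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class Ring:
    """Hypothetical global view of a Chord ring, used only to show key placement."""

    def __init__(self, node_names):
        self.nodes = sorted((chord_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def successor(self, ident: int) -> str:
        # First node whose id is >= ident, wrapping around to the smallest id.
        idx = bisect.bisect_left(self.ids, ident) % len(self.ids)
        return self.nodes[idx][1]

    def lookup(self, service_key: str) -> str:
        """Return the node responsible for storing a service description."""
        return self.successor(chord_id(service_key))
```

For example, `Ring(["node-a", "node-b", "node-c"]).lookup("weather/forecast")` returns the name of the node that would hold that (hypothetical) service description. In a real deployment each node knows only a few other nodes and resolves the successor by routing.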

    A Conceptual Framework for Adaptation

    This paper presents a white-box conceptual framework for adaptation that promotes a neat separation of the adaptation logic from the application logic through a clear identification of control data and their role in the adaptation logic. The framework provides an original perspective from which we survey archetypal approaches to (self-)adaptation, ranging from programming languages and paradigms, to computational models, to engineering solutions.