7 research outputs found

    Improvement of Information Retrieval Systems by Using Hidden Vertical Search

    The exponential growth of the number of documents in digital libraries and on the Web calls for very intensive development of information retrieval systems (IRS). One possible architectural approach to IRS, an architecture with hidden verticals, is proposed in this paper. In an IRS with hidden verticals, documents from the searched corpus are stored in a predefined set of classes. The user's query is classified before the search, and searching is done only within the corresponding class. The performance of the proposed system is compared to the performance of a standard IRS (which contains a single inverted index) and an IRS with cluster pruning (in which the search corpus is clustered and the query is first compared to the clusters' centroids, then the search is done only in the most similar cluster). Search time in the proposed system is 7.9 times shorter than in the standard IRS and 1.7 times shorter than in the system with cluster pruning. The precision of the proposed system is 2.59 times higher than the precision of the standard IRS, and 1.68 times higher than that of the IRS with cluster pruning. The recall of the proposed system is 1.09 times lower than the recall of the standard IRS, but 1.28 times higher than the recall of the IRS with cluster pruning. Based on these results, the proposed approach reduces search time and increases search precision with a minimal reduction in recall.
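The hidden-vertical idea described in this abstract (classify the query, then search only the index of the matching class) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `VerticalIndex` class, the whitespace tokenizer, and the vocabulary-overlap query classifier are hypothetical and are not the paper's implementation.

```python
from collections import defaultdict


def tokenize(text):
    """Very simple whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


class VerticalIndex:
    """Keeps one small inverted index per predefined class (a 'hidden vertical')."""

    def __init__(self):
        # class label -> term -> set of document ids
        self.indexes = defaultdict(lambda: defaultdict(set))
        # class label -> vocabulary seen in that class (used by the toy classifier)
        self.class_terms = defaultdict(set)

    def add(self, doc_id, text, label):
        for term in tokenize(text):
            self.indexes[label][term].add(doc_id)
            self.class_terms[label].add(term)

    def classify(self, query):
        # Toy classifier: pick the class whose vocabulary overlaps the query most.
        terms = set(tokenize(query))
        return max(sorted(self.class_terms),
                   key=lambda c: len(terms & self.class_terms[c]))

    def search(self, query):
        # Search only inside the index of the class assigned to the query.
        label = self.classify(query)
        index = self.indexes[label]
        hits = set()
        for term in tokenize(query):
            hits |= index.get(term, set())
        return label, sorted(hits)


idx = VerticalIndex()
idx.add("d1", "inverted index search engine", "cs")
idx.add("d2", "protein folding search", "bio")
idx.add("d3", "engine ranking precision", "cs")
label, hits = idx.search("search engine")
```

In this toy run the query is routed to the `cs` vertical, so `d2` is never examined even though it contains the word "search". That mirrors the trade-off the abstract reports: less work per query, at the cost of a small loss in recall.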

    Efficient Communication in Agent-based Autonomous Logistic Processes

    Transportation of goods plays a vital role in the success of a logistics network. The ability to transport goods quickly and cost-effectively is one of the major requirements of customers. Dynamics involved in the logistics process, such as changes to or cancellation of orders or uncertain information about the orders, add to the complexity of the logistics network and can even reduce the efficiency of the entire logistics process. This creates a need to integrate technology and make the system more autonomous in order to handle these dynamics and reduce the complexity. Therefore, the distributed logistics routing protocol (DLRP) was developed at the University of Bremen. In this thesis, DLRP is extended with the concept of clustering of transport goods, two novel routing decision schemes and a negotiation process between the cluster of goods and the vehicle. DLRP gives the individual logistic entities the ability to perform routing tasks autonomously, e.g., discovering the best route to the destination at a given time. Even though DLRP seems to solve the routing problem in real time, the amount of message flooding involved in the route discovery process is enormous. This motivated the author to introduce a cluster-based routing approach using software agents. DLRP combined with the clustering algorithm is termed the cluster-based DLRP. In the latter, the goods are first clustered into groups based on criteria such as a common destination. The routing is then handled by the cluster head rather than by the individual transport goods, which results in a reduced communication volume in the route discovery. This reduction is demonstrated by evaluating the performance of the cluster-based DLRP approach against the legacy DLRP. After the routing process is completed by the cluster heads, the next step is to improve the transport performance in the logistics network by identifying the best means to transport the clustered goods.
    For example, to make better use of the transport capacity, clusters can be transported together on a stretch of overlapping route. In order to make optimal transport decisions, the vehicle calculates the correlation metric of the routes selected by the various clusters. The correlation metric helps identify the clusters that can be transported together and can thereby result in better utilization of the transport resources. In turn, the transportation cost that has to be paid to the vehicle can be shared between the different clusters. The transportation cost for a stretch of route is calculated by the vehicle and offered to the cluster. The cluster can then decide, based on the transportation cost or the selected route, whether to accept the transport offer from the vehicle. In this regard, different strategies are developed and investigated, and a performance evaluation of the capacity utilization of the vehicle and the transportation cost incurred by the cluster is presented. Finally, the thesis introduces the concept of negotiation in the cluster-based routing methods. The negotiation process enhances the transport decisions by giving the clusters and the vehicles the flexibility to negotiate the transportation cost. Thus, the focus of this part of the thesis is to analyse the negotiation strategies used by the logistics entities and their role in saving negotiation time while achieving a favorable transportation cost. In this regard, a performance evaluation of the different proposed strategies is presented, which in turn gives logistics practitioners an overview of the best strategy to deploy in various scenarios. Clustering of goods aids the negotiation process: on the one hand, a group of transport goods has a stronger basis for negotiation to achieve a favorable transportation price from the vehicle.
    On the other hand, it makes it easier for the vehicle to select the packages for transport and helps the vehicle to operate close to its capacity. In addition, clustering makes the negotiation process less complex and less voluminous. From the analytical considerations and the results obtained in the three parts of this thesis, it can be concluded that efficient transport decisions, though very complex in a logistics network, can be simplified to a certain extent by utilizing the available information about the goods and vehicles in the network.
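The communication saving described in this abstract (goods with a common destination form a cluster, and only the cluster head takes part in route discovery) can be illustrated with a small sketch. This is not DLRP itself: the function names and the simplification that each routing party issues exactly one route request are assumptions made for illustration.

```python
from collections import defaultdict


def cluster_by_destination(goods):
    """Group goods (good_id, destination) into clusters sharing a destination."""
    clusters = defaultdict(list)
    for good_id, destination in goods:
        clusters[destination].append(good_id)
    return dict(clusters)


def route_requests(goods, clustered):
    """Count route requests, assuming one request per routing party.

    Without clustering every good routes for itself; with clustering only
    the cluster head of each destination group issues a request.
    """
    if not clustered:
        return len(goods)
    return len(cluster_by_destination(goods))


goods = [("g1", "Bremen"), ("g2", "Hamburg"), ("g3", "Bremen"), ("g4", "Bremen")]
clusters = cluster_by_destination(goods)
unclustered_requests = route_requests(goods, clustered=False)
clustered_requests = route_requests(goods, clustered=True)
```

With these four goods, the unclustered case issues four requests and the clustered case two, one per destination group.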

    Effective retrieval to support learning

    To use digital resources to support learning, we need to be able to retrieve them. This thesis introduces a new area of research within information retrieval, the retrieval of educational resources from the Web. Successful retrieval of educational resources requires an understanding of how the resources being searched are managed, how searchers interact with those resources and the systems that manage them, and the needs of the people searching. As such, we began by investigating how resources are managed and reused in a higher education setting. This investigation involved running four focus groups with 23 participants, 26 interviews and a survey. The second part of this work is motivated by one of our initial findings: when people look for educational resources, they prefer to search the World Wide Web using a public search engine. This finding suggests that users searching for educational resources may be more satisfied with search engine results if only those resources likely to support learning are presented. To provide satisfactory result sets, resources that are unlikely to support learning should not be present. A filter to detect material that is likely to support learning would therefore be useful. Information retrieval systems are often evaluated using the Cranfield method, which compares system performance with a ground truth provided by human judgments. We propose a method of evaluating systems that filter educational resources based on this method. By demonstrating that judges can agree on which resources are educational, we establish that a single human judge for each resource provides a sufficient ground truth. Machine learning techniques are commonly used to classify resources. We investigate how machine learning can be used to classify resources retrieved from the Web as likely or unlikely to support learning.
    We found that reasonable classification performance can be achieved using text extracted from resources in conjunction with Naïve Bayes, AdaBoost, and Random Forest classifiers. We also found that attributes developed from structural elements (hyperlinks and headings found in a resource) did not substantially improve classification over simply using the text.

    The Effectiveness of Query-Based Hierarchic Clustering of Documents for Information Retrieval

    Hierarchic document clustering has been applied to Information Retrieval (IR) for over three decades. Its introduction to IR was based on the grounds of its potential to improve the effectiveness of IR systems. Central to the issue of improved effectiveness is the Cluster Hypothesis. The hypothesis states that relevant documents tend to be highly similar to each other, and therefore tend to appear in the same clusters. However, previous research has been inconclusive as to whether document clustering does bring improvements. The main motivation for this work has been to investigate methods for the improvement of the effectiveness of document clustering, by challenging some assumptions that implicitly characterise its application. Such assumptions relate to the static manner in which document clustering is typically performed, and include the static application of document clustering prior to querying, and the static calculation of interdocument associations. The type of clustering that is investigated in this thesis is query-based, that is, it incorporates information from the query into the process of generating clusters of documents. Two approaches for incorporating query information into the clustering process are examined: clustering documents which are returned from an IR system in response to a user query (post-retrieval clustering), and clustering documents by using query-sensitive similarity measures. For the first approach, post-retrieval clustering, an analytical investigation into a number of issues that relate to its retrieval effectiveness is presented in this thesis. This is in contrast to most of the research which has employed post-retrieval clustering in the past, where it is mainly viewed as a convenient and efficient means of presenting documents to users. In this thesis, post-retrieval clustering is employed based on its potential to introduce effectiveness improvements compared both to static clustering and best-match IR systems. 
The motivation for the second approach, the use of query-sensitive measures, stems from the role of interdocument similarities for the validity of the cluster hypothesis. In this thesis, an axiomatic view of the hypothesis is proposed, by suggesting that documents relevant to the same query (co-relevant documents) display an inherent similarity to each other which is dictated by the query itself. Because of this inherent similarity, the cluster hypothesis should be valid for any document collection. Past research has attributed failure to validate the hypothesis for a document collection to characteristics of the collection. Contrary to this, the view proposed in this thesis suggests that failure of a document set to adhere to the hypothesis is attributed to the assumptions made about interdocument similarity. This thesis argues that the query determines the context and the purpose for which the similarity between documents is judged, and it should therefore be incorporated in the similarity calculations. By taking the query into account when calculating interdocument similarities, co-relevant documents can be "forced" to be more similar to each other. This view challenges the typically static nature of interdocument relationships in IR. Specific formulas for the calculation of query-sensitive similarity are proposed in this thesis. Four hierarchic clustering methods and six document collections are used in the experiments. Three main issues are investigated: the effectiveness of hierarchic post-retrieval clustering which uses static similarity measures, the effectiveness of query-sensitive measures at increasing the similarity of pairs of co-relevant documents, and the effectiveness of hierarchic clustering which uses query-sensitive similarity measures. 
    The results demonstrate the effectiveness improvements that are introduced by the use of both approaches of query-based clustering, compared both to the effectiveness of static clustering and to the effectiveness of best-match IR systems. Query-sensitive similarity measures, in particular, introduce significant improvements over the use of static similarity measures for document clustering, and they also significantly improve the structure of the document space in terms of the similarity of pairs of co-relevant documents. The results provide evidence for the effectiveness of hierarchic query-based clustering of documents, and also challenge findings of previous research which had dismissed the potential of hierarchic document clustering as an effective method for information retrieval.
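The abstract does not give the thesis's formulas, so the following is only an illustrative sketch of what a query-sensitive similarity could look like. It assumes documents and queries are sparse term-weight dictionaries, and it assumes one particular form: the static cosine similarity of two documents multiplied by the cosine similarity between their shared terms and the query. The function names and this specific form are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_sensitive_sim(d1, d2, query):
    """Assumed form: static similarity scaled by how well the terms the two
    documents share match the query."""
    common = {t: (d1[t] + d2[t]) / 2 for t in d1 if t in d2}
    return cosine(d1, d2) * cosine(common, query)


doc_a = {"jaguar": 1.0, "speed": 1.0}
doc_b = {"jaguar": 1.0, "speed": 1.0, "cat": 1.0}
sim_speed = query_sensitive_sim(doc_a, doc_b, {"speed": 1.0})
sim_cat = query_sensitive_sim(doc_a, doc_b, {"cat": 1.0})
```

Under this assumed measure the same pair of documents is judged similar for a query about "speed", which both documents cover, and not similar at all for a query about "cat", which only one of them covers. A static measure would return the same value for both queries.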

    Clustering information retrieval search outputs

    Users are known to experience difficulties in dealing with information retrieval search outputs, especially if those outputs are above a certain size. It has been argued by several researchers that search output clustering can help users in their interaction with IR systems in some retrieval situations, providing them with an overview of their results by exploiting the topicality information that resides in the output but has not been used at the retrieval stage. This overview might enable them to find relevant documents more easily by focusing on the most promising clusters, or to use the clusters as a starting-point for query refinement or expansion. In this paper, the results of experiments carried out to assess the viability of clustering as a search output presentation method are reported and discussed.

    Clustering information retrieval search outputs

    SIGLE. Available from British Library Document Supply Centre (DSC:DXN034248) / BLDSC - British Library Document Supply Centre. GB. United Kingdom