33 research outputs found

    A Cooperative Coevolution Framework for Parallel Learning to Rank

    Abstract: We propose CCRank, the first parallel framework for learning to rank based on evolutionary algorithms (EA), which aims to significantly improve learning efficiency while maintaining accuracy. CCRank is based on cooperative coevolution (CC), a divide-and-conquer framework that has shown great promise in function optimization for problems with large search spaces and complex structures. Moreover, CC naturally allows the sub-problems produced by the decomposition to be solved in parallel, which can substantially boost learning efficiency. With CCRank, we investigate parallel CC in the context of learning to rank. We implement CCRank with three EA-based learning to rank algorithms for demonstration. Extensive experiments on benchmark datasets, in comparison with state-of-the-art algorithms, show the performance gains of CCRank in efficiency and accuracy. Index Terms: cooperative coevolution, learning to rank, information retrieval, genetic programming, immune programming.
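The divide-and-conquer idea behind cooperative coevolution can be sketched on a toy problem. The block below is a minimal illustration, not CCRank itself: the objective (`sphere`), the function name `cooperative_coevolution`, the Gaussian mutation and all parameter values are assumptions chosen for the example, and the sub-populations are evolved one after another rather than in parallel. It assumes `dim` is divisible by `groups`.

```python
import random


def sphere(x):
    """Toy objective to minimise (an assumption; CCRank optimises ranking quality)."""
    return sum(v * v for v in x)


def cooperative_coevolution(objective, dim=4, groups=2, pop_size=10,
                            generations=30, seed=0):
    rng = random.Random(seed)
    size = dim // groups  # number of variables handled by each sub-population

    # One sub-population per block of variables (the decomposition step).
    pops = [[[rng.uniform(-5, 5) for _ in range(size)] for _ in range(pop_size)]
            for _ in range(groups)]
    # Current representative of each sub-population.
    best = [pop[0][:] for pop in pops]

    def assemble(g, part):
        """Build a full solution: `part` for group g, representatives elsewhere."""
        full = []
        for i in range(groups):
            full.extend(part if i == g else best[i])
        return full

    history = [objective(assemble(0, best[0]))]
    for _ in range(generations):
        # Each pass over a group is independent given the other representatives;
        # this loop is the part a parallel framework could distribute.
        for g in range(groups):
            children = [[v + rng.gauss(0, 0.3) for v in ind] for ind in pops[g]]
            candidates = pops[g] + children  # elitist: parents compete too
            candidates.sort(key=lambda part: objective(assemble(g, part)))
            pops[g] = candidates[:pop_size]
            best[g] = pops[g][0][:]
        history.append(objective(assemble(0, best[0])))

    solution = [v for part in best for v in part]
    return solution, history
```

Calling `cooperative_coevolution(sphere)` returns the combined solution and the objective value after each generation. Because parents survive selection, the recorded values never increase.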

    Machine learning approach for personalized recommendations on online platforms: uniplaces case study

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. The goal of this project is to develop a model to personalize the user recommendations of an online marketplace named Uniplaces. This online business offers properties for medium and long-term stays, where landlords can rent their place directly to customers (mainly students). Whenever a student makes a reservation, the booking must be approved by the property owner. The current acceptance rate is 25%. The model is a response to this low acceptance rate: it has to show each student the properties whose landlords are most likely to accept the booking. As a secondary objective, the model seeks to identify the reasons behind the landlord’s decision to accept or reject bookings. The model is built using information about the users, the landlord and the property itself, kindly provided by Uniplaces. This information was pre-processed with data cleaning, transformation and feature reduction (where two techniques were applied: dimensionality reduction and feature selection). After the data processing, several models were applied to the normalized data. The predictive models applied are already used on other online markets and platforms such as Airbnb, Netflix or LinkedIn, namely Support Vector Machine, Neural Networks, Decision Tree, Logistic Regression and Gradient Boosting. The probability of acceptance proved to be very easy to predict: all the models correctly predicted 100% of the test dataset when Principal Component Analysis was used as the dimensionality reduction technique. This can be explained mainly by the fact that the newly calculated features have a strong correlation with the target variable.

    Text signals relevance improvement for full text search

    Although web search became a standard and often favored way of finding information many years ago, the task of determining the relevance of documents to a given user query still has many weak spots that need to be improved. This thesis tries to find new text relevance signals that improve full-text search results, and thereby user satisfaction, using datasets provided by Seznam.cz. First, the major LTR algorithms, evaluation metrics and commonly used text signals known from the literature are analyzed. Second, a system for testing and evaluating newly added text signals was designed and implemented. Finally, the newly added signals were compared, through a large set of experiments, with the anonymized baseline signals currently used by Seznam.cz.

    Acceleration of ListNet for ranking using reconfigurable architecture

    Document ranking is used to order query results by relevance with ranking models. ListNet is a well-known ranking approach for constructing and training learning-to-rank models. Compared with traditional learning approaches, ListNet delivers better accuracy, but it is computationally too expensive for learning models from large data sets because of the large number of permutations and documents involved in computing the gradients. Currently, the long training time limits the practicality of ListNet in ranking applications such as breaking news search and stock prediction, and this situation is getting worse as data-set sizes increase. To tackle the challenge of long training time, this thesis optimises the ListNet algorithm and designs hardware accelerators for ListNet training using Field Programmable Gate Arrays (FPGAs), making the algorithm more practical for real-world applications. The contributions of this thesis include: 1) A novel computation method of the ListNet algorithm for ranking. The proposed computation method exposes more fine-grained parallelism for FPGA implementation. 2) A weighted sampling method that takes into account the ranking positions, along with an effective quantisation method based on FPGA devices. The proposed design achieves a 4.42x speed improvement over a GPU implementation while still guaranteeing the accuracy. 3) A fully reconfigurable architecture for ListNet training using multiple bitstream kernels. The proposed method achieves a higher model accuracy than pure fixed-point training, and a better throughput than pure floating-point training. By applying the above techniques, this thesis accelerates the ListNet algorithm for ranking using FPGAs and achieves significant speed improvements over CPU and GPU implementations.
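To make the gradient computation concrete, here is a minimal sketch of the commonly used top-one approximation of the ListNet loss for a single query: the cross-entropy between the softmax of the relevance labels and the softmax of the model scores. It is not the thesis's computation method, sampling scheme or quantisation. The function names (`softmax`, `listnet_top1_loss`, `listnet_top1_grad`) and the example values are assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(values):
    """Log of the softmax, computed without taking log of tiny probabilities."""
    m = max(values)
    log_total = math.log(sum(math.exp(v - m) for v in values))
    return [v - m - log_total for v in values]


def listnet_top1_loss(scores, labels):
    """Cross-entropy between top-one distributions of labels and scores."""
    p_true = softmax(labels)
    log_p_pred = log_softmax(scores)
    return -sum(t * lp for t, lp in zip(p_true, log_p_pred))


def listnet_top1_grad(scores, labels):
    """Gradient of the loss with respect to each document's score."""
    p_true = softmax(labels)
    p_pred = softmax(scores)
    return [p - t for p, t in zip(p_pred, p_true)]


# Hypothetical query with three documents: graded labels and model scores.
labels = [2.0, 1.0, 0.0]
scores = [0.5, 1.5, 0.1]
loss = listnet_top1_loss(scores, labels)
grad = listnet_top1_grad(scores, labels)
```

Every document in the query's list contributes to the normalising sum, so the cost of one gradient step grows with list length and data-set size. The full permutation-based form of the loss is more expensive still.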