
    CAPS: A Practical Partition Index for Filtered Similarity Search

    With the surging popularity of approximate near-neighbor search (ANNS), driven by advances in neural representation learning, the ability to serve queries accompanied by a set of constraints has become an area of intense interest. While the community has recently proposed several algorithms for constrained ANNS, almost all of these methods focus on integration with graph-based indexes, the predominant class of algorithms achieving state-of-the-art performance in latency-recall tradeoffs. In this work, we take a different approach and develop a constrained ANNS algorithm based on space partitioning rather than graphs. To that end, we introduce Constrained Approximate Partitioned Search (CAPS), an index for ANNS with filters built on space partitions. CAPS not only retains the benefits of a partition-based algorithm but also outperforms state-of-the-art graph-based constrained search techniques in recall-latency tradeoffs, with only 10% of the index size. Comment: 14 pages
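The abstract does not describe how CAPS organizes its partitions, so the following is only a minimal sketch of the general idea it builds on: an IVF-style partition index (k-means cells with inverted lists) that applies an attribute filter while scanning the probed partitions. Everything here is an assumption for illustration, including the hypothetical class name `FilteredPartitionIndex`, the `nprobe` parameter, and the filter semantics (a point matches when its label set contains every required label). It is not the CAPS algorithm.

```python
import numpy as np


def kmeans(X, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns centroids and each point's partition id."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            members = X[assign == c]
            if len(members):  # keep the old centroid if a cell empties
                centroids[c] = members.mean(0)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, d.argmin(1)


class FilteredPartitionIndex:
    """Hypothetical IVF-style index: k-means cells plus per-point label sets."""

    def __init__(self, X, labels, k=16):
        self.X = X
        self.labels = labels  # labels[i] is the set of attributes of point i
        self.centroids, assign = kmeans(X, k)
        # Inverted lists: partition id -> ids of the points assigned to it.
        self.lists = [np.flatnonzero(assign == c) for c in range(k)]

    def search(self, q, required, topk=5, nprobe=4):
        # Probe only the nprobe partitions whose centroids are closest to q.
        probe = ((self.centroids - q) ** 2).sum(1).argsort()[:nprobe]
        # Keep candidates whose label set contains every required label.
        cand = [i for c in probe for i in self.lists[c]
                if required <= self.labels[i]]
        if not cand:
            return []
        cand = np.array(cand)
        dist = ((self.X[cand] - q) ** 2).sum(1)
        return cand[dist.argsort()[:topk]].tolist()


# Usage on random data (sizes and label vocabulary are arbitrary).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 16)).astype(np.float32)
labels = [{int(v) for v in rng.choice(8, size=2, replace=False)}
          for _ in range(len(X))]
index = FilteredPartitionIndex(X, labels, k=16)
q = rng.normal(size=16).astype(np.float32)
print(index.search(q, required={3}, topk=5, nprobe=4))
```

With `nprobe` equal to the number of partitions, the search reduces to an exact filtered scan; smaller values trade recall for latency, which is the kind of tradeoff the abstract reports.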

    Machine learning approach for credit score analysis : a case study of predicting mortgage loan defaults

    Dissertation submitted in partial fulfilment of the requirements for the degree of Statistics and Information Management specialized in Risk Analysis and Management. To manage credit score analysis effectively, financial institutions have developed techniques and models designed mainly to improve the assessment of creditworthiness during credit evaluation. The primary objective is to classify clients (borrowers) into either the non-defaulter group, who are more likely to meet their financial obligations, or the defaulter group, who have a higher probability of failing to repay their debts. In this work, we apply machine learning models to predict mortgage defaults. The study employs several single classifiers: Logistic Regression, Classification and Regression Trees, Random Forest, K-Nearest Neighbors, and Support Vector Machine. To further improve predictive power, we introduce stacking, a meta-algorithm ensemble approach that combines the output probabilities of these methods. The sample for this study is drawn solely from the publicly available dataset provided by Freddie Mac. With this approach, we achieve an improvement in predictive performance. We then compare the performance of each model and of the meta-learner by plotting ROC curves and computing the AUC. This study extends several earlier studies that used different techniques to improve predictive performance. Finally, we compare our results with those of other authors.
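The stacking setup described above can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions, not the dissertation's pipeline: the data are synthetic (generated with `make_classification`, with an assumed 10% minority "default" class) rather than the Freddie Mac loan data, and all hyperparameters, including the logistic-regression meta-learner, are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for loan-level features; NOT the Freddie Mac data.
# weights=[0.9, 0.1] mimics a minority "defaulter" class (an assumption).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# The five single classifiers named in the abstract; distance- and
# margin-based models get feature scaling first.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("cart", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]

# Stacking: out-of-fold predicted probabilities of the base models
# become the input features of a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=5,
)

aucs = {}
for name, model in base_models + [("stack", stack)]:
    model.fit(X_train, y_train)
    # The predicted probability of the positive (default) class drives the ROC curve.
    scores = model.predict_proba(X_test)[:, 1]
    aucs[name] = roc_auc_score(y_test, scores)

for name, auc in aucs.items():
    print(f"{name:>5}: AUC = {auc:.3f}")
```

`stack_method="predict_proba"` makes the meta-learner consume the base models' probabilities, in line with the abstract's description of combining their outputs; the `cv=5` out-of-fold scheme keeps the meta-learner from training on probabilities the base models produced for their own training rows.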

    Power-Aware Planning and Design for Next Generation Wireless Networks

    Mobile network operators have witnessed a transition from voice-dominated to video- and data-dominated traffic, which has led to dramatic traffic growth over the past decade. With 4G wireless communication systems recently deployed worldwide, fifth-generation (5G) mobile and wireless communication technologies are emerging as a research field. The fast-growing data traffic volume and the dramatic expansion of network infrastructure will inevitably trigger a tremendous escalation of energy consumption in wireless networks, which will increase greenhouse gas emissions and make environmental protection and sustainable network development ever more urgent. Thus, energy efficiency is one of the most important principles that 5G network planning and design should follow. This dissertation presents power-aware planning and design for next-generation wireless networks. We study network planning and design problems in both offline planning and online resource allocation. We propose approximation algorithms and effective heuristics for various network design scenarios, with different wireless network setups and different power-saving optimization objectives. We aim to reduce the power consumption of both base stations (BSs) and user equipment (UEs) by leveraging wireless relay placement, small cell deployment, device-to-device communications, and base station consolidation. We first study a joint signal-aware relay station placement and power allocation problem in multihop wireless relay networks, taking into account multiple related physical constraints such as channel capacity, the signal-to-noise ratio (SNR) requirements of subscribers, relay power, and network topology.
We present approximation schemes that first find a minimum number of relay stations, operating at maximum transmit power, to cover all subscribers while meeting each subscriber's SNR requirement, and then ensure communication between every subscriber and a base station by adjusting the transmit power of each relay station. To save power at BSs, we propose a practical solution and offer a new perspective on implementing green wireless networks by embracing small cell networks. Many existing works have proposed putting base stations to sleep to save energy. In practice, however, it is very difficult to shut down and reboot BSs frequently because of numerous technical issues and performance requirements. Instead of putting BSs to sleep, we reduce the coverage of each base station and strategically place microcells to offload traffic to and from BSs, thereby reducing total power consumption. In online resource allocation, we aim to save the transmit power of UEs by enabling device-to-device (D2D) communications in OFDMA-based wireless networks. Most existing works on D2D communications either targeted CDMA-based single-channel networks or aimed at maximizing network throughput. We formally define an optimization problem based on a practical link data rate model, whose objective is to minimize total power consumption while meeting user data rate requirements. We solve it with a joint optimization approach, presenting two effective and efficient algorithms that jointly determine mode selection, channel allocation, and power assignment. In the last part of this dissertation, we leverage load migration and base station consolidation for green communications and consider a power-efficient network planning problem in virtualized cognitive radio networks, with the objective of minimizing total power consumption while meeting the traffic load demand of each Mobile Virtual Network Operator (MVNO).
First, we present a Mixed Integer Linear Programming (MILP) formulation that provides optimal solutions. Then we present a general optimization framework to guide algorithm design, which solves two subproblems, channel assignment and load allocation, in sequence. In addition, we present an effective heuristic algorithm that solves the two subproblems jointly. Numerical results confirm the theoretical analysis of our schemes and show that our solutions perform strongly compared with several baseline methods.
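The first stage of the relay-placement scheme, finding a small set of relay sites that covers every subscriber at maximum transmit power while meeting each SNR requirement, has the structure of set cover. The sketch below uses the classic greedy set-cover heuristic, which carries a logarithmic approximation guarantee; it is an illustrative stand-in, not the dissertation's approximation scheme. The log-distance path-loss model, the noise power, the path-loss exponent, and the hypothetical function names `snr_db` and `greedy_relay_placement` are all assumptions, and the second stage (adjusting each relay's transmit power so that every subscriber reaches a base station) is not shown.

```python
import math


def snr_db(tx_power_w, distance_m, noise_w=1e-13, path_loss_exp=3.5):
    """Received SNR in dB under an assumed log-distance path-loss model."""
    # Clamp distance to 1 m so a co-located subscriber does not divide by zero.
    received_w = tx_power_w * max(distance_m, 1.0) ** (-path_loss_exp)
    return 10 * math.log10(received_w / noise_w)


def greedy_relay_placement(candidates, subscribers, max_power_w, snr_req_db):
    """Greedy set cover over candidate relay sites at maximum transmit power.

    candidates / subscribers: dicts mapping an id to an (x, y) position in metres.
    snr_req_db: dict mapping each subscriber id to its SNR requirement in dB.
    """
    # Which subscribers each candidate site could serve at full power.
    covers = {
        c_id: {
            s_id for s_id, s_pos in subscribers.items()
            if snr_db(max_power_w, math.dist(c_pos, s_pos)) >= snr_req_db[s_id]
        }
        for c_id, c_pos in candidates.items()
    }
    uncovered = set(subscribers)
    chosen = []
    while uncovered:
        # Pick the site that covers the most still-uncovered subscribers.
        best = max(covers, key=lambda c: len(covers[c] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            raise ValueError("some subscribers cannot be covered by any site")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Usage: three candidate sites on a line, each with a subscriber 500 m above it.
candidates = {"A": (0, 0), "B": (2000, 0), "C": (1000, 0)}
subscribers = {1: (0, 500), 2: (2000, 500), 3: (1000, 500)}
snr_req = {s: 20.0 for s in subscribers}
print(greedy_relay_placement(candidates, subscribers, 1.0, snr_req))  # prints ['C']
```

Under these assumed parameters a 1 W relay meets a 20 dB requirement out to roughly 1.4 km, so the middle site alone covers all three subscribers.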