    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

    Botnet detection using ensemble classifiers of network flow

    Botnets have recently become a common tool for deploying and spreading malicious code over the Internet. This code can be used to carry out many malicious activities, including DDoS attacks, spam, click fraud, and data theft. It is therefore necessary to use modern technologies to counter this threat early by differentiating botnet traffic from normal network traffic. In this work, we apply ensemble classifier algorithms to identify such damaging botnet traffic. We experimented with different ensemble algorithms to compare and analyze their ability to separate botnet traffic from normal traffic by selecting distinguishing features of the network traffic. Botnet detection offers a reliable and inexpensive way to protect transfer integrity and to warn of risks before they occur.

    Loan Default Prediction: A Complete Revision of LendingClub

    The study aims to determine a credit default prediction model using data from LendingClub. The methodology estimates the variables that influence the prediction of paid and unpaid loans using the Random Forest algorithm. The algorithm identifies the factors with the greatest influence on payment or default, yielding a reduced model of nine predictors related to the borrower's credit history and payment history within the platform. Measured by the F1 Macro Score, the model's performance exceeds 90% on the evaluation sample. The contributions of this study include using the complete available dataset of LendingClub's entire operation to obtain highly relevant variables for the classification and prediction task, which can help estimate default in the person-to-person loan market. We draw two main conclusions: first, the performance metrics obtained confirm the Random Forest algorithm's capacity to handle binary classification problems; second, traditional credit scoring variables have a clear influence on default prediction problems.
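The F1 Macro Score named in the abstract above is the unweighted mean of the per-class F1 scores, so the minority class (defaults) counts as much as the majority class. The following is a minimal sketch of that metric only; the labels and predictions are hypothetical toy values, not LendingClub data, and the study's own evaluation code is not shown in the source.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: 1 = default, 0 = fully paid.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(round(macro_f1(y_true, y_pred), 4))
```

Because each class contributes equally, a model that ignores the rare default class scores poorly here even when its plain accuracy is high.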

    Handling Uncertainty in Social Lending Credit Risk Prediction with a Choquet Fuzzy Integral Model

    As one of the main business models in the financial technology field, peer-to-peer (P2P) lending has disrupted traditional financial services by providing an online platform for lending money that has remarkably reduced financial costs. However, the inherent uncertainty in P2P loans can result in huge financial losses for P2P platforms. Therefore, accurate risk prediction is critical to the success of P2P lending platforms. Indeed, even a small improvement in credit risk prediction would be of benefit to P2P lending platforms. This paper proposes an innovative credit risk prediction framework that fuses base classifiers based on a Choquet fuzzy integral. Choquet integral fusion improves creditworthiness evaluations by synthesizing the prediction results of multiple classifiers and finding the largest consistency between outcomes among conflicting and consistent results. The proposed model was validated through experimental analysis on a real-world dataset from a well-known P2P lending marketplace. The empirical results indicate that the combination of multiple classifiers based on fuzzy Choquet integrals outperforms the best base classifiers used in credit risk prediction to date. In addition, the proposed methodology is superior to some conventional combination techniques.
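To make the fusion step in the abstract above more concrete, here is a minimal sketch of a discrete Choquet integral over classifier scores. It is an illustration under stated assumptions, not the paper's framework: the classifier names, the scores, and the fuzzy measure values are all hypothetical, and the paper's method for learning the measure is not described in the source.

```python
from itertools import combinations


def choquet_integral(scores, measure):
    """Discrete Choquet integral of `scores` (classifier -> value in [0, 1])
    with respect to `measure` (frozenset of classifiers -> weight in [0, 1])."""
    ordered = sorted(scores, key=scores.get, reverse=True)  # highest score first
    total = 0.0
    for i, name in enumerate(ordered):
        next_score = scores[ordered[i + 1]] if i + 1 < len(ordered) else 0.0
        coalition = frozenset(ordered[: i + 1])  # the i+1 top-scoring classifiers
        total += (scores[name] - next_score) * measure[coalition]
    return total


def additive_measure(weights):
    """Helper (assumption): a measure where a coalition's weight is the sum of
    its members' weights. With it, the integral is a plain weighted average."""
    names = list(weights)
    return {
        frozenset(subset): sum(weights[n] for n in subset)
        for r in range(1, len(names) + 1)
        for subset in combinations(names, r)
    }


# Hypothetical default probabilities from three base classifiers for one loan.
scores = {"logit": 0.9, "forest": 0.6, "nn": 0.3}
measure = additive_measure({"logit": 0.5, "forest": 0.3, "nn": 0.2})
print(choquet_integral(scores, measure))
```

The appeal of the Choquet integral is that the measure need not be additive: a coalition of classifiers can be given more or less weight than the sum of its members, which lets the fusion reward agreement or discount redundancy. A measure equal to 1 on every coalition reduces the integral to the maximum score, and a measure that is 1 only on the full set reduces it to the minimum.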

    Botnets and how to automatic detect them: exploring new ways of dealing with botnet classification: Botnets e como detectá-los automaticamente: explorando novas maneiras de lidar com a classificação botnet

    Threats such as botnets have become widespread on today's Internet, and attacks such as distributed denial of service (DDoS) can significantly affect the use of technology. One way to mitigate such issues is to use intelligent models that attempt to identify botnets in network traffic early. This work therefore evaluates the current state of the art on botnet-related threats and how intelligent technology has been used under real-world restrictions such as real-time deadlines and increasing network traffic. Our findings indicate that real-time botnet detection remains a significant challenge because computing power has not grown at the same rate as Internet traffic. They also point to other restrictions that must be considered, such as privacy legislation and the use of cryptographic methods for all communications. In this context, we discuss next steps for dealing with the identified issues.

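    Primjena ansambl metoda, logističke regresije i neuronske mreže na mogućnost predviđanja Peer-to-Peer pozajmljivanja [Application of ensemble methods, logistic regression, and neural networks to the predictability of peer-to-peer lending]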

    Credit scoring has become an important issue because competition among financial institutions is intense and even a small improvement in predictive accuracy can result in significant savings. Financial institutions are looking for optimal strategies using credit scoring models, so credit scoring tools are studied extensively. As a result, various parametric statistical methods, non-parametric statistical tools, and soft computing approaches have been developed to improve the accuracy of credit scoring models. In this paper, different approaches are used to classify customers into those who repay the loan and those who default. The purpose of this study is to investigate the performance of two credit scoring techniques: a logistic regression model estimated on categorized variables modified with the WOE (Weight of Evidence) transformation, and neural networks. We also combine multiple classifiers and test whether ensemble learning performs better. To evaluate the feasibility and effectiveness of these methods, the analysis is performed on Lending Club data. In addition, we investigate peer-to-peer lending, also called social lending, in which loans are made without the intermediation of financial institutions. The results show that the logistic regression model can perform better than neural networks. The proposed ensemble model (a combination of logistic regression and a neural network that averages the probabilities obtained from both models) has a higher AUC, Gini coefficient, and Kolmogorov-Smirnov statistic than the other models. We therefore conclude that the ensemble model can successfully reduce the potential risk of losses due to misclassification costs.
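The ensemble and the three metrics named in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the labels and probabilities are hypothetical toy values, `1` is assumed to mean default, and the Gini coefficient is computed with the common credit-scoring convention Gini = 2 * AUC - 1.

```python
def average_probabilities(p_a, p_b):
    """Ensemble by averaging two models' predicted probabilities."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def split_by_label(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return pos, neg


def auc(labels, scores):
    """Probability that a random positive outranks a random negative
    (ties count as one half)."""
    pos, neg = split_by_label(labels, scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    return 2 * auc(labels, scores) - 1


def ks_statistic(labels, scores):
    """Largest gap between the two classes' empirical score distributions."""
    pos, neg = split_by_label(labels, scores)

    def cdf(sample, t):
        return sum(s <= t for s in sample) / len(sample)

    return max(abs(cdf(pos, t) - cdf(neg, t)) for t in set(scores))


# Hypothetical toy data: 1 = default, 0 = repaid.
labels = [1, 1, 0, 0, 1, 0]
p_logit = [0.8, 0.4, 0.3, 0.5, 0.7, 0.2]  # logistic regression outputs
p_nn = [0.6, 0.7, 0.5, 0.2, 0.9, 0.3]     # neural network outputs
p_ens = average_probabilities(p_logit, p_nn)

for name, p in [("logit", p_logit), ("nn", p_nn), ("ensemble", p_ens)]:
    print(name, auc(labels, p), gini(labels, p), ks_statistic(labels, p))
```

The pairwise AUC is quadratic in the sample size, which is fine for a sketch; on a dataset the size of Lending Club a rank-based implementation from an established library would be the practical choice.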