
    Phishing Detection Using Natural Language Processing and Machine Learning

    Phishing emails are a primary mode of entry for attackers into an organization. A successful phishing attempt leads to unauthorized access to sensitive information and systems. However, automatically identifying phishing emails is often difficult, since many phishing emails have composite features, such as body text and metadata, that are nearly indistinguishable from those of valid emails. This paper presents a novel machine-learning-based framework, the DARTH framework, which characterizes and combines multiple models, one for each composite feature, to enable accurate identification of phishing emails. The framework analyses each composite feature independently, using a multi-faceted approach based on Natural Language Processing (NLP) and neural-network techniques, and combines the results of these analyses to classify emails as malicious or legitimate. Applying the framework to more than 150,000 emails, with training data from multiple sources including the authors’ emails and phishtank.com, yielded a precision (the ratio of correctly identified malicious observations to all observations predicted malicious) of 99.97%, an F-score of 99.98%, and correct identification of phishing emails 99.98% of the time. Combining multiple machine learning techniques in an ensemble across a range of composite features thus yields highly accurate identification of phishing emails.
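The metrics quoted above follow the standard definitions; as a minimal sketch (the confusion-matrix counts below are hypothetical, not from the paper), they can be computed as:

```python
# Precision: fraction of predicted-malicious emails that are truly malicious.
# Recall: fraction of truly malicious emails that were caught.
# F-score (F1): harmonic mean of precision and recall.

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def f_score(tp: int, fp: int, fn: int) -> float:
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Hypothetical counts roughly matching the reported figures:
print(round(precision(9998, 3), 4))   # ~0.9997
print(round(f_score(9998, 3, 1), 4))  # ~0.9998
```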

    Mahimahi: A Lightweight Toolkit for Reproducible Web Measurement

    This demo presents a measurement toolkit, Mahimahi, that records websites and replays them under emulated network conditions. Mahimahi is structured as a set of arbitrarily composable UNIX shells. It includes two shells to record and replay Web pages, RecordShell and ReplayShell, as well as two shells for network emulation, DelayShell and LinkShell. In addition, Mahimahi includes a corpus of recorded websites along with benchmark results and link traces (https://github.com/ravinet/sites). Mahimahi improves on prior record-and-replay frameworks in three ways. First, it preserves the multi-origin nature of Web pages, present in approximately 98% of the Alexa U.S. Top 500, when replaying. Second, Mahimahi isolates its own network traffic, allowing multiple instances to run concurrently with no impact on the host machine and collected measurements. Finally, Mahimahi is not inherently tied to browsers and can be used to evaluate many different applications. A demo of Mahimahi recording and replaying a Web page over an emulated link can be found at http://youtu.be/vytwDKBA-8s. The source code and instructions to use Mahimahi are available at http://mahimahi.mit.edu/.
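Composability here means each shell simply runs the next one as its child command. As a rough sketch (the binary names follow the Mahimahi documentation, but the directory and trace paths below are hypothetical; check the docs for exact argument order), the nested invocation can be assembled programmatically:

```python
# Sketch: build a nested Mahimahi command line where ReplayShell
# (mm-webreplay) wraps DelayShell (mm-delay), which wraps LinkShell
# (mm-link). The resulting argv could be passed to subprocess.run.

def nest(*stages):
    """Flatten shell stages into one argv list, outermost shell first."""
    argv = []
    for stage in stages:
        argv.extend(stage)
    return argv

argv = nest(
    ["mm-webreplay", "recorded_site/"],    # hypothetical recorded-site dir
    ["mm-delay", "50"],                    # 50 ms one-way delay
    ["mm-link", "up.trace", "down.trace"], # hypothetical link traces
)
print(" ".join(argv))
```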

    Predictors of mortality among hospitalized COVID-19 patients and risk score formulation for prioritizing tertiary care—An experience from South India

    BACKGROUND: We retrospectively data-mined the case records of Reverse Transcription Polymerase Chain Reaction (RT-PCR)-confirmed COVID-19 patients admitted to a tertiary care centre to derive mortality predictors and formulate a risk score for prioritizing admission. METHODS AND FINDINGS: Data on clinical manifestations, comorbidities, vital signs, and basic laboratory investigations, collected as part of routine medical management at admission to a COVID-19 tertiary care centre in Chengalpattu, South India between May and November 2020, were retrospectively analysed to ascertain predictors of mortality in univariate analysis using the relative difference in their distribution between ‘survivors’ and ‘non-survivors’. The regression coefficients of the factors remaining significant in multivariable logistic regression were used to formulate the risk score, which was validated in 1000 bootstrap datasets. Among 746 hospitalised COVID-19 patients [487 ‘survivors’ and 259 ‘non-survivors’ (deaths)], there was a slight male predilection [62.5% (466/746)], with higher mortality observed in the 40–70 years age group [59.1% (441/746)] and the highest among diabetic patients with elevated urea levels [65.4% (68/104)]. The adjusted odds ratios [OR (95% CI)] of the factors significant in the multivariable logistic regression were: low SaO2, 3.01 (1.61–5.83); age ≥50 years, 2.52 (1.45–4.43); pulse rate ≥100/min, 2.02 (1.19–3.47); and coexisting diabetes mellitus, 1.73 (1.02–2.95), with hypertension and gender not retaining significance.
    The individual risk scores (low SaO2: 11; age ≥50 years: 9; pulse rate ≥100/min: 7; coexisting diabetes mellitus: 6), collectively acronymed the ‘OUR-ARDs score’, showed that a sum of scores ≥25 predicted mortality with a sensitivity of 90%, a specificity of 64%, and an AUC of 0.85.
    CONCLUSIONS: The ‘OUR-ARDs’ risk score, derived from easily assessable factors predicting mortality, offered a tangible solution for prioritizing admission to a COVID-19 tertiary care centre, enhancing patient care without unduly straining the health system.
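A minimal sketch of the score as described in the abstract (factor weights 11, 9, 7, and 6, with mortality predicted at a total ≥25; the exact SaO2 cut-off is not legible here, so low oxygen saturation is taken as a boolean input):

```python
# OUR-ARDs score: sum the weights of the factors present in a patient,
# then compare against the mortality threshold of 25.

def our_ards_score(low_sao2: bool, age: int, pulse: int, diabetic: bool) -> int:
    score = 0
    if low_sao2:       # oxygen saturation below the paper's cut-off
        score += 11
    if age >= 50:
        score += 9
    if pulse >= 100:   # pulse rate per minute
        score += 7
    if diabetic:       # coexisting diabetes mellitus
        score += 6
    return score

def predicts_mortality(score: int) -> bool:
    return score >= 25

# e.g. a hypothetical 62-year-old diabetic with low SaO2 and pulse 88:
s = our_ards_score(low_sao2=True, age=62, pulse=88, diabetic=True)
print(s, predicts_mortality(s))  # 26 True
```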

    WatchTower: Fast, Secure Mobile Page Loads Using Remote Dependency Resolution

    Remote dependency resolution (RDR) is a proxy-driven scheme for reducing mobile page load times; a proxy loads a requested page using a local browser, fetching the page’s resources over fast proxy-origin links instead of a client’s slow last-mile links. In this paper, we describe two fundamental challenges to efficient RDR proxying: the increasing popularity of encrypted HTTPS content, and the fact that, due to time-dependent network conditions and page properties, RDR proxying can actually increase load times. We solve these problems by introducing a new, secure proxying scheme for HTTPS traffic, and by implementing WatchTower, a selective proxying system that uses dynamic models of network conditions and page structures to enable RDR only when it is predicted to help. WatchTower loads pages 21.2%–41.3% faster than state-of-the-art proxies and server push systems, while preserving end-to-end HTTPS security. (NSF Grant CNS-1407470)
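The selective part of the design reduces to a prediction: proxy only when the modelled RDR load time beats the modelled direct load time. A toy sketch under stated assumptions (these linear cost models and all parameter names are illustrative stand-ins, not WatchTower's actual models):

```python
# Toy selective-proxying decision: compare a predicted direct load
# against a predicted RDR load and enable RDR only when it wins.

def predict_direct_ms(rtt_ms: float, num_origins: int) -> float:
    """Toy model: each origin costs roughly two client round trips."""
    return 2.0 * rtt_ms * num_origins

def predict_rdr_ms(rtt_ms: float, proxy_rtt_ms: float, num_origins: int) -> float:
    """Toy model: one client round trip plus cheap proxy-origin fetches."""
    return 2.0 * rtt_ms + 2.0 * proxy_rtt_ms * num_origins

def use_rdr(rtt_ms: float, proxy_rtt_ms: float, num_origins: int) -> bool:
    return predict_rdr_ms(rtt_ms, proxy_rtt_ms, num_origins) < predict_direct_ms(rtt_ms, num_origins)

# Many origins over a slow last mile: RDR wins. A single origin: it doesn't.
print(use_rdr(100, 5, 10), use_rdr(100, 5, 1))
```

This mirrors why RDR can hurt: when the page has few origins or the client link is fast, the extra proxy hop outweighs the savings.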

    WiFi, LTE, or Both?

    Over the past two or three years, wireless cellular networks have become faster than before, most notably due to the deployment of LTE, HSPA+, and other similar networks. LTE throughputs can reach many megabits per second and can even rival WiFi throughputs in some locations. This paper addresses a fundamental question confronting transport- and application-layer protocol designers: which network should an application use? WiFi, LTE, or Multi-Path TCP (MPTCP) running over both? We compare LTE and WiFi for transfers of different sizes in both directions (i.e., uplink and downlink) using a crowd-sourced mobile application run by 750 users over 180 days in 16 different countries. We find that LTE outperforms WiFi 40% of the time, a higher fraction than one might expect at first sight. We measure flow-level MPTCP performance and compare it with the performance of TCP running over exclusively WiFi or LTE in 20 different locations across 7 cities in the United States. For short flows, we find that MPTCP performs worse than regular TCP running over the faster link; further, selecting the correct network for the primary subflow in MPTCP is critical to achieving good performance. For long flows, however, selecting the proper MPTCP congestion control algorithm is equally important. To complement our flow-level analysis, we analyze the traffic patterns of several mobile apps, finding that apps can be categorized as "short-flow dominated" or "long-flow dominated". We then record and replay these patterns over emulated WiFi and LTE links. We find that application performance has a similar dependence on the choice of networks as flow-level performance: an application dominated by short flows sees little gain from MPTCP, while an application with longer flows can benefit much more from MPTCP, provided the application picks the right network for the primary subflow and the right MPTCP congestion control. (National Science Foundation (U.S.) Grants 1407470 and 1161964)
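The flow-level finding suggests a simple selection rule: short flows take plain TCP over whichever network is faster, while long flows use MPTCP with the primary subflow on the faster network. A hedged sketch (the byte threshold and throughput inputs are illustrative assumptions, not values from the paper):

```python
# Toy transport chooser following the paper's flow-level finding.

SHORT_FLOW_BYTES = 100_000  # hypothetical short/long cut-off

def choose_transport(flow_bytes: int, wifi_mbps: float, lte_mbps: float) -> str:
    faster = "wifi" if wifi_mbps >= lte_mbps else "lte"
    if flow_bytes < SHORT_FLOW_BYTES:
        # Short flow: MPTCP overhead isn't worth it; use the faster link.
        return f"tcp over {faster}"
    # Long flow: MPTCP helps, with the primary subflow on the faster link.
    return f"mptcp, primary subflow on {faster}"

print(choose_transport(10_000, 50, 20))     # tcp over wifi
print(choose_transport(5_000_000, 20, 50))  # mptcp, primary subflow on lte
```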
