95 research outputs found

Unexpectedness as a Measure of Interestingness in Knowledge Discovery

Author: Padmanabhan Balaji
Tuzhilin Alexander
Publication venue: Stern School of Business, New York University
Publication date: 01/01/1997

Organizations are taking advantage of "data-mining" techniques to leverage the vast amounts of data captured as they process routine transactions. Data-mining is the process of discovering hidden structure or patterns in data. However, several of the pattern discovery methods in data-mining systems have the drawbacks that they discover too many obvious or irrelevant patterns and that they do not leverage to a full extent valuable prior domain knowledge that managers have. This research addresses these drawbacks by developing ways to generate interesting patterns by incorporating managers' prior knowledge in the process of searching for patterns in data. Specifically, we focus on providing methods that generate unexpected patterns with respect to managerial intuition by eliciting managers' beliefs about the domain and using these beliefs to seed the search for unexpected patterns in data. Our approach should lead to the development of decision support systems that provide managers with more relevant patterns from data and aid in effective decision making. Information Systems Working Papers Serie

New York University Faculty Digital Archive

Pattern-Oriented Clustering of Web Transactions

Author: Padmanabhan Balaji
Yang Yinghui
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2003

AIS Electronic Library (AISeL)

News Recommender Systems with Feedback

Author: Padmanabhan Balaji
Prawesh Shankar
Publication venue: AIS Electronic Library (AISeL)
Publication date: 14/12/2012

The focus of the present research is widely used news recommendation techniques such as “most popular” or “most e-mailed”. In this paper, we introduce an alternative recommendation approach based on feedback and discuss several notable properties of the feedback-based technique. Through a simulation model, we show that this technique gives implementers the flexibility to balance accuracy against distortion. Analytical results are established for the special case of two articles, using a formulation based on generalized urn models. Finally, we show that news recommender systems can also be studied through two-armed bandit algorithms.

AIS Electronic Library (AISeL)

Analysis of Probabilistic News Recommender Systems

Author: Padmanabhan Balaji
Prawesh Shankar
Publication venue: AIS Electronic Library (AISeL)
Publication date: 30/07/2012

The focus of this research is the N “most popular” (Top-N) news recommender systems (NRS), widely used by media sites (e.g. New York Times, BBC, Wall Street Journal all prominently use this). This common recommendation process is known to have major limitations: it artificially amplifies the counts of recommended articles, and it is easily susceptible to manipulation. To address these issues, probabilistic NRS has been introduced. One drawback of probabilistic recommendation is that it may choose articles to recommend that are not in the current “best” list. However, the probabilistic selection of news articles is highly robust to common manipulation strategies. This paper compares the two variants of NRS (Top-N and probabilistic) based on (1) accuracy loss, (2) distortion in counts of articles due to NRS, and (3) comparison of probabilistic NRS with an adapted influence limiter heuristic.

AIS Electronic Library (AISeL)

Query Driven Conceptual Browsing : A Semi-Automated Approach for Building and Exploring Concepts on the Web

Author: Jerath Kinshuk
Padmanabhan Venu Balaji
Publication venue: ScholarlyCommons
Publication date: 01/12/2005

The presence of communities, which are groups of highly cross referenced pages together representing a single concept, is a striking feature of the World Wide Web. Quite often a group of communities, each topically coherent within itself, may be related through a common concept manifested in each of them. Motivated by this observation, we present a method for query-driven conceptual browsing for exploring concepts on the Web starting from a user-specified query. We show how this idea is related to prior work on learning concept maps and on Web Mining, and discuss the application of conceptual browsing for user-driven exploration and discovery of new concepts on the Web.

ScholarlyCommons@Penn

Web Living Case: A Web Based Business Case Delivery System for Collaborative Work

Author: Madhavan Raghav
Padmanabhan Balaji
Turner Jon
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/1996

This paper describes the Web Living Case (WLC), a Web based business case delivery system that incorporates support for collaborative work. WLC provides a more interesting environment for case presentation than do traditional written cases. To facilitate effective collaboration, WLC provides shared workspaces for students working on common tasks, bulletin boards, and real-time conversation support, and a consistent and friendly user-interface

AIS Electronic Library (AISeL)

VOTE: Decision Simulation

Author: Arunkundram Ravi
Padmanabhan Balaji
Slade Stephen
Publication venue: AIS Electronic Library (AISeL)
Publication date: 16/08/1996

AIS Electronic Library (AISeL)

Patient Health Record Systems Scope and Functionalities: Literature Review and Future Directions

Author: Bouayad Lina
Ialynytchev Anna
Padmanabhan Balaji
Publication venue: FIU Digital Commons
Publication date: 01/11/2017

Background: A new generation of user-centric information systems is emerging in health care as patient health record (PHR) systems. These systems create a platform supporting the new vision of health services that empowers patients and enables patient-provider communication, with the goal of improving health outcomes and reducing costs. This evolution has generated new sets of data and capabilities, providing opportunities and challenges at the user, system, and industry levels. Objective: The objective of our study was to assess PHR data types and functionalities through a review of the literature to inform the health care informatics community, and to provide recommendations for PHR design, research, and practice. Methods: We conducted a review of the literature to assess PHR data types and functionalities. We searched PubMed, Embase, and MEDLINE databases from 1966 to 2015 for studies of PHRs, resulting in 1822 articles, from which we selected a total of 106 articles for a detailed review of PHR data content. Results: We present several key findings related to the scope and functionalities in PHR systems. We also present a functional taxonomy and chronological analysis of PHR data types and functionalities, to improve understanding and provide insights for future directions. Functional taxonomy analysis of the extracted data revealed the presence of new PHR data sources such as tracking devices and data types such as time-series data. Chronological data analysis showed an evolution of PHR system functionalities over time, from simple data access to data modification and, more recently, automated assessment, prediction, and recommendation. 
Conclusions: Efforts are needed to improve (1) PHR data quality through patient-centered user interface design and standardized patient-generated data guidelines, (2) data integrity through consolidation of various types and sources, (3) PHR functionality through application of new data analytics methods, and (4) metrics to evaluate clinical outcomes associated with automated PHR system use, and costs associated with PHR data storage and analytics

Crossref

DigitalCommons@Florida International University

EW-Tune: A Framework for Privately Fine-Tuning Large Language Models with Differential Privacy

Author: Behnia Rouzbeh
Ebrahimi Mohammadreza
Pacheco Jason
Padmanabhan Balaji
Publication date: 26/10/2022

Pre-trained Large Language Models (LLMs) are an integral part of modern AI that have led to breakthrough performances in complex AI tasks. Major AI companies with expensive infrastructures are able to develop and train these large models with billions and millions of parameters from scratch. Third parties, researchers, and practitioners are increasingly adopting these pre-trained models and fine-tuning them on their private data to accomplish their downstream AI tasks. However, it has been shown that an adversary can extract/reconstruct the exact training samples from these LLMs, which can lead to revealing personally identifiable information. The issue has raised deep concerns about the privacy of LLMs. Differential privacy (DP) provides a rigorous framework that allows adding noise in the process of training or fine-tuning LLMs such that extracting the training data becomes infeasible (i.e., with a cryptographically small success probability). While the theoretical privacy guarantees offered in most extant studies assume learning models from scratch through many training iterations in an asymptotic setting, this assumption does not hold in fine-tuning scenarios in which the number of training iterations is significantly smaller. To address the gap, we present EW-Tune, a DP framework for fine-tuning LLMs based on Edgeworth accountant with finite-sample privacy guarantees. Our results across four well-established natural language understanding (NLU) tasks show that while EW-Tune adds privacy guarantees to LLM fine-tuning process, it directly contributes to decreasing the induced noise to up to 5.6% and improves the state-of-the-art LLMs performance by up to 1.1% across all NLU tasks. We have open-sourced our implementations for wide adoption and public testing purposes. Comment: Accepted at IEEE ICDM Workshop on Machine Learning for Cybersecurity (MLC) 202

arXiv.org e-Print Archive