95 research outputs found
Unexpectedness as a Measure of Interestingness in Knowledge Discovery
Organizations are taking advantage of "data-mining" techniques to leverage the vast amounts of
data captured as they process routine transactions. Data-mining is the process of discovering
hidden structure or patterns in data. However, several of the pattern discovery methods in data-mining
systems have the drawbacks that they discover too many obvious or irrelevant patterns
and that they do not leverage to a full extent valuable prior domain knowledge that managers
have. This research addresses these drawbacks by developing ways to generate interesting
patterns by incorporating managers' prior knowledge in the process of searching for patterns in
data. Specifically, we focus on providing methods that generate unexpected patterns with respect
to managerial intuition by eliciting managers' beliefs about the domain and using these beliefs to
seed the search for unexpected patterns in data. Our approach should lead to the development of
decision support systems that provide managers with more relevant patterns from data and aid in
effective decision making.
Information Systems Working Papers Serie
News Recommender Systems with Feedback
This research focuses on widely used news recommendation techniques such as “most popular” and “most e-mailed”. We introduce an alternative recommendation technique based on feedback and discuss several of its notable properties. Using a simulation model, we show that this technique gives implementers the flexibility to balance accuracy against distortion. Analytical results are established for the special case of two articles, using a formulation based on generalized urn models. Finally, we show that news recommender systems can also be studied through two-armed bandit algorithms.
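To make the two-article urn setting concrete, here is a minimal simulation sketch. It is not the paper's model: the abstract does not give the reinforcement rule, so the `feedback` exponent, the per-article click probabilities in `quality`, and the initial counts of one are all illustrative assumptions.

```python
import random


def simulate_two_article_urn(steps, quality=(0.6, 0.4), feedback=1.0, seed=0):
    """Simulate a two-article feedback recommender as a generalized urn.

    Assumptions (not taken from the paper): each article starts with one
    read; at every step one article is shown with probability proportional
    to counts[i] ** feedback; the reader then clicks it with probability
    quality[i], which adds one "ball" (read) to that article's count.
    """
    rng = random.Random(seed)
    counts = [1, 1]
    for _ in range(steps):
        w0 = counts[0] ** feedback
        w1 = counts[1] ** feedback
        shown = 0 if rng.random() < w0 / (w0 + w1) else 1
        if rng.random() < quality[shown]:
            counts[shown] += 1
    return counts


if __name__ == "__main__":
    # feedback=0 ignores past counts; larger values reinforce the leader.
    for fb in (0.0, 1.0, 2.0):
        print(fb, simulate_two_article_urn(5000, feedback=fb))
```

Varying `feedback` in such a sketch is one way to explore how strongly past reads drive future recommendations, which is the kind of accuracy-versus-distortion trade-off the abstract describes.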
Analysis of Probabilistic News Recommender Systems
The focus of this research is the N “most popular” (Top-N) news recommender systems (NRS), widely used by media sites (e.g., the New York Times, BBC, and Wall Street Journal all use them prominently). This common recommendation process is known to have major limitations: it artificially amplifies the counts of recommended articles, and it is easily manipulated. To address these issues, probabilistic NRS have been introduced. One drawback of probabilistic recommendation is that it may recommend articles that are not in the current “best” list. However, probabilistic selection of news articles is highly robust to common manipulation strategies. This paper compares the two variants of NRS (Top-N and probabilistic) based on (1) accuracy loss, (2) distortion in article counts due to the NRS, and (3) a comparison of probabilistic NRS with an adapted influence limiter heuristic.
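The difference between the two variants can be sketched in a few lines. This is an illustration under assumptions, not the paper's algorithm: the abstract does not specify the sampling distribution, so the count-proportional weights and the `+ 1` smoothing below are hypothetical choices, as are the function names.

```python
import random


def top_n(counts, n):
    """Deterministic Top-N: the n article ids with the highest read counts."""
    return sorted(counts, key=counts.get, reverse=True)[:n]


def probabilistic_n(counts, n, rng):
    """Sample n distinct article ids without replacement.

    Assumption: each draw is weighted by (count + 1), so popular articles
    are favoured but any article can appear in the recommended list.
    """
    remaining = dict(counts)
    chosen = []
    for _ in range(min(n, len(remaining))):
        ids = list(remaining)
        weights = [remaining[i] + 1 for i in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


if __name__ == "__main__":
    counts = {"a": 50, "b": 30, "c": 5, "d": 1}
    print(top_n(counts, 2))
    print(probabilistic_n(counts, 2, random.Random(0)))
```

Because the probabilistic list is not a fixed function of the counts, inflating one article's count raises its chance of being shown rather than guaranteeing a slot, which is the intuition behind the robustness claim. The price is that the list can include articles outside the current top N.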
Query Driven Conceptual Browsing: A Semi-Automated Approach for Building and Exploring Concepts on the Web
The presence of communities, which are groups of highly cross-referenced pages together representing a single concept, is a striking feature of the World Wide Web. Quite often a group of communities, each topically coherent within itself, may be related through a common concept manifested in each of them. Motivated by this observation, we present a method for query-driven conceptual browsing for exploring concepts on the Web starting from a user-specified query. We show how this idea is related to prior work on learning concept maps and on Web Mining, and discuss the application of conceptual browsing for user-driven exploration and discovery of new concepts on the Web.
Web Living Case: A Web Based Business Case Delivery System for Collaborative Work
This paper describes the Web Living Case (WLC), a Web-based business case delivery system that incorporates support for collaborative work. WLC provides a more interesting environment for case presentation than do traditional written cases. To facilitate effective collaboration, WLC provides shared workspaces for students working on common tasks, bulletin boards, real-time conversation support, and a consistent and friendly user interface.
Patient Health Record Systems Scope and Functionalities: Literature Review and Future Directions
Background: A new generation of user-centric information systems is emerging in health care as patient health record (PHR) systems. These systems create a platform supporting the new vision of health services that empowers patients and enables patient-provider communication, with the goal of improving health outcomes and reducing costs. This evolution has generated new sets of data and capabilities, providing opportunities and challenges at the user, system, and industry levels.
Objective: The objective of our study was to assess PHR data types and functionalities through a review of the literature to inform the health care informatics community, and to provide recommendations for PHR design, research, and practice.
Methods: We conducted a review of the literature to assess PHR data types and functionalities. We searched PubMed, Embase, and MEDLINE databases from 1966 to 2015 for studies of PHRs, resulting in 1822 articles, from which we selected a total of 106 articles for a detailed review of PHR data content.
Results: We present several key findings related to the scope and functionalities in PHR systems. We also present a functional taxonomy and chronological analysis of PHR data types and functionalities, to improve understanding and provide insights for future directions. Functional taxonomy analysis of the extracted data revealed the presence of new PHR data sources such as tracking devices and data types such as time-series data. Chronological data analysis showed an evolution of PHR system functionalities over time, from simple data access to data modification and, more recently, automated assessment, prediction, and recommendation.
Conclusions: Efforts are needed to improve (1) PHR data quality through patient-centered user interface design and standardized patient-generated data guidelines, (2) data integrity through consolidation of various types and sources, (3) PHR functionality through application of new data analytics methods, and (4) metrics to evaluate clinical outcomes associated with automated PHR system use, and costs associated with PHR data storage and analytics
EW-Tune: A Framework for Privately Fine-Tuning Large Language Models with Differential Privacy
Pre-trained Large Language Models (LLMs) are an integral part of modern AI
that have led to breakthrough performances in complex AI tasks. Major AI
companies with expensive infrastructures are able to develop and train these
large models with millions or billions of parameters from scratch. Third
parties, researchers, and practitioners are increasingly adopting these
pre-trained models and fine-tuning them on their private data to accomplish
their downstream AI tasks. However, it has been shown that an adversary can
extract/reconstruct the exact training samples from these LLMs, which can lead
to revealing personally identifiable information. The issue has raised deep
concerns about the privacy of LLMs. Differential privacy (DP) provides a
rigorous framework that allows adding noise in the process of training or
fine-tuning LLMs such that extracting the training data becomes infeasible
(i.e., with a cryptographically small success probability). While the
theoretical privacy guarantees offered in most extant studies assume learning
models from scratch through many training iterations in an asymptotic setting,
this assumption does not hold in fine-tuning scenarios in which the number of
training iterations is significantly smaller. To address the gap, we present
EW-Tune, a DP framework for fine-tuning LLMs based on the Edgeworth accountant with
finite-sample privacy guarantees. Our results across four well-established
natural language understanding (NLU) tasks show that while EW-Tune adds privacy
guarantees to the LLM fine-tuning process, it directly contributes to decreasing
the induced noise to up to 5.6% and improves the state-of-the-art LLM
performance by up to 1.1% across all NLU tasks. We have open-sourced our
implementations for wide adoption and public testing purposes.
Comment: Accepted at IEEE ICDM Workshop on Machine Learning for Cybersecurity (MLC) 202
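For readers unfamiliar with how noise enters DP training, the sketch below shows one generic DP-SGD update (per-example gradient clipping followed by Gaussian noise). It is not EW-Tune itself and does not implement the Edgeworth accountant. The function names, the plain-list representation of gradients, and the parameter names `max_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One generic DP-SGD update (a sketch, not the EW-Tune implementation).

    Each example's gradient is clipped to max_norm, the clipped gradients
    are summed, Gaussian noise with standard deviation
    noise_multiplier * max_norm is added per coordinate, and the result is
    averaged over the batch before the gradient step.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    batch = len(clipped)
    noisy_mean = []
    for j in range(len(params)):
        total = sum(g[j] for g in clipped)
        total += rng.gauss(0.0, noise_multiplier * max_norm)
        noisy_mean.append(total / batch)
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.1, -0.2]]
    print(dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.0, rng=rng))
```

A privacy accountant then determines how small `noise_multiplier` can be for a target privacy budget over a given number of steps. A tighter accountant for short fine-tuning runs would permit less noise for the same guarantee.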