Data Science and Prediction
The world's data is growing more than 40% annually. Coupled with
exponentially growing computing horsepower, this provides an
unprecedented basis for 'learning' useful things from the data through
statistical induction, without material human intervention, and for acting on
them. Philosophers have long debated the merits and demerits of
induction as a scientific method, the demerits being that conclusions are
not guaranteed to be certain and that many different models can
be conjured to explain the observed data. I propose that 'big data'
brings a new and important perspective to these problems in that it
greatly ameliorates historical concerns about induction, especially if
our primary objective is prediction as opposed to causal model
identification. Equally significantly, it propels us into an era of
automated decision making, where computers will make the bulk of
decisions because it is infeasible or more costly for humans to do so.
In this paper, I describe how scale, integration, and, most importantly,
prediction will be distinguishing hallmarks in this coming era of Data
Science. In this brief monograph, I define this newly emerging field
from business and research perspectives.
NYU Stern School of Business, NYU Stern Center for Digital Economy Research
A VALUE-CHAIN BASED MODEL FOR SUPPORTING INFORMATION TECHNOLOGY INVESTMENTS
Business organizations are thinking increasingly in terms of information
technology solutions to business problems, as opposed to data
processing for supporting the business. Information technology is now
viewed as an important means for achieving competitive advantage.
For firms in the hardware/software business, it is therefore becoming increasingly
important to provide clients with the means to analyze their
business needs and strategies, and to think in terms of providing
global IT solutions that address these needs.
The value-chain model articulated by Porter (1985) attempts to
link IT solutions to business strategy. It is based on a simple economic
theory: a firm remains competitive by virtue of being a low
cost producer or differentiating its products/services; accordingly its
strategies must be based on countering forces (such as new entrants,
substitute products, bargaining power of buyers and suppliers) that
erode these advantages. Information technology is considered a key
factor in being able to deal with these forces. Accordingly, how much
and where to spend on information technology is determined by
how well it enables the firm to deal with its dominant forces (threats).
Porter's model has found widespread acceptance among practitioners
(notably information systems executives) because of its simplicity and intuitive
appeal. Several methodologies have been designed around this
model that encourage executives to "think through" this model in order
to identify technologies that could provide competitive advantage.
However, there are no existing formalizations of the value-chain model
by industry, market structure, or organizational structure. We
have been developing such a model for a specific industry (insurance)
with the objective of building an executive support tool that can show,
interactively, how a proposed technology or organizational change can
impact specific metrics/values of interest for business processes defined
at various levels of abstraction, and thereby the bottom line. By using
such a model, an executive can also analyze the technology and resources
required to transform one set of business processes into
another, more desirable state.
Information Systems Working Papers Series
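A minimal sketch of the kind of interactive analysis such an executive support tool would enable, assuming business processes are modeled as a small tree with cost metrics that roll up across levels of abstraction. The insurance process names, cost figures, and the assumed effect of the IT investment are illustrative, not taken from the working paper.

```python
# Sketch (not the working paper's tool): business processes form a tree, each
# carrying a direct cost metric; a proposed technology change adjusts the cost
# of one low-level process, and the effect rolls up to the bottom line.
# All process names and figures below are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Process:
    name: str
    cost: float = 0.0                       # direct annual cost of this process
    children: list["Process"] = field(default_factory=list)

    def total_cost(self) -> float:
        """Cost at this level of abstraction, including all sub-processes."""
        return self.cost + sum(c.total_cost() for c in self.children)

# Hypothetical fragment of an insurance value chain.
claims = Process("Claims handling", children=[
    Process("Claim intake", cost=1.2e6),
    Process("Claim assessment", cost=2.5e6),
    Process("Payment processing", cost=0.8e6),
])

baseline = claims.total_cost()
claims.children[0].cost *= 0.6              # assumed 40% intake saving from new IT
print(f"Bottom-line impact: {baseline - claims.total_cost():,.0f} per year")
```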
ON THE PLAUSIBILITY AND SCOPE OF EXPERT SYSTEMS IN MANAGEMENT
Over the last decade there have been several efforts at building knowledge-based "expert
systems", mostly in the scientific and medical arenas. Although almost all such
systems are in their experimental stages, designers are optimistic about their eventual success.
In the last few years, there have been many references to the possibility of expert systems in
the management literature. However, what is lacking is a clear theoretical perspective on how
various management problems differ in nature from problems in other domains, and the
implications of these differences for knowledge-based decision support systems for
management. In this paper, I examine some of these differences, what they suggest in terms of
the functionality that a computer based system must have in order to support organizational
decision making, and the scope of such a system as a decision aid. The discussion is grounded
in the context of a computer based system called PLANET that exhibits some of the desired
functionality.
Information Systems Working Papers Series
Prediction in Financial Markets: The Case for Small Disjuncts
Predictive modeling in regression and classification problems typically
produces a single model that covers most, if not all, cases in the data. At
the opposite end of the spectrum is a collection of models each of which
covers a very small subset of the decision space. These are referred to
as “small disjuncts.” The tradeoffs between the two types of
models have been well documented. Single models, especially linear ones,
are easy to interpret and explain. In contrast, small disjuncts do not
provide as clean or as simple an interpretation of the data, and have
been shown by several researchers to be responsible for a
disproportionately large number of errors when applied to out-of-sample
data. This research provides a counterpoint, demonstrating that
“simple” small disjuncts provide a credible model for
financial market prediction, a problem with a high degree of noise. A
related novel contribution of this paper is a simple method for
measuring the “yield” of a learning system, which is the
percentage of in-sample performance that the learned model can be
expected to realize on out-of-sample data. Curiously, such a measure is
missing from the literature on regression learning algorithms.
NYU Stern School of Business
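As a rough illustration of the yield measure described above, the sketch below trains a simple classifier on noisy synthetic data and reports the fraction of in-sample accuracy retained out of sample. The dataset, the choice of a scikit-learn decision tree, and the use of accuracy as the performance measure are assumptions for illustration, not the paper's setup.

```python
# Sketch of the "yield" idea: the fraction of in-sample performance a learned
# model retains on out-of-sample data. Synthetic data and model are illustrative.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
# Noisy target, loosely analogous to a financial-market prediction problem.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=2000) > 0).astype(int)

X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_in, y_in)

in_sample = model.score(X_in, y_in)        # accuracy on the training sample
out_sample = model.score(X_out, y_out)     # accuracy on unseen data

yield_ = out_sample / in_sample            # share of in-sample performance realized
print(f"in-sample={in_sample:.3f}  out-of-sample={out_sample:.3f}  yield={yield_:.2%}")
```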
Data Science and Prediction
The use of the term 'Data Science' is becoming increasingly common along
with 'Big Data.' What does Data Science mean? Is there something unique
about it? What skills should a 'data scientist' possess to be productive
in the emerging digital age characterized by a deluge of data? What are
the implications for business and for scientific inquiry? In this brief
monograph, I address these questions from a predictive modeling perspective.
NYU Stern, IOMS Department, Center for Business Analytics
A VALUE-CHAIN BASED PROCESS MODEL FOR SUPPORTING BUSINESS PROCESS REENGINEERING
Constantly envisioning how rapid developments in information
technology offer new opportunities, and engineering business processes
accordingly, will continue to be a difficult problem for senior management.
An important observation by Keen (1991) is that over the last
three decades, effective use of rapidly changing technology has lagged
its availability. A central problem is that of justifying the technology, that is,
measuring its business value. The value-chain model articulated by
Porter (1985) is a natural candidate in providing a basis for this evaluation.
It is based on the simple economic theory that a firm remains
competitive by virtue of being a low cost producer or differentiating
its products/services to the customer, that is, by providing customer
satisfaction. It is intuitive to think of "the customer" as the end
user of a product or service. However, projecting this definition into
the organization, so that every piece of work within it has a customer
that needs to be satisfied, provides a good basis for work design and
its implementation. As technology evolves, forcing the organization to
reassess its customers, the work must be redesigned. This is increasingly
becoming known as "process reengineering".
Porter's model has found widespread acceptance among practitioners
at the strategic level because of its theoretical simplicity and commonsense
appeal. Several methodologies have been designed around this
model that encourage executives to "think through" and identify technologies
that could provide competitive advantage. However, these
methods have some serious limitations due to the lack of a sound
conceptual underpinning and their inability to link technology explicitly
to business value metrics. Based on an analysis of one specific
industry (insurance) we have found that simple process oriented models
such as BSP, when extended to deal with value (in terms of cost
or product/service differentiation to the customer), provide a sound
basis for exploring process reengineering. An implementation of this
methodology should enable management to simulate how a system
would "react" to various types of inputs in terms of specific metrics
of interest.
Information Systems Working Papers Series
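A rough sketch of what simulating a process model's "reaction" to inputs could look like, in the spirit of the abstract rather than its methodology: a few insurance process steps with per-claim cost and time, evaluated at different claim volumes before and after a hypothetical reengineering change. All step names, figures, and the assumed effect of the change are illustrative.

```python
# Sketch of letting a process model "react" to inputs in terms of metrics of
# interest. Steps, unit costs/times, volumes, and the reengineering effect are
# hypothetical, not taken from the working paper.

STEPS = [
    # (step, cost per claim, hours per claim)
    ("Claim intake",       12.0, 0.25),
    ("Claim assessment",   45.0, 1.50),
    ("Payment processing",  8.0, 0.10),
]

def react(claims_per_month: int, reengineered: bool = False) -> dict:
    """Metrics of interest for a given input volume, before/after a process change."""
    steps = STEPS
    if reengineered:
        # Assumed effect of automating intake: half the cost, much less elapsed time.
        steps = [("Claim intake", 6.0, 0.10)] + STEPS[1:]
    monthly_cost = sum(cost for _, cost, _ in steps) * claims_per_month
    cycle_hours = sum(hours for _, _, hours in steps)
    return {"monthly_cost": monthly_cost, "cycle_hours_per_claim": cycle_hours}

for volume in (1_000, 5_000):
    print(volume, react(volume), react(volume, reengineered=True))
```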
ANALOGICAL AND DEPENDENCY DIRECTED REASONING STRATEGIES FOR LARGE SYSTEMS EVOLUTION
The maintenance of large information systems involves continuous modifications to designs in response to evolving business conditions or changing user requirements. Because of the complexity barrier associated with engineering such systems, changes can be ad hoc and prone to errors. Based on our observations of such a process in the oil industry, we believe that the systems maintenance activity would benefit greatly if the process knowledge reflecting the teleology of a design could be captured and used to reason about changing requirements, and to design parts of systems that might be "similar" to existing ones. In this paper, we describe a partially implemented formalism called REMAP (REpresentation and MAintenance of Process knowledge) that accumulates design process knowledge to manage systems evolution. To accomplish this, REMAP acquires and maintains dependencies among the design decisions made during a prototyping process, as well as the general domain-specific design rules on which such dependencies are based. This knowledge can then be applied to prototype refinement, systems maintenance, and the re-use of existing designs to construct "similar" design fragments.
Information Systems Working Papers Series
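As a small illustration of dependency-directed reasoning over design decisions, the sketch below records which decisions rest on which premises and propagates a changed requirement to every decision that should be revisited. The decision names and dependencies are hypothetical, and the sketch is far simpler than the REMAP formalism itself.

```python
# Sketch of propagating a requirement change through design-decision
# dependencies. Names and dependencies below are hypothetical.

from collections import defaultdict

# decision -> premises (decisions or requirements) it directly depends on
DEPENDS_ON = {
    "use_batch_settlement":   ["requirement_overnight_reporting"],
    "nightly_etl_job":        ["use_batch_settlement"],
    "monthly_summary_schema": ["nightly_etl_job"],
}

# Invert the dependency relation so changes can be propagated forward.
supports = defaultdict(list)
for decision, premises in DEPENDS_ON.items():
    for premise in premises:
        supports[premise].append(decision)

def affected_by(changed: str) -> list[str]:
    """All decisions whose justification rests, directly or indirectly, on `changed`."""
    out, stack = [], [changed]
    while stack:
        for decision in supports[stack.pop()]:
            if decision not in out:
                out.append(decision)
                stack.append(decision)
    return out

# A business requirement changes; the decisions to revisit follow from the dependencies.
print(affected_by("requirement_overnight_reporting"))
# ['use_batch_settlement', 'nightly_etl_job', 'monthly_summary_schema']
```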
A PROBLEM-SOLVER/TMS ARCHITECTURE FOR GENERAL CONSTRAINT SATISFACTION PROBLEMS
Constraints, in various forms, are ubiquitous to design problems. In this paper, we provide a formal
characterization of a generalized constraint satisfaction problem (CSP) that can be used to model many
types of design/planning problems, and the architecture of an implemented reasoning system for solving this
problem. The architecture includes a truth maintenance system (TMS) which is specifically designed to
reason about the relationships expressed in the constraints as a problem solution evolves. The CSP
consists of two types of data. The first type of datum corresponds to assignments that are handled by the
problem solver, and the second type corresponds to constraint terms handled by the TMS. The
dependency network, representing the relationships among constraint terms, is static and generally quite
small, depending on the number of constraint terms. Also, justifications are never manipulated (only
evaluated). This results in an architecture that makes efficient use of both space and time. The need for
efficient TMSs, even though these might deal only with certain classes of problems, is underscored by the
fact that general purpose TMSs have often been found to be highly inefficient for solving large problems.
We also show how certain instances of the generalized CSP can be formulated as an integer programming
problem, special cases of which can be solved efficiently using mathematical (integer) programming
techniques.
Information Systems Working Papers Series
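A toy illustration of the two kinds of data the abstract distinguishes: assignments produced by a problem solver and constraint terms that are only evaluated, never manipulated. The design problem (placing modules on servers) is hypothetical, and the search here is brute-force enumeration rather than the paper's problem-solver/TMS architecture.

```python
# Minimal CSP sketch: the problem solver proposes assignments; constraint terms
# are evaluated (never manipulated) against each candidate assignment.
# The toy design problem below is hypothetical.

from itertools import product

VARIABLES = {"ui": [1, 2], "db": [1, 2], "batch": [1, 2]}   # module -> candidate servers

# Constraint terms over assignments; each is evaluated on a complete assignment.
CONSTRAINTS = [
    lambda a: a["ui"] != a["db"],          # keep UI and database on different servers
    lambda a: a["db"] == a["batch"],       # batch jobs colocated with the database
]

def solutions():
    names = list(VARIABLES)
    for values in product(*VARIABLES.values()):
        assignment = dict(zip(names, values))
        if all(term(assignment) for term in CONSTRAINTS):
            yield assignment

print(list(solutions()))
# [{'ui': 1, 'db': 2, 'batch': 2}, {'ui': 2, 'db': 1, 'batch': 1}]
```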
Abstract-Driven Pattern Discovery In Databases
In this paper, we study the problem of discovering interesting patterns in large volumes of
data. Patterns can be expressed not only in terms of the database schema but also in user-defined
terms, such as relational views and classification hierarchies. The user-defined terminology is
stored in a data dictionary that maps it into the language of the database schema. We define
a pattern as a deductive rule expressed in user-defined terms that has a degree of certainty
associated with it. We present methods of discovering interesting patterns based on abstracts,
which are summaries of the data expressed in the language of the user.
Information Systems Working Papers Series
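A small sketch of mining rules from an abstract rather than from raw records, assuming the abstract is a grouped count over two user-defined terms from the data dictionary. The insurance categories, counts, and the simple frequency-based certainty measure are illustrative assumptions, not the paper's method.

```python
# Sketch: a pattern is a rule in user-defined terms with a degree of certainty,
# and it is derived from an "abstract" (a summary of the data), not raw records.
# The categories and counts below are hypothetical.

# Abstract: policy counts grouped by two user-defined terms from the dictionary.
ABSTRACT = {
    # (young_driver, high_risk): number of policies
    (True,  True):  420,
    (True,  False): 580,
    (False, True):  150,
    (False, False): 850,
}

def rule_certainty(antecedent_value: bool) -> float:
    """Certainty of the rule: young_driver == antecedent_value -> high_risk."""
    matching = {hr: n for (yd, hr), n in ABSTRACT.items() if yd == antecedent_value}
    return matching[True] / (matching[True] + matching[False])

print(f"young_driver -> high_risk      certainty = {rule_certainty(True):.2f}")   # 0.42
print(f"not young_driver -> high_risk  certainty = {rule_certainty(False):.2f}")  # 0.15
```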
DEPENDENCY BASED COORDINATION FOR CONSISTENT SOLUTIONS IN DISTRIBUTED WORK
Many organizational problems can be decomposed into
nearly independent subproblems, the solution of which
is the responsibility of independent agents. In this kind
of work, which we call distributed work, the problems
are only nearly independent since dependencies exist
between the commitments required from each agent.
As a consequence of these dependencies, the coordination
problem becomes one of maintaining a consistent
global solution in the face of the possibly conflicting
activities of each agent. We define a normative model
for coordination protocols that indicates the formal requirements
for maintaining a globally consistent solution.
The model identifies several properties that the
protocol must enforce, namely serializability, atomicity,
completeness, and soundness. We show that these
properties are desirable in coordination protocols for
distributed work problems.
Information Systems Working Papers Series
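A very small sketch of the coordination problem described above, under the assumption that agents commit to values for shared items and a commitment is accepted only if it keeps the global solution consistent with earlier commitments. It illustrates the consistency requirement only, not the serializability, atomicity, completeness, or soundness properties of the normative model.

```python
# Sketch: agents propose commitments on shared items; a commitment is accepted
# only if it does not conflict with the globally held solution. Agents, items,
# and the conflict rule are hypothetical.

global_solution: dict[str, tuple[str, str]] = {}   # item -> (agent, committed value)

def propose(agent: str, item: str, value: str) -> bool:
    """Accept the commitment only if it leaves the global solution consistent."""
    if item in global_solution and global_solution[item][1] != value:
        return False                                 # conflicting commitment rejected
    global_solution[item] = (agent, value)
    return True

print(propose("underwriting", "policy-42/limit", "1M"))   # True
print(propose("claims",       "policy-42/limit", "2M"))   # False: conflicts
print(global_solution)
```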