Explaining Validation Discussion from the SD List

From SystemsWiki

Explaining Validation

Mar 1 to 3, 2006

Posted by Timothy Quinn

I have two very specific questions for the practitioners among you. Before I pose them, here is the setup:

In the past three weeks, I have been asked on several occasions how we "validate" our models when they are built from "subjective" accounts of how the system works (i.e., on expert opinion). Jay Forrester, in Industrial Dynamics, wrote,

"Any 'objective' model-validation procedure rests eventually at some lower level on a judgment or faith that either the procedure or its goals are acceptable without objective proof." (Forrester 1961, p. 123)

I take these judgments and faith to be the intuition or common-sense knowledge of a model's critic. For example, we know that physical inventory cannot go negative, or more workers have a higher work completion rate than fewer (unless you reach the nonlinear too-many-cooks regime). Therefore, John Sterman, in Business Dynamics, concludes,

"Validation is intrinsically social. The goal of modeling, and of scientific endeavor more generally, is to build shared understanding that provides insight into the world and helps solve important problems. Modeling is therefore inevitably a process of communication and persuasion among modelers, clients, and other affected parties. Each person ultimately judges the quality and appropriateness of any model using his or her own criteria." (Sterman 2000, p.850)

To restate, validation is building shared understanding of

(1) the problem, (2) how the model simplifies the real world in favor of achieving a purpose, and (3) the appropriateness of those simplifications for the purpose.

I have struggled to make a compelling case for model "validity" when my potential critic's attention span is limited to a few minutes only. I cannot take him or her equation-by-equation and explain the empirical or common-sense justification for each.


(1) Is there a quick way, possibly a good analogy, to communicate the point that validation is about confidence in a model's simplifications of reality for a specific purpose? I am looking for something akin to the bathtub analogy so often used for explaining stocks and flows to a lay audience.

Here is my best failed attempt: Imagine your goal is to cross a ford without getting your feet wet. Can you do it? Only by demonstrating that there exists a sequence of stepping stones, each within one stride of the next, spanning the river. If we agree on the existence of each hop, then you must concede the goal is possible.

The reason this analogy fails is that the goal is achieved by cutting it into sequential pieces, each of which must lead to the next. In contrast, from uncontroversial pieces, SD models can produce surprising and counterintuitive results.

(2) What published paper best exemplifies how a model's formulations should be justified on the basis of tests of intended rationality, "common knowledge" reality constraints (e.g., conservation of mass), and expert opinion?

Posted by Michael J Schwandt

The paper that Michael Radzicki presented at the 2004 ISDC, "Expectations Formulation and Parameter Estimation in Uncertain Dynamical Systems: The System Dynamics Approach to Post-Keynesian-Institutional Economics" contains a solid review of SD validity research and provides a good example of applying those techniques.

Posted by John Gunkler

One approach that helps me is to think about what makes any scientific theory plausible.

1. It must explain the phenomena of interest -- which means you must be able to derive or create the phenomena using the theory (or model).

2. It must predict future phenomena. (If some of these are surprising, all the better!)

3. You must make a good case for ruling out alternative explanations.

It's number 3 that's most interesting and, I believe, useful. (1 and 2 are sort of "table stakes" -- you must have these or you're not even in the game.) And it's number 3, I think, that has led Drs. Forrester and Sterman to write some of the things they've written about communication, judgment, and persuasion. Because in order to rule out alternative explanations, you first need to know what your client would think of as plausible explanations (models). This means, you must get "subjective" with your clients and encourage them to provide their best shot at causal explanations before you build your model. And it means you also have to think very hard about what a harsh critic of your model will say -- what alternative explanations they could use to discredit your efforts.

You score big if your model explains (and, even more dramatically, if it predicts) something that their formulations do not. You also score big if, with the deeper understanding provided by your model, it becomes evident that their explanation does not work.

The kinds of plausible alternative explanations I run into are mostly exogenous events -- partly because that's the way most people think about causes, and partly because it seems it is always all of the "other stuff going on" that wants the credit for positive changes in outcomes. If, for example, we are working on an endogenous theory of sales growth (i.e., our model is identifying policy decisions that affect growth in sales) -- and sales go up during the modeling period -- then everyone else who did anything at all about sales is going to try to claim credit: "Oh, that's because of the contest we ran in January. Best contest we ever had!" or "You know, the competition had a product recall and that really boosted our sales." or etc., etc. The good news here is that if we show that endogenous factors override (after some transition effects) exogenous factors, then we can argue against all of these kinds of alternative explanations. They become mere blips on the outcome graph, not fundamental forces.

So, I recommend focusing on learning all you can about people's "pet projects" to improve (and "pet theories" to explain) whatever outcome you're modeling. I recommend getting all of your critics' mental models captured on paper -- not a bad idea, anyway. And I recommend doing a deep critique of your own thinking, looking for loopholes and poorly understood or poorly communicated aspects of your own work. And I recommend actually performing the exercise (maybe with others' help) of coming up with alternative explanations for the phenomena. Then your challenge is to come up with something that does a better job of (fill in the goal of modeling here -- whether it's helping people understand where the leverage points for change are, or helping choose new policies to improve outcomes, or creating a common mental model to improve communications, etc.) and find all the ways it is superior to their existing mental models, then rule out (or account for in the model) the effects of their pet projects.

Posted by Jack Homer

Good questions, Tim. People ask me about validation all the time, too. Many of them have statistical or other technical backgrounds and are thinking of confidence intervals and other numerical demonstrations of reliability via sensitivity testing. I tell them about the difference between numerical sensitivity and behavioral or policy sensitivity, but that can be pretty abstract without further explanation. One way of explaining it is to say that if what I was after primarily was a tight confidence interval, then I would not bother with 99% of my model's equations, and would just look for one or two simple polynomial equations that provide a nice tight fit to historical data. In most cases, the more uncertain parameters I add to my model, the more uncertain its outputs will be as well.

Why would I want to add uncertainty to my model? Because that is the only way I can make it useful for understanding what is going on in the real world and for informing policy. But doesn't that uncertainty stand in the way of achieving such understanding? Yes, it can, but only to the extent that the model is behaviorally or policy sensitive to changes--not to the extent that it is numerically sensitive (unable to produce tight confidence intervals). Fortunately, well-constructed models that adhere to laws of conservation and bounded rationality, etc., typically display much less behavioral and policy sensitivity than one might expect, and therefore can tell us useful things even if they do not produce nice tight confidence intervals. And, even if a model does display behavioral or policy sensitivity, it can at least direct research and data gathering to just those areas that really matter the most and not all the other areas one might think of, which is a pretty big deal by itself.
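The distinction between numerical and behavioral sensitivity can be made concrete with a small simulation. The sketch below is purely illustrative and is not taken from any model discussed in this thread: the first-order goal-seeking stock, the parameter values, and the helper names `simulate` and `behavior_mode` are all assumptions. It varies one uncertain parameter, measures the spread in a numerical output, and then checks whether the qualitative behavior mode changes.

```python
def simulate(adjust_time, target=100.0, initial=20.0, dt=0.25, steps=40):
    """Euler-integrate a hypothetical stock: d(stock)/dt = (target - stock) / adjust_time."""
    stock = initial
    path = [stock]
    for _ in range(steps):
        stock += dt * (target - stock) / adjust_time
        path.append(stock)
    return path


def behavior_mode(path, target=100.0):
    """Classify a run as 'goal-seeking' if it rises monotonically and never overshoots."""
    rising = all(later >= earlier for earlier, later in zip(path, path[1:]))
    no_overshoot = all(value <= target for value in path)
    return "goal-seeking" if rising and no_overshoot else "other"


# The uncertain parameter (assumed range): time to close the gap to the target.
adjust_times = [2.0, 4.0, 8.0]
runs = {t: simulate(t) for t in adjust_times}

# Numerical sensitivity: spread of the stock value at time 2.0 (step 8).
values_at_2 = {t: path[8] for t, path in runs.items()}
numeric_spread = max(values_at_2.values()) - min(values_at_2.values())

# Behavioral sensitivity: does the qualitative pattern differ between runs?
modes = {t: behavior_mode(path) for t, path in runs.items()}

print("stock at time 2.0:", values_at_2)
print("numeric spread:", numeric_spread)
print("behavior modes:", modes)
```

In this toy case the stock value at time 2.0 differs by tens of units across the three runs, yet every run shows the same goal-seeking shape. A real model would of course need a richer definition of "behavior mode" than this monotonicity check.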

Analogy? Model validation, like model building itself, is like baking a cake. There are several necessary ingredients to get it to look right and smell right and taste right, but the final result is a blending and transformation of all those components and often has elements of surprise from the gestalt of the whole. (I have also in the past compared modeling to a zipper, involving the careful bringing together of structural hypotheses and evidence to create useful theory; see SDR 13(4) 1997.)

Published works? On validation, I generally point people to Sterman Chapter 21 and Forrester and Senge 1980. On testing for intended rationality, I point them to Sterman Chapter 15 and Morecroft 1983 and 1985.

Posted by R. Oliva

You might also be interested in:

Oliva R. 2003. Model calibration as a testing strategy for system dynamics models. European Journal of Operational Research 151(3): 552-568.

An abstract of the paper and the tools described in it are available at:


Posted by Yaman Barlas Mar 4.2006

Here are some references (of mine) on model validation that you may find useful:

  • On conceptual and philosophical aspects of validity and validation:

- "Philosophical Roots of Model Validation: Two Paradigms" (with Stanley Carpenter), System Dynamics Review, Vol.6, No.2. 1990, pp.148-166.

- "Comments on 'On the very idea of a system dynamics model of Kuhnian science'" System Dynamics Review, Vol.8, No.1, 1992.

  • On Quantitative BEHAVIOR Testing:

we have Behavior Testing Software (BTS II), available at our web site http://www.ie.boun.edu.tr/sesdyn/fresources.htm

- "Multiple Tests for Validation of System Dynamics Type of Simulation Models", European Journal of Operations Research, Vol.42, no.1, 1989, pp. 59-87.

- "An Autocorrelation Test for Output Validation", SIMULATION, Vol.55, No.1, 1990, pp.7-16.

- "A Behavior Validity Testing Software (BTS)" (with H. Topaloğlu and S. Yılankaya), Proceedings of the 15th International System Dynamics Conference, Istanbul, 1997.

  • On STRUCTURE Testing:

- "Tests of Model Behavior That Can Detect Structural Flaws: Demonstration with Simulation Experiments", in Computer-Based Management of Complex Systems (P. M. Milling and E. O. K. Zahn, eds.), 1989, pp. 246-254.

- "A Dynamic Pattern-oriented Test for Model Validation" (with K. Kanar), Proceedings of 4th Systems Science European Congress, Valencia, Spain, Sept. 1999, pp. 269-286 (Under revision to be submitted for SDR or another journal).

- "Automated dynamic pattern testing, parameter calibration and policy improvement" (with Suat Bog), Proceedings of the International System Dynamics Conference, Boston, USA, 2005, http://www.systemdynamics.org/conf2005/proceed/papers/BARLA309.pdf

We also recently developed software (SiS) that I demonstrated in Boston, but it has a few bugs and should be ready in a couple of months.

  • Overviews on all aspects of validity and validation

- "Fundamental Aspects and Tests of Model Validity in Simulation", Proceedings of SMC Simulation Multiconference, Phoenix, Arizona, 1995, pp.488-493.

- "Formal Aspects of Model Validity and Validation in System Dynamics", System Dynamics Review, Vol.12, no.3, 1996, pp. 183-210.

- "System Dynamics: Systemic Feedback Modeling for Policy Analysis" in Knowledge for Sustainable Development - An Insight into the Encyclopedia of Life Support Systems, UNESCO-Eolss Publishers, Paris, France, Oxford, UK. 2002, pp.1131-1175.

Posted by Erling Moxnes Mar 6.2006

Let me add one simple example to Jack Homer's excellent summary.

In a model of a renewable resource (SDR 20, No.2, 2004, pp.151), it turns out that a static model explains very well the (simulated) historical development. The amount of lichen is found to be a linear function of the number of reindeer, with the very impressive t-ratio of 63 for the slope parameter! No polynomials are needed, only two parameters describing the linear relationship. A proper dynamic model is not likely to beat this impressive fit to the data. However, when the static and the dynamic models are used to recommend policies, the advice differs very much. The static model supports a disastrous policy. To generalise, the example illustrates the problems of using too simple models in systems with shifting dominance.
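The contrast between a well-fitting static model and a dynamic one can be illustrated with a toy simulation. The sketch below is not the reindeer-lichen model from the cited paper; the logistic regrowth equation, the exogenous herd trajectory, every parameter value, and the names `grow`, `static_prediction`, and `dynamic_final` are assumptions chosen only to show the effect. It generates a simulated history, fits a straight line of lichen against herd size, and then compares what the fitted line and the dynamic equations each imply for holding the herd at its final size.

```python
def grow(lichen, herd, r=0.2, capacity=100.0, intake=1.0):
    """One-year Euler step: logistic lichen regrowth minus grazing (hypothetical)."""
    regrowth = r * lichen * (1.0 - lichen / capacity)
    return max(0.0, lichen + regrowth - intake * herd)


# Simulated "history": the herd grows steadily from 0 to 8 over 20 years.
lichen = 100.0
history = []
for year in range(21):
    herd = 0.4 * year
    history.append((herd, lichen))
    lichen = grow(lichen, herd)

# Static model: ordinary least-squares line, lichen = intercept + slope * herd.
n = len(history)
mean_h = sum(h for h, _ in history) / n
mean_l = sum(l for _, l in history) / n
sxx = sum((h - mean_h) ** 2 for h, _ in history)
sxy = sum((h - mean_h) * (l - mean_l) for h, l in history)
syy = sum((l - mean_l) ** 2 for _, l in history)
slope = sxy / sxx
intercept = mean_l - slope * mean_h
r_squared = sxy ** 2 / (sxx * syy)

# Policy question: what happens if the herd is held at 8 from now on?
policy_herd = 8.0
static_prediction = intercept + slope * policy_herd

# Dynamic model: keep simulating for 100 more years at the policy herd size.
dynamic_lichen = history[-1][1]
for _ in range(100):
    dynamic_lichen = grow(dynamic_lichen, policy_herd)
dynamic_final = dynamic_lichen

print("R^2 of static fit:", r_squared)
print("static prediction of lichen at herd 8:", static_prediction)
print("dynamic long-run lichen at herd 8:", dynamic_final)
```

Under these assumed numbers the straight line fits the history closely and implies a comfortably positive lichen level at a herd of 8. The dynamic equations, by contrast, drive lichen to zero, because grazing of 8 units per year exceeds the maximum possible regrowth of r * capacity / 4 = 5.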

The above critique of statistical tests could also be launched from the point of view of statistics. Using statistical methods, there is no excuse for using inappropriate models - a healthy perspective is provided by Bayesian statistics.

Posted by John Barton Mar 8.2006

Another way of viewing "validation" is to realize that this term frames the debate within the domain of deductive logic and its related objectivist and refutationist positions. However, managers do not act on the basis of well established hypotheses. Instead, they act on the basis of what they perceive as the best hypothesis available within their "community of inquiry". SD modelling with its checks and balances (triangulation) provides a rigorous approach to establishing such a hypothesis using the events-patterns-structure framework. This constitutes abductive reasoning, not deductive reasoning.

On taking action, the manager, as participant /observer, monitors the implementation of the strategy and actively intervenes "to make it happen".

Using abduction (the method of hypothesis) significantly reframes the debate from validation to triangulation of the hypothesis and emphasises the importance of a "community of inquiry" much as described by John Sterman etc. This shifts the emphasis from "validating" a hypothesis to evaluating the strategy from an open systems perspective.

(A paper detailing this argument has been submitted to the Nijmegen conference).

Abductive inference is not new. It was part of the Greek dialectic, but was reconstituted by the American pragmatist philosopher Charles Sanders Peirce (1839 - 1914) and forms the basis of Dewey's method of reason.

You may wish to follow up on Thomas Powell (2001) "Competitive Advantage: Logical and Philosophical Considerations". Strategic Management Journal Vol 22: 875 - 888. Powell argues that competitive advantage is pragmatic, inductive inference.

Posted by Geoff McDonnell Mar 9.2006

John Barton's posting on abductive inference reminded me of Gary Klein's book Sources of Power, about the way highly time-stressed experts make decisions (esp. in medical, military, and fire-fighting settings, using abductive inference IMHO). This book had a profound effect on the way I now think about clinical and health policy decision support.

As a review (page 2 on the site link) states

Richard I. Cook MD, Focus on Patient Safety, review of Sources of Power, 1998.

"With his colleagues, Klein has spent the past two decades observing people doing mental work in order to discover how they cope with demands of the workplace. What are the processes of decision making? How do people deal with uncertainty and risk? How is it that experts are able to discern subtle cues and do just the right thing in situations where novices fail? Sources of Power is a marvelous summation of Klein's long experience. It may also be the most readable and coherent description of what is presently known about how human cognition works in the real world."

Posted by George P. Richardson Mar 12.2006

There's a PowerPoint presentation online: "Validation as a Social Process".

Posted by Mike Fletcher Mar 12. 2006

I would like to add some comments regarding your reference to abductive inference, "triangulation" and Model Validation. Many published definitions of abductive inference are a bit vague and that has led to some confusion. Just to be clear, and hoping our definitions are the same, I want to provide my current working definition. Schum's short definition is something "plausibly true."

Peirce's "Beanbag" definition:

Rule: All the beans from this bag are white

Result: These beans are white

Case: These beans are from this bag.

Based on these definitions, triangulation is a good description. The analogy one could draw here is that the "truth" is the point at the center of a circle, and that all inquiry begins on the circumference. Luckily, due to the nature of circles, the distance to truth is equal from any point on the circumference. We triangulate and what results is a region, possibly an ellipse, which outlines a rough estimate of what region the "truth" lies in.

So what does this have to do with model validation? Unfortunately, validation, as it is often practiced, isn't very abductive. Validation is often practiced like societies practice rituals. The original goal of the ritual usually served some valid purpose, but often, as time goes by, the original purpose of the ritual gets lost in the rote execution of the ritual. Thus validation at the very end of the procedure rather misses the point of validation. We should be learning throughout the entire modeling process, and thus saving validation for the very end means that not much is going to be learned. The "proof" at the end isn't likely to be very abductive, divergent, generative, or help us to expand our mental models.

Validation isn't just throwing a bit of "rigorous math" in at the end to "prove" we are right and all our work is brilliant! As my SD Professors at WPI commented, validation is a continual process of learning. If we do validation that way, it is likely to be abductive.

Abductive inference is inherently divergent and generative, and is therefore extremely helpful in adding new information into the debate, and thus expanding mental models. (Which, after all, should be our true goal.) Information is not evidence until it is attached to a hypothesis, and thus multiple hypotheses are likely to introduce new information. Information that may be vital, but not relevant to the current hypothesis, may be discounted, so entertaining multiple hypotheses with respect to a question is usually a wise course.

Abduction can also be part of an analysis of competing hypotheses (summarized as: accepting as the working hypothesis the hypothesis with the least inconsistent evidence). We probably should look at each iteration of the model as a hypothesis. All the model iterations we throw into the dust bin are just as important to validation (and perhaps more so) as the "proof" at the end. They certainly are critical to the learning process. I should probably also comment, as others no doubt have, that validation is not really a good word to use to describe what is really going on, since we are really building confidence and "goodness to purpose," not proving truth.
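The rule of accepting the hypothesis with the least inconsistent evidence can be sketched in a few lines. Everything in the example below is hypothetical: the three candidate explanations echo the sales example earlier in this thread, the evidence items and their ratings are invented for illustration, and the function names are not from any established tool. Each piece of evidence is rated against each hypothesis as consistent ("C"), inconsistent ("I"), or neutral ("N"), and only the inconsistencies are counted.

```python
# Hypothetical hypotheses about why sales rose (labels are assumptions).
HYPOTHESES = ["January contest", "competitor recall", "endogenous growth loop"]

# Invented evidence ratings, one rating per hypothesis, in the order above.
EVIDENCE = {
    "sales rose before the contest started": ["I", "N", "C"],
    "sales stayed high after the recall ended": ["N", "I", "C"],
    "growth was uneven across regions": ["C", "C", "I"],
    "sales kept rising after the contest closed": ["I", "N", "C"],
    "competitor's share recovered while our sales grew": ["N", "I", "C"],
}


def inconsistency_counts(hypotheses, evidence):
    """Count, for each hypothesis, how many evidence items are rated inconsistent."""
    counts = {name: 0 for name in hypotheses}
    for ratings in evidence.values():
        for name, rating in zip(hypotheses, ratings):
            if rating == "I":
                counts[name] += 1
    return counts


def working_hypothesis(counts):
    """Pick the hypothesis with the fewest inconsistent items (ties go to the first listed)."""
    return min(counts, key=counts.get)


counts = inconsistency_counts(HYPOTHESES, EVIDENCE)
best = working_hypothesis(counts)
print(counts)
print("working hypothesis:", best)
```

Note that the selected hypothesis is not "proved": it still carries one inconsistent item, which is exactly the kind of loose end that can direct the next model iteration.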

Posted by Mike Fletcher Mar 12.2006

Klein’s book is good, and I too found it very interesting, but I think it's important to point out that his conclusions probably only hold for expert judgement within the experts' specific area of expertise. Most of his studies show experts working in their specific domain. The decision making of experts within their field probably holds up reasonably well as Klein argues, with some limitations. (Experts, for example, can miss new patterns that a novice might see, because the experts are too busy looking for old patterns.)

In the real world most decisions are made by people who think they are experts but really aren't, or are expert in only a small portion of the problem! That is, decisions are routinely made using Klein's "expert model" but without the required expertise!

Overall I'm somewhat alarmed by several of what I would term the "natural decision making" books which have been quite popular of late. "The Wisdom of Crowds" is another similar title discussed earlier on this list.

I think it's extremely risky to profess that there are relatively easy prescriptions available for making hard decisions against complex problems! Several authors, and probably Klein to a small degree, are risking becoming "prescriptive." All the pessimism about "bounded rationality" is no reason for a permanent state of decision despair. We don't have to apply real critical thinking or rigorous analytic processes to our decisions. A prescription is available. Risky thinking. There might be easy solutions to complex problems, but finding them is usually hard, takes time, and will likely involve costly mistakes.

Crowds do indeed know more than any individual member of the crowd, but their wisdom might simply amount to the sum of their ignorance. That is, everyone can be wrong, and the sum total of ideas might still be insufficient. Experts generally make good decisions within the narrow boundaries of their field, but they may be totally incapable of making effective decisions elsewhere.
