Evaluators need to know about complexity because the programs they evaluate often exhibit complex behaviors. Without understanding complexity, evaluators cannot construct models, develop methodologies, or interpret data in ways that accurately describe what programs are doing and why. If they get it wrong, so too will evaluation users and stakeholders. (Note the use of “behavior” rather than “system.” Knowing that something is a complex system does not inform operational decisions about designing or conducting an evaluation; knowing the specific behaviors of complex systems does.)
It is all well and good to say that evaluators need to know “more” about complexity, but what does “more” mean? All else being equal, a little more is better than a little less. To illustrate with an example from statistics, consider the humble t test. (This example is borrowed from another blog post.) How much statistical knowledge does one need? The answer falls somewhere on a continuum:
- A single knowledge nugget: p < .05 probably means the two groups are different (see the sketch after this list).
- Much else. A few examples include the nature of true score and error, statistical power, other analysis choices (e.g., non-parametric tests), the historical roots of the .05 criterion, sampling theory, post-hoc testing, and the central limit theorem.
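To make the “knowledge nugget” end of the continuum concrete, here is a minimal sketch in Python of the first bullet. The group scores are invented purely for illustration, and the sketch assumes SciPy is available.

```python
# A minimal sketch of the "single knowledge nugget" end of the continuum.
# The two groups of scores are invented purely for illustration.
from scipy import stats

group_a = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.5]
group_b = [14.8, 15.1, 13.9, 16.0, 15.3, 14.6, 15.7]

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# The "nugget" reading: p < .05 probably means the groups differ.
# Everything on the "much else" list (power, sampling theory, non-parametric
# alternatives, the roots of the .05 criterion) sits behind this one reading.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, groups differ? {p_value < 0.05}")
```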
So how much statistics does an evaluator need? There is no need for evaluators to be professional statisticians, but neither should they cleave to the simple end of the continuum. We do not need to be complexity scientists, but we would do better evaluation if we chose to know a bit more rather than a bit less. (Some people question whether there is such a thing as “complexity science,” or whether there is more than one, but the term is still meaningful (Phelan, 2001).)
A bit more or a bit less about what? To extend the statistics example, statistics covers a lot of ground. Does an evaluation mostly involve changes in rates (e.g., hospital admissions), or multiple comparisons of school attendance? In the first case, one would want to know a lot about the statistics needed to determine change in mean time between failure or success. In the second, one would want to know about ANOVA and post-hoc tests. So it is with evaluators and complexity. Some aspects of complexity are more salient than others. The more salient, the more we need to know; the more peripheral, the less we need to know.
Depth – how much do evaluators need to know about complexity?
The best way to answer this question is with examples.
Example 1: Emergence
“Emergence” colloquially means “the whole is greater than the sum of its parts.” But that is not correct. What the term means is that the parts lose their identity. This has fundamental implications for program models, theories of causality, priorities for measurement, methodology, data interpretation, and so on. To illustrate, compare a cylinder in an internal combustion engine with “urban vitality.” A car is more than the sum of its parts, but a cylinder is still a cylinder. One can describe how it fits into an internal combustion engine and how it contributes to the operation of the automobile. Each part maintains its identity, so this is not an example of emergence.
As for “urban vitality,” it evokes population density, number of restaurants, cultural diversity, population demographics, safe walkable streets, and so on. Each of these features contributes to “urban vitality.” But does measuring each component add up to a measure of “urban vitality”? No. It is a construct that cannot be explained by examining its components one at a time. “Urban vitality” is an emergent phenomenon.
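To make the distinction concrete, here is a minimal sketch of a Schelling-style clustering model, a standard textbook illustration of emergence rather than anything specific to urban vitality. Each agent follows a mild local preference, yet a strong global pattern appears that cannot be read off any individual agent. The grid size, threshold, and number of steps are arbitrary choices for the sketch.

```python
# A toy Schelling-style model: mild individual rules, strong global pattern.
# All parameters are arbitrary choices for the sketch.
import random

SIZE, EMPTY, THRESHOLD = 30, 0.1, 0.3   # grid side, empty share, contentment cutoff

def like_share(grid, x, y):
    """Share of a cell's occupied neighbors that match its own type."""
    me, same, other = grid[x][y], 0, 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:
                n = grid[(x + dx) % SIZE][(y + dy) % SIZE]
                if n == me:
                    same += 1
                elif n != 0:
                    other += 1
    return same / (same + other) if same + other else 1.0

def step(grid):
    """Agents with too few like neighbors relocate to a random empty cell."""
    empties = [(x, y) for x in range(SIZE) for y in range(SIZE) if grid[x][y] == 0]
    movers = [(x, y) for x in range(SIZE) for y in range(SIZE)
              if grid[x][y] != 0 and like_share(grid, x, y) < THRESHOLD]
    random.shuffle(movers)
    for x, y in movers:
        if not empties:
            break
        ex, ey = empties.pop(random.randrange(len(empties)))
        grid[ex][ey], grid[x][y] = grid[x][y], 0
        empties.append((x, y))

def clustering(grid):
    """Average like-neighbor share: a crude global measure of the pattern."""
    shares = [like_share(grid, x, y) for x in range(SIZE) for y in range(SIZE)
              if grid[x][y] != 0]
    return sum(shares) / len(shares)

# Populate the grid: roughly 10% empty, the rest split between two agent types.
grid = [[0 if random.random() < EMPTY else random.choice([1, 2])
         for _ in range(SIZE)] for _ in range(SIZE)]

print("initial clustering:", round(clustering(grid), 2))
for _ in range(30):
    step(grid)
print("final clustering:  ", round(clustering(grid), 2))
```

On a typical run the average like-neighbor share climbs well above its starting value of roughly one half, even though no individual rule asks for that outcome. The global pattern belongs to the whole, not to any part.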
Why does a precise definition of emergence matter? Because it affects the models that evaluators build. Will those models focus attention on individual components, or on constructing a scale for the emergent variable? Will they incline an evaluator toward a granular or a global set of causal relationships? Will evaluators devise methodologies and interpret data from a granular or an emergent perspective on program operations and impact?
Which evaluators would benefit from a more rigorous knowledge of emergence? Those whose work brings them face to face with a need to understand why change happens and what the patterns of change are.
Example 2: Butterfly effect
Evaluators are besotted with that insect. Edward Lorenz was brilliant, but he should have stuck to talking about chaos and sensitive dependence, as he did in “Deterministic Nonperiodic Flow” (Lorenz, 1963). In evaluation, the problem is usually that systems are too stable, not that they can change at the flutter of wings. The deeper problem is that evaluators do not appreciate how sensitive dependence works together with other complex behaviors. A local perturbation can change a system’s trajectory in radical ways, but that is a long way from saying that systems are unstable with respect to small change.
“Sensitive dependence” is best understood in conjunction with “social attractors” and the linkages among a system’s elements. Attractors influence the consequences of small changes. Linkages have a lot to do with a network’s potential to change. Evaluators talk a lot about butterflies, but they do not talk about how sensitive dependence, social attractors, and linkages come together to affect whether and how quickly a system may change. Any evaluator interested in resistance to change or in sustainability would benefit from a more rigorous appreciation of the interplay among sensitive dependence, social attractors, and linkages.
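For readers who want to see what sensitive dependence actually looks like, here is a minimal sketch of Lorenz’s own (1963) equations: two trajectories that start almost identically stay on the same attractor and still diverge. The crude Euler integration, step size, horizon, and size of the perturbation are arbitrary choices for the illustration, not anything taken from Lorenz’s paper.

```python
# A minimal sketch of sensitive dependence using the Lorenz (1963) equations.
# Two trajectories start a hair apart, remain bounded on the same attractor,
# and nonetheless diverge. Step size and horizon are arbitrary choices.

def lorenz_step(x, y, z, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return x + dx * dt, y + dy * dt, z + dz * dt

a = (1.0, 1.0, 1.0)          # first trajectory
b = (1.0 + 1e-8, 1.0, 1.0)   # second trajectory, perturbed by 0.00000001

for step in range(1, 5001):
    a = lorenz_step(*a)
    b = lorenz_step(*b)
    if step % 1000 == 0:
        gap = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        print(f"step {step:5d}: distance between trajectories = {gap:.6f}")

# The distance grows by orders of magnitude, yet both trajectories stay within
# the same bounded attractor: small causes, large but constrained effects.
```

The point for evaluators is the combination: the perturbation matters, but the attractor constrains where the system can end up, which is why small change and overall stability can coexist.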
Breadth – how many complexity concepts do evaluators need to know about?
Knowing just enough about a topic to appreciate its relevance enables one to recognize possibilities. For example, an evaluator does not have to be an expert in Empowerment Evaluation to appreciate its emphasis on the value of giving stakeholders the ability to monitor and evaluate what they are doing. It is easy to imagine bumping into a scenario where a casual knowledge of Empowerment Evaluation would lead an evaluator to recognize its relevance.
A word of caution is necessary. Complexity science covers a lot of terrain, and an encyclopedic perspective is not necessary. We all need expertise in the tools that enable our work. But to draw on the statistics example, imagine an evaluator who never had to determine change in mean time between failure until she or he was drawn into an evaluation scenario where such an analysis was useful. How would that evaluator appreciate the need to find an expert in the subject? Because she or he knew just enough to know what the method could do. So it is with complexity.
Example 1: Scaling
Scaling does not have wide applicability in evaluation, but it does fill an important niche. It deals with regularities as things get larger or smaller. Some very interesting scaling research has been conducted in urban settings, e.g., numbers of restaurants, patents, or crimes per population (West, 2017). Scaling exponents are often similar across many phenomena: in log-transformed data comparing population size to the variable of interest, there is a consistent sub-linear or super-linear trend, depending on the variable. This knowledge helps in evaluating efforts to change urban settings.
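As a worked sketch of the kind of regularity West describes: on log-log axes the relationship is roughly Y ≈ c·N^β, with β below 1 for many infrastructure variables and above 1 for many socioeconomic ones. The cities below are simulated, and the chosen exponent is an assumption made for illustration, not a figure from West (2017).

```python
# A sketch of estimating an urban scaling exponent beta in Y ~ c * N**beta.
# The "cities" are simulated; the exponent 1.15 is an assumed superlinear value.
import math
import random

random.seed(1)
TRUE_BETA, TRUE_C = 1.15, 0.02

cities = []
for _ in range(200):
    pop = 10 ** random.uniform(4, 7)                 # 10k to 10M people
    noise = math.exp(random.gauss(0, 0.15))          # multiplicative noise
    outcome = TRUE_C * pop ** TRUE_BETA * noise      # e.g., patents per year
    cities.append((pop, outcome))

# Ordinary least squares on log-transformed data: log Y = log c + beta * log N
xs = [math.log(p) for p, _ in cities]
ys = [math.log(o) for _, o in cities]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs))
log_c = my - beta * mx

print(f"estimated beta = {beta:.3f} (superlinear if > 1, sublinear if < 1)")
print(f"estimated c    = {math.exp(log_c):.4f}")
```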
Example 2: Stigmergy
“Stigmergy” was first developed to understand insect behavior (Theraulaz & Bonabeau, 1999) but has been generalized to many human-scale situations where changes in an environment cue the behavior of subsequent actors (Parunak, 2006). Have you ever seen hikers pick up a rock and add it to a rockpile that served as a trail marker? Why add to the cairn there and not in another spot that might serve just as well? Because the behavior of the previous hikers changed the environment in such a way that subsequent hikers would “know” what to do. That is stigmergy in action. Stigmergy does matter when evaluating change in organizational behavior or exploring the role of systematic planning in the exercise of workforce choices.
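Here is a toy version of the cairn story, invented for this sketch rather than drawn from the cited literature, to show stigmergy as environment-mediated coordination: each hiker is a bit more likely to add a rock where rocks already sit, and one pile comes to dominate several equally good spots.

```python
# A toy model of the cairn example: stigmergy as environment-mediated coordination.
# The spots, hiker count, and preference rule are invented for illustration.
import random

random.seed(7)
spots = [1, 1, 1, 1, 1]   # five equally good spots, each seeded with one rock

for hiker in range(500):
    # Each hiker is more likely to add a rock where rocks already sit:
    # the environment left by earlier hikers cues the behavior of later ones.
    weights = [r ** 2 for r in spots]          # mild preference for bigger piles
    choice = random.choices(range(len(spots)), weights=weights)[0]
    spots[choice] += 1

print("rocks per spot:", spots)
# One pile typically ends up dominating even though no hiker planned it and
# no spot was intrinsically better: coordination through traces in the environment.
```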
About the author
Jonny Morell, PhD, is an organizational psychologist with extensive experience in the theory and practice of program evaluation. See his blog: Evaluation in the Face of Uncertainty: Surprises in Programs and their Evaluations. For the videos, go here.