28. Rugged-Landscape Models
Amazing the things you find when you bother to search for them.
—attributed to Sacagawea
In this chapter, we study the rugged-landscape model. Like spatial and
hedonic models, the rugged-landscape model defines an entity as a
collection of attributes. Each set of attributes maps to a value. The goal is to
modify attributes to construct an entity of highest value. This model
originated in ecology to study evolution. It is now also used to study
problem solving, competition among firms, and innovation. That will be
our focus here. We apply the model to reveal how interdependence in the
effects of attributes makes innovation difficult and leads to path
dependence in the solutions found and also leads to a greater variety of
solutions. We also see how more difficult problems benefit from a greater
diversity of problem solving approaches.
The chapter consists of three parts followed by a discussion of how to
extend the model to capture competition. In the first part, we describe the
ecological model of a fitness landscape and show how we can reinterpret it
as a model of problem solving and innovation. In the second part, we
discuss the implications of ruggedness within a one-dimensional model. In
the third part, we present the NK model of rugged landscapes, which
extends the one-dimensional model to an arbitrary number of binary
dimensions.
The Fitness Landscape
The fitness landscape model assumes species have features or traits that
contribute to their fitness, loosely defined as their reproductive potential,
and that individual members of a population differ in how much they have
of a particular trait. If we plot the amount of the trait on the horizontal axis
and the fitness of species on the vertical axis, we produce a graph known as
a fitness landscape in which points of high elevation correspond to high
fitness.
To draw a landscape in which the trait corresponds to the length of a
coyote’s tail, we hold all other attributes of a coyote the same, vary the
length of the tail, and measure the effect on fitness. To draw the graph, we
need to know how the tail contributes to fitness. Suppose that a coyote’s tail
helps to balance the coyote when it jumps, and that it signals happiness,
fear, or aggression. We begin at the left of the horizontal axis with a tail of
length zero. It cannot carry out either function, so it has a fitness of zero.
As the tail becomes longer both balance and signaling improve. Thus,
fitness increases with tail length. At some point, say eighteen inches, the
tail may be an ideal length for contributing to balance. If the tail becomes
longer, the coyote will be less agile. Longer tails may continue to improve
signaling value, so perhaps a tail of length twenty inches produces the most
overall fitness. Once the tail becomes longer than twenty inches, fitness
falls. The resulting graph, shown in figure 28.1, has a single peak.
image
Figure 28.1: A Mount Fuji Landscape
This landscape is known as a Mount Fuji. Such landscapes often occur in
the real world. Mount Fuji problems are considered easy. We expect that
evolution or learning will always find the peak when encountering one.
Imagine a population of coyotes with different lengths of tails. Selective
pressure would result in coyotes with tails of approximately twenty inches.
Coyotes with tails of that length optimally blend balance and signaling.
They have the highest fitness and produce the most offspring, resulting in
more coyotes with twenty-inch tails. If we think of this as an optimization
problem, we see that any hill-climbing algorithm would locate the peak.
We can apply one-to-many thinking and reinterpret this as a problem of
product design—specifically, the problem of designing a coal shovel.
Suppose that we have already decided on the length of the handle and the
shape of the pan. The remaining design decision is how large to make the
pan. Pan area will correspond to the trait on the horizontal axis. On the
vertical axis, we plot how much coal a worker could shovel in an hour
given that pan size.
As before, we start at the far left, which corresponds to a shovel pan with
zero surface area. The technical term for a zero-surface-area shovel is
“stick.” A stick cannot shovel coal and has value zero. As we increase the
pan—say, to the size of a teaspoon, then a tablespoon, then a toy shovel—
we make the shovel more and more effective. The graph of shovel fitness
slopes up. At some point, the pan area becomes too large. Lifting the shovel
becomes a chore, and the amount of coal that a person can shovel in an
hour decreases with further increases in pan size. When the pan area
becomes sufficiently large, no one would be able to lift the shovel and
fitness will be zero. Once again, we have a Mount Fuji landscape. And
once again, we should expect to be able to find the peak, the ideal pan size
for our shovel.
The idea of plotting the efficiency of shovels as a function of pan size to
determine the optimal shovel was developed by Frederick Taylor. In the
1890s, Taylor and others ushered in an era of scientific management in
which manufacturing decisions—how fast to move the assembly line, how
strong to make the weld, how many breaks to give workers—were modeled
as rugged-landscape problems. Many of the great industrialists of the
twentieth century including Henry Ford, John D. Rockefeller, and Andrew
Carnegie contributed to this movement toward efficiency, or what now is
commonly called Taylorism.
The move away from artisans making individual and distinct products to
large-scale manufacturing, in which processes were broken into parts and
each part was optimized and then routinized, led to increases in efficiency
but also, in the eyes of many, the dehumanization of labor. Herein lies a
welcome reminder about the need for multiple models. Any single model
simplifies the world and highlights only some dimensions. Scientific
management models focused on process efficiency. This led to criticism.
Making decisions based on efficiency of output caused other objectives,
such as the happiness and well-being of workers, to fall by the wayside.
The landscape model may seem to be a relatively obvious idea: plot the
fitness, efficiency, or value of a characteristic as a function of a trait or
attribute and then climb up the hill to find the optimal amount of the trait.
Thinking of solving a problem as climbing up a hill may also seem little
more than a metaphor. The validity of these critiques is not in question.
However, by constructing a formal landscape model we will produce
nontrivial insights.
Rugged Landscapes
When we allow for multiple attributes and for the contribution of one
attribute to interact with those of others, we produce a rugged landscape—
that is, a landscape with multiple peaks. Consider designing a couch in
which we must choose the thickness of the cushions and the width of the
arms. Let the value of a design equal its expected sales in the market, which
correlates with aesthetic quality. If the couch has thick cushions, then wide
arms may create a more appealing aesthetic. If the cushions are thin, then
the ideal couch may have thin arms. A two-dimensional plot of expected
sales as a function of arm length and cushion thickness will have two
peaks. One peak corresponds to a couch with thin arms and thin cushions.
The other will have thick arms and thick cushions.
Interdependent effects between variables create ruggedness on the
landscape. Ruggedness has several implications. First, different approaches
to finding the highest point on a rugged landscape may locate different
peaks. So too may different starting points. Thus, ruggedness creates
sensitivity to initial conditions and the possibility of path dependence. Each
of these implies that landscape ruggedness contributes to outcome diversity.
Ruggedness also implies the possibility of suboptimal outcomes. These are
represented as local peaks on the landscape.
Figure 28.2 shows a rugged landscape with five peaks. Four of these peaks
are local peaks, points whose neighboring points all have lower values, and
one is the global peak, the point with the highest value. To see how search
could land on a local peak that depends on the initial point, imagine
beginning from a point and then climbing uphill. This is known as a
gradient heuristic or a hill-climbing algorithm. On a rugged landscape,
gradient heuristics get stuck on a local peak.
If the starting point is at the far left, the gradient heuristic would locate
local peak 1, which is not optimal. If the gradient heuristic starts in the
region denoted by Basin 2 in figure 28.2, then it locates local peak 2. Each
of the other peaks, including the global peak, has a region such that if the
gradient heuristic begins in that region, it will locate that local peak. These
regions are called basins of attraction and are identified in figure 28.2. The
global peak has the smallest basin of attraction. If we were to choose a
random starting point and apply the gradient heuristic, the global peak is
the least likely to be found.
The basins of attraction depend on the heuristic. A different heuristic may
produce different basins. For example, consider the heuristic go to the right
which moves to the right until finding a local peak. This heuristic produces
identical local peaks as the gradient heuristic, but those peaks have
different basins of attraction, as can be seen by comparing figure 28.3 with
figure 28.2.
image
Figure 28.2: A Rugged Landscape with Five Peaks
To find an optimal or near-optimal peak on a rugged landscape requires
either diversity or sophistication. The value of diversity should be selfevident. If distinct heuristics locate different peaks, then applying multiple,
diverse heuristics to a problem will produce multiple, diverse local peaks,
and one can choose the best from among these.1 The same result will occur
if one applies the same heuristic from different starting points: distinct local
optima will be found and the best among them can be chosen.
Note also that the ruggedness of the landscape, as measured by the number
of peaks, correlates with problem difficulty. However, a problem can be
difficult to solve yet not have a rugged landscape. The problem of finding a
gold coin in a cornfield would be represented by a flat landscape with a
single peak at the coin’s location. The landscape would not be rugged, but
the coin would be very difficult to find.
The NK Model
We now describe the NK model, which allows us to formalize the
connection between interactions and ruggedness.2 The model represents
objects, or what we might call alternative solutions, as binary strings of
length N. The value of an object equals the sum of the contributions of each
bit on the string. The K term in the model refers to the number of other bits
that interact with each bit to determine its value. If K equals zero, the value
function is linear. If K equals N − 1, then every bit interacts with every
other and the value of each string is random. Thus, we can think of
increasing K as tuning the ruggedness of the landscape to somewhere
between Mount Fuji and random.
image
Figure 28.3: Basins of Attraction Produced by the “Go to the Right”
Heuristic
NK Model
An object consists of N bits, s ∈ {0, 1}N.
The value of an object is V (s) = Vk1(s1, {s1k}) + Vk2(s2, {s2k}) + ··· +
Vk2(s2, {s2k}) where {sik} equals a randomly selected set of k bits other
than i, and Vk1(s1, {s1k}) is a random number drawn from the interval [0,
1].
K = 0: Results in a linear function of the bits.
K = N − 1: Any bit change produces a new random contribution from each
bit.
The NK model framework provides a wonderful space to explore ideas and
ask questions. The first question we ask is how the number of local optima
depends on the number of interaction terms. We then ask how the height of
the global optimum depends on the number of interaction terms. At the
moment, both of those questions are ill-posed because we have yet to
define how we are searching the space of possibilities, that is what heuristic
we are using. Recall that what the set of peaks depend on our choice of
heuristic.
In what follows, we rely on the single-flip algorithm. This algorithm
chooses each attribute in sequence and switches the attribute’s state. If
changing this attribute results in a higher value, the switch is adopted.
Otherwise the attribute is returned to its original state. The choice of this
algorithm can be motivated in two ways. It can be interpreted as a crude
model of genetic mutation, where good variants take over in the population
and bad ones die. It is also the most natural way to represent a hill-climbing
algorithm in this space.
We first evaluate the NK model with N = 20 and K = 0. When K = 0, each
attribute’s contribution to the total value is independent of the other
attributes. The single-flip algorithm can identify the better state for each
attribute and the global optimum. Thus K = 0, no interactions, corresponds
to a Mount Fuji landscape. Each state’s value is uniformly distributed in the
interval [0, 1]. It can be shown that the higher of two random draws from a
uniform distribution has an expected value of image. If we average the
contributions across twenty attributes, the global optimum will also have an
expected value of image.
At the other extreme, (N = K − 1), each attribute is connected to every other
attribute. When the state of an attribute is switched, the contribution of
every attribute will change. It will be a new random number drawn
uniformly from the interval [0, 1]. The value of the object will be the sum
of these twenty new random numbers (one for each attribute), meaning that
each flip of an attribute results in a value for the entire object that is
uncorrelated with the earlier value. Thus, the landscape will be incredibly
rugged—just as likely to go up at any point as it is to go down.
By applying that insight, we can derive the expected number of local peaks.
If we start from any alternative, the single-flip algorithm compares that
alternative to each of N alternatives. For example, starting from the
alternative with all bits taking value zero, the algorithm will evaluate the N
alternatives in which exactly one bit takes value one.
image
A local peak must have a higher value than each of these N alternatives.
The probability that the original alternative has the highest value equals
image. Therefore, the number of local peaks approximately equals the
number of possible alternatives, 2N, divided by N. For N = 20, that
calculation yields fifty thousand local peaks. With so many local optima,
the single-flip algorithm rarely locates the global peak.
The relevant question is not the number of these local optima, but their
values. It remains only to compare the expected average value of these
local optima with the expected value of the global optimum. That
comparison will determine how well the single-flip algorithm performs. To
calculate theses values, we can use the central limit theorem. It is not
difficult to show that the expected value of a local optimum equals
approximately 0.6 while the expected value of the global optimum equals a
little more than 0.75.3 Comparing these to the global optimum for the case
K = 0, which equals image, reveals that the local peaks on the rugged
landscape have lower values than on Mount Fuji, but the global peak has a
higher value.
This begs the question of what happens in between these two extremes, as
we increase the number of attribute interactions, K, from zero to N −1. The
answer is that we see both effects. The increased number of interactions
produces a higher global peak, but more, and therefore lower-value, local
peaks. Assuming that the search uses the single-flip algorithm, then
computational investigations of this model show that for small K, the
benefit of the interactions—a higher global peak—outweighs the increase
in local peaks. So initially, the expected value of a local peak increases in
K. The growing number of local peaks means that the average value will
decrease. So if you were stuck using the single-flip algorithm, you would
prefer a relatively small K value, around, say, 3 or 4. But why should we be
constrained to use this simple heuristic of switching a single attribute?
Evolution by mutation may be constrained to this heuristic, but we are not.
We could switch the state of two attributes or even three. A more
sophisticated algorithm will reduce the number of local optima.
Ruggedness and Dancing Landscapes
The NK model implies that we want a moderate degree of interdependence
as that creates higher peaks. Many-model thinking demands that we step
out from the particular assumptions of the model and consider the logic that
drives that result. We find that the logic consists of two parts. The first rests
on combinatorics: the number of pairs of combinations increases with the
square of the number of pairs and the cube of the number of triples. Thus,
interdependent effects create more possibilities of beneficial interactions.
The second part rests on the fact that we need only keep the better
combinations. Imagine grabbing any four food items to make a snack. Four
items implies six possible combinations of two items. Suppose that we grab
the following set of four: {pickles, bananas, chicken, caramel}. Of the
resulting six pairs—bananas and pickles, pickles and chicken, caramel and
pickles, bananas and chicken, caramel and bananas, and caramel and
chicken—only one sounds remotely appealing. We only need choose that
option. We enjoy the caramel bananas. We ignore the rest.4
A similar logic applies in evolutionary systems. Phenotypic combinations
that produce positive interactions—a hard shell and short sturdy legs—
remain in the population, while survival of the fittest works against
combinations that produce negative interactions. We do not encounter
many slow-footed, tasty animals with vibrant colors. If they ever existed,
they have been caught and eaten.
We encountered a similar intuition in our model of search. When we have
an abundance of possibilities, we prefer variation. The same logic applies
here: combining pairs (and triples) produces abundant possibilities. And we
would have preferred that these many possibilities had high variation in
value. We then have a greater likelihood that one of them has a very high
value. Given that interdependent effects increase variation, on the whole
they are advantageous, but only up to a point. As we have just seen, too
many make the landscape random. Ideally, then, we have a moderate
number of interactions. Some argue that if the number and size of
interactions can evolve or adapt, then systems should naturally evolve to
rugged landscapes with high peaks. This would suggest that systems tend
toward complexity and not equilibrium or randomness.5 When and whether
that is true is exactly the sort of question that is fun to explore with models.
One final point: We have taken the landscape as fixed. In ecological and
social systems, the landscape that a species or firm confronts depends on
the actions and attributes of others. An adaptation by a competing species,
or a change in strategy by another firm, will shift and rearrange the fitness
landscapes of competitors. We can now reinterpret our earlier models of
spatial and hedonic competition as movements on dancing landscapes.
Those movements could lead to an equilibrium, where each player stands
on a local or global peak, or, competition on dancing landscapes may lead
to complex patterns of actions and outcomes. Even a cursory glance at
ecosystems, politics, and economics suggests that we see more of the latter.
One reason that we see so much complexity may well be that much of our
world consists of adaptive and purposive actors maneuvering on dancing
landscapes. To make sense of that complexity, we need many models.
Do We Patent Knowledge?
Our well-being rests atop a centuries-long accumulation of knowledge that
includes the laws of physics, the combustion engine, double-entry
accounting, the germ theory of disease, X-rays, and HTML. Knowledge is
often a public good. Knowledge is always non-rival. It may or may not be
excludable. Exclusion requires verification, which is easier when a physical
artifact embeds the knowledge. Verifying that someone used an algorithm
or technique for solving a problem may be impossible. Verifying that
someone embedded that algorithm in a software program is not.
When knowledge can be excluded, we confront a choice. We could treat
knowledge like roads and national defense, and tax people to fund its
production. Governments pay people to think through grants and indirectly
by supporting universities. Governments also allow people to patent
knowledge. A patent creates an incentive to produce knowledge by creating
a period of exclusive rights to use the knowledge and to charge others for
its use. In the United States and Europe, patents last twenty years from the
date of filing.6 Advocates of patents argue that private entities would have
little incentive to spend years developing a better mousetrap, computer
algorithm, or sound system if their discoveries could be used by anyone for
free. They argue that patents overcome the incentive problems inherent in
knowledge production.
Boldrin and Levine construct an argument that borrows ideas from several
models to argue against patents.7 In models in which ideas can be
combined, patents can hinder innovation by limiting recombinations. One
firm’s patent on touch-screen technology may reduce the number of other
firms that design new products that incorporate that technology. Absent
patent protection, the technology could be incorporated into more products.
Innovation would increase.
Proponents of patents push back by noting that while slowing innovation
may be bad, without patents the reduction in investment would be much
larger. To counter that claim, Boldrin and Levine use a logic partly based
on our diffusion model. A useful product based on new knowledge will
spread quickly through the population of buyers. That was true of the radio,
television, Google’s search engine, and Facebook. This creates a firstmover advantage. The innovator can still benefit, but only by producing
something with the idea. With a patent, an inventor can wait for others to
implement the idea and profit.
Boldrin and Levine also question how much credit the inventor deserves
anyway. If breakthroughs were the result of a solitary genius, and most
ideas would never have been produced without incentives, then the case for
patents is stronger. The rugged landscape model suggests that difficult
problems may have multiple workable solutions. New inventions,
particularly those that combine existing ideas and technology such as the
car, the telephone, and online auctions, may be natural occurrences not acts
of genius. Any number of people might have made these innovations given
the ideas swirling around in the community of thinkers. The simultaneity of
major discoveries—calculus (Isaac Newton and Gottfried Leibniz), the
telephone (Alexander Graham Bell and Elisha Gray), and the natural
selection theory of evolution (Charles Darwin and Alfred Russel Wallace)
—supports that inference. In sum, many-model thinking shows advantages
and disadvantages to patents. The deeper, more nuanced understanding the
models provide argues for a more flexible patent policy. Perhaps some
ideas—those that many people might have discovered and those that could
recombine with other ideas—should have different lengths or types of
patents, or even not be patentable at all.