10. Network Models Network theory is a whole branch of science, but it’s relatively new in terms of the last 20 or 30 years. We haven’t had a chance to take all that theory out of the universities and apply it to ask: “What kinds of networks should we build, and for what purposes?” —Anne-Marie Slaughter In this chapter, we cover models of networks. A comprehensive study of networks would require multiple books. We have more modest goals. We want to understand the basics of networks, to be able to name their parts, and to ask why they matter for modeling. The answer we arrive at will be that networks almost always matter. Any model we construct, be it of a market, the spread of a disease, or the transmission of information, can be enriched by embedding the actors in a network.1 Networks are ubiquitous. People talk of trade networks, terrorist networks, and networks of volunteers. Species organize into food webs, a form of network. Firms build supply chain networks. As already noted, the financial system is usefully thought of as a network of promises to pay. Networks have always been important to understanding social relations. During much of human history, social networks were constrained by geography and difficult to map. Due to technological advances, many social interactions and economic transactions now take place over virtual networks which can be analyzed using models. The organization of this chapter follows the same structure-logicfunction format we applied to distributions. We first characterize the structure of networks using statistical measures of degree, path length, clustering coefficient, and community structure. Then we discuss common classes of networks: random networks, hub-and-spoke networks, geographic networks, small-world networks, and power-law networks. After that we turn to the logic of how networks form. We construct microlevel processes that produce the network structures we see. Last, we take up function, the question of why network structure matters. Here we focus on five implications. We begin with the friendship paradox, and then describe the six degrees of separation phenomenon and the strength of weak ties property. Last, we take up the robustness of networks to node or edge failure and the aggregation of information over networks. The chapter concludes with a discussion of how networks influence model outcomes. Network Structure A network consists of nodes and edges that connect them. We refer to nodes connected by an edge as neighbors and to a network as connected if it is possible to get from any one node to any other along edges. Networks can be represented as graphs, as lists of edges, or as matrices of zeros and ones, where a one in row A and column B denotes an edge between node A and node B. Though people prefer graphical representations of networks, lists and matrices are better for representations for calculating network statistics. The edges in a network can be directed—that is, pointing from one node to another. In an information network, a directed edge denotes that one person gets information from another. In an ecosystem network, a directed edge from a red-tailed hawk to a gray squirrel represents that the hawk eats the squirrel. Edges can also be undirected. Edges that connect friends are drawn in this way. In an undirected network, the degree of a node equals the number of edges that connect to it. Networks are characterized by a set of network statistics. For each statistic, we can compute the network average and the distribution across all nodes. The average degree of a friendship network tells us, on average, how many friends each person has. The degree distribution tells us if some nodes are more connected than others. Social networks have more equal distributions than the World Wide Web, the internet, and citation networks, all of which have long tails. Network Statistics Degree: The number of neighbors (also the number of edges) of a node. Path length: The minimum number of edges that must be traversed to get from one node to another. Betweenness: The number of paths of minimal length connecting two other nodes that pass through a node. Clustering coefficient: The percentage of a node’s pairs of neighbors that are also connected by a edge. Path length, the minimal distance between two nodes, varies inversely with degree. As we add edges, we shorten the average length between nodes. In an airline flight network, path length corresponds to the number of flights, on average, a person needs to take to get from one city in the network to another. Given a choice between two airlines, all else equal (namely, prices), a traveler would prefer the one with lower average path length. Average path length also correlates with information loss. Information that passes through several people is more likely to suffer distortion than information passed between only two people. The nodes on minimal paths play critical roles in networks. If information takes the shortest route, then it goes through the nodes on a minimal path. A node’s betweenness score equals the percentage of minimal paths that go through a node. In a social network, people with high betweenness scores know more information and wield more power. The final statistic, the clustering coefficient, equals the proportion of a node’s pairs of neighbors who are also neighbors of one another. For example, a person with 10 friends has 45 pairs of friends. If 15 of those 45 pairs are themselves friends, then the person’s clustering coefficient equals . If all 45 friendships existed, then the person’s clustering coefficient would equal 1, the maximal possible value. The clustering coefficient for the entire network equals the average of the clustering coefficients of the individual nodes. image Figure 10.1: A Hub-and-Spoke Network and a Geographic Network Figure 10.1 shows two networks with thirteen nodes: a hub-and-spoke network and a geographic network. In the hub-and-spoke network, the hub has degree 12 and all other nodes have degree 1, for an average degree of less than 2. The degree distribution is unequal. The hub has a distance of 1 to every node. All other nodes have a distance of 1 to the hub and a distance of 2 to the other nodes. It follows that average path length is also less than 2. The hub, which lies on every minimal path between any two other nodes, has a betweenness score of 1. The spoke nodes do not lie on any minimal paths connecting other nodes, so they have a betweenness of 0. Finally, in the hub-and-spoke network, no nodes connected to a node are connected to one another. Therefore, the network has a clustering coefficient of zero. In the geographic network, each node is connected to the two nodes to its right and its left, so the average degree equals 4. Each node is distance 1 from four nodes, distance 2 from four nodes, and distance 3 from four nodes. So the average distance equals exactly 2. The degree and distance distributions for this graph are degenerate—every node has the same degree and the same average distance. It can be shown that the betweenness of each node equals .2 Each node has four neighbors, creating six pairs. Of those six pairs, exactly three are connected: the two nodes to the immediate left and right are each connected to an outer node and to each other. Therefore, the clustering coefficient equals . An alternative method for capturing clustering is to partition the nodes into communities. In a junior high friendship network, the communities might correspond to teenagers interested in the arts, athletics, or science. Or they could be defined by race and gender. A network of political alliances might partition into regional or ideological allies. Multiple methods exist for determining communities. One approach sequentially removes edges with the highest betweenness, as edges with high betweenness are more likely to connect distinct clusters. Other approaches take the number of communities as given and seek an optimal partitioning given an objective function such as minimizing the number of edges between the communities or maximizing the proportion of edges within communities.3 We can use community detection algorithms to ask questions of network data. Studies show that people may reside in online bubbles. That is, we may belong to communities of people who get their news from similar sources. If so, that has implications for social cohesion. Prior to the creation of the internet, that may have been true as well, but demonstrating it with data would have been hard. Now data scientists can scrape the web to identify the news sources that people frequent and tell us that, yes, in fact we do live in bubbles to an extent. Models provide the formal definitions of communities. Data tells us the strength of those communities. Using judgment we can make wise inferences based on what the data say. Common Network Structures In analyzing networks, we encounter a problem of variety. A handful of network statistics are incapable of pinning down the specific network structure: one can construct billions of distinct networks with ten nodes and an average degree of 2. An alternative approach to characterizing a network is to test whether its statistical measures differ significantly from those of a common network structure. For example, a scholar might gather data on judicial citations and put it in network form by drawing an edge when one judge cites another judge’s opinions. The graph of that network may appear to possess interesting structures and clusters. We can test whether a network is random by comparing the network’s statistics to those of a random network that has the same number of nodes and edges. A random network clustering coefficient equals the probability of a random edge because two neighbors of a node are no more likely to contain an edge than any other randomly chosen node. Monte Carlo Method for Random Networks To test whether a network with N nodes and E edges is random, we create a large number of random networks with N nodes and E edges and calculate distributions for degree, path length, clustering coefficient, and betweenness. We then perform standard statistical tests to accept or reject the hypothesis that the network’s statistics could have been drawn from the simulated distributions.4 Theoretical models often assume a particular network structure. Many assume random networks, while others assume regular geographic networks such as when the nodes are arranged in a circle and each is connected to the nearest nodes in each direction. Other geographic networks arrange the nodes on a checkerboard and connect each node to its neighbors to the north, south, east, and west. Most of the common geographic networks have low degree—they connect to only the local neighbors—and relatively high average path length. On geographic networks, betweenness and clustering coefficient have no variation. A third common type of network, a power-law network, has a power-law degree distribution. A handful of nodes has many connections, but most nodes have very few networks. A fourth type of network, a small-world network, combines features of geographic and random networks.5 To construct a small-world network, we begin with a geographic network and then “rewire” it by randomly selecting an edge and replacing one of the nodes it connects with a random node. If the rewiring probability equals zero, we have a geographic network. If it equals 1, we have a random network. In between, we have a small-world network, distinguished by small clusters from the geographic network connected by random links to other clusters. Social networks look similar to small worlds. Each person has a cluster of friends as well as random friends. image Figure 10.2: Random, Geographic, Power-Law, and Small-World Networks Network Formation: Logic We now briefly describe models of network formation. These models provide logic to explain network structures. Most of the network structures that we encounter emerge from choices of individual actors to make connections. That is true of friendship networks, the World Wide Web, and power grids. These networks are not planned. Other networks, such as supply chain networks, do result from planning. We would expect planned networks to be robust to the failure of nodes. The fact that emergent network structures are robust is more of a puzzle. We have already discussed how to create random networks and smallworld networks. We create the former by randomly creating a set of nodes and then drawing edges connecting random pairs of nodes. We create a small-world network by first constructing a regular geographic network (often by arranging nodes in a circle and connecting k neighbors in each direction) and then randomly “rewiring” a proportion of the edges. Models of the formation of the power grid rely on economic and engineering principles. The network must deliver power to homes, businesses, and the government. Whether the producers are for-profit companies or public utilities, they have little incentive to create high clustering, as it would be inefficient. This lack of clusters reduces the robustness of the network. Economic and engineering considerations also rule out long leaps: connections that reach far across the network. Power companies do not build direct connections from Chicago to Dallas. People and businesses, however, do. A Chicagoan might strike up a friendship with someone from Dallas. A firm in Singapore might trade with a firm in Detroit. As we see in the next section, these long leaps contribute to network robustness. To create a network with a long-tailed distribution, we can apply a version of the preferential attachment model. We create nodes randomly and then draw edges from new nodes to existing nodes. If we let the probability of connecting to a node be proportional to its degree, we produce a power-law degree distribution. In that model, early-arriving nodes will be far more likely to be of high degree. A shortcoming of the model is that it does not allow for any difference in node quality. Higher quality nodes should have higher degree. The quality and degree network formation model corrects that omission while also producing a long-tailed distribution. Quality and Degree Network Formation Model Create d disconnected nodes. In each period t create a new node with quality Qt drawn from a distribution F. Connect that node to d other nodes based on the degree of those nodes. If Dit denotes the degree of node i at time t, the probability of choosing node i given N nodes equals: image If the quality of new nodes has a low mean and low variance, the model resembles the standard preferential attachment model. If the quality distribution has a long tail, then new nodes of very high quality can grow to have large degree.6 Why Networks Matter: Function In Chapter 1, we mentioned the friendship paradox, the fact that on any network, on average, people cannot have more friends than their friends do. The logic for why this holds can be shown using the hub-and-spoke network. In that network, twelve people have one friend and one person has twelve friends. The twelve people with one friend are all connected to the hub, and the hub has twelve friends. That feature—the fact that high-degree people are connected to more people—drives the result. On the hub network, people, on average, have fewer than two friends. Yet, on average, each person’s friends have more than eleven friends. The friendship paradox holds for any network: academic citation networks, email networks, sexual contact networks, banking networks, and international trade networks. On average, the references cited by an academic article receive more citations than the article itself; a country’s trading partners, on average, trade with more countries than the country itself; and the multiple species connected to a single species in a food network have, on average more connections than the single species itself. The disparity between the number of friends and the number of friends of friends becomes more pronounced on networks with degree distributions that are more dispersed. One analysis of friendships on Facebook found that the average person has around two hundred friends and their friends, on average, have more than six hundred friends.7 The Friendship Paradox If any two nodes in a network differ in their degree, on average a node has lower degree than its neighbors. In other words, on average, people’s friends are more popular than they are.8 The logic of the friendship paradox extends to any attribute that correlates with the number of friends. If active, happy, intelligent, wealthy, and kind people have, on average, more friends, then a person’s friends will be, on average, more active, happier, more intelligent, wealthier, and more beautiful.9 Imagine a network in which 90% of unhappy people have four friends and 10% have ten friends. Reverse the proportions for happy people: 10% have four friends and 90% have ten friends. People’s friends will disproportionately consist of people with ten friends. A large majority of those people will be happy, so most people’s friends will be happier than they are. We now show the Six Degrees of Separation phenomenon, the claim that any two people on the earth can be connected through six friends or fewer. While the friendship paradox holds for any network, six degrees of separation only holds for some types of networks. The phenomenon’s name derives from an experiment carried out by Stanley Milgram in the 1960s. Milgram sent packets to 296 individuals in Omaha, Nebraska, and Wichita, Kansas, that were to be forwarded to an individual in Boston, Massachusetts. The recipients had to follow the same rules. These participants were only allowed to send the packets via mail to people that they knew personally and whom they believed might have a greater chance of knowing the target person in Boston, with instructions to do the same. Individuals signed a roster to record the path and mailed postcards to the researchers so the researchers could track breaks in the chain. Sixty-four of the letters arrived in Boston. Of those that did, the average path length was slightly less than six, hence the phrase “six degrees of separation.” A second experiment run fifty years later on a much grander scale using email created eighteen global targets and sent them to more than 20,000 people. The median path length of email chains was between five and seven, depending on the geographic distance between the source and the target. The length of paths found does not equal the minimal path length between participants. The evidence therefore suggests that most people are linked by fewer than six degrees.10 We construct a simplified version of the small-world network to give intuition about the six degrees of separation phenomenon. Our version assumes that individuals have a small cluster of clique friends, who all know one another, and that they also have friends outside of those cliques, whom we call random friends.11 Figure 10.3 shows an individual (denoted by the black circle) with five clique friends and two random friends. It also shows a selection of friends of the node’s friends (light gray circles). image Figure 10.3: A Node’s Clique Friends (C) and Random Friends (R) These random friends might also be thought of as weak ties—people who connect you to other communities of people. Our weak ties, the random friends in our network, play an important informational role by connecting communities with diverse interests and information. Hence, sociologists speak of the strength of weak ties.12 This construction allows us to calculate the number of neighbors of degree two (the friends of friends), by adding up all of the friends of random friends but only adding the random friends of the clique friends. We do not count the clique friends of the clique friends, as they are members of the node’s clique. We calculate the number of friends of friends of friends similarly. We add in all of the clique friends’ random friends’ friends, but we do not add in a random friend’s clique friends’ clique friends, as they have already been counted as neighbors of degree two. To produce the six degrees of separation phenomenon, we apply the same logic to a network with 100 clique friends and 20 random friends. Six Degrees of Separation Assume each node has 100 clique friends (C), all of whom are friends with one another, and 20 random friends (R), who have no friends in common with the node. Degree one: C + R = 120 Degree two: CR + RC + RR = 2000 + 2000 + 400 = 4400 Degree three: CRC + CRR + RCR + RRC + RRR = 328,000 Degree four: 17, 360,000 13 Degree five: > 1 billion Degree six: > 20 billion By assuming no overlap in the friends of the random friends, the model implicitly assumes an infinite population. An actual social network will have overlap in friends as the degree increases. In a network that includes overlap and other realistic features such as heterogeneity in the number of friends, the values will differ from those calculated above. The relative magnitudes of the number of neighbors of each degree will remain similar. A person will have many more neighbors of degree three (friends of friends of friends) than of degree two (friends of friends). The large number of friends of degree three, over a quarter million in our example, can be consequential. Unlike a person’s clique friends, a person’s friends of degree three tend to live in different cities, attend different schools, and have different information. They are more diverse. They are also near enough for trust to be established: a friend of a friend of a friend could be your roommate’s mother’s coworker, or your sister’s boyfriend’s aunt. The number of friends of degree three, their diversity, and their relative proximity make them an important asset. They can provide new information and job opportunities. These are the people most likely to help a person find a job, facilitate a move to a new city, or become a life or business partner. Network Robustness Our last implication of network structure evaluates the robustness of network properties, or how close the network is to node (or edge) failure. The most essential property of a network is whether it remains connected. We can use models to calculate the probability that a network remains connected as a function of the number of nodes removed. We could also ask what happens to average path length as nodes are removed. Applied to an airline network, an analysis of path length robustness would tell us how many extra flights would be needed if an airport were to shut down due to weather or a power failure. Here, we consider the question of how the size of the largest connected component of the network, the giant component, changes as nodes randomly fail. Figure 10.4 shows the size of the giant component for a large random network and a large small-world network. In the random network, the size of the giant component falls linearly at first. At a critical value where the probability of an edge equals 1 divided by the number of nodes, the size of the largest component falls to an arbitrarily small proportion of the original network size. The small-world network shows no such abrupt change. A majority of connections exist within the geographic clusters. Each cluster can withstand the failure of multiple nodes. This feature combined with the random links prevents the entire network from collapsing. image Figure 10.4: Size of the Giant Component (G) as a Function of Node Failure From figure 10.4 we can infer that sparse networks that lack local clustering are susceptible to failure. We can apply that insight to the power grid. It lacks the long leaps and the tight clusters that make the small-world network robust. In a power grid, the failure of a node or a link cannot be overcome by other links in a cluster or by a long connection to a working node far away. Local failures can cascade through the network.14 In contrast, the internet, which has a long-tailed degree distribution, is robust to random node failure. The degree distribution implies that the vast majority of nodes have few connections. Even if they fail, the network remains connected. Up to now, we have assumed random node failure. We can also consider strategic node removal. Networks with long tails, like the internet, now become non-robust. Strategic removal of the nodes of highest degree destroys the network. The logic can be seen by considering the hub-andspoke network. When nodes are removed randomly, the network remains connected unless the hub node is removed, a low-probability event. Strategic removal, wiping out the hub, disconnects the network in one step. With some networks, such as terrorist networks and drug supply networks, we might want to disconnect a network. If those networks are sparse, like the power grid, or have a long-tailed degree distribution, they can be disconnected through strategic node removal. For the terrorist network, this would entail arresting the most connected members. If those networks resemble small worlds, they will be robust, even to the strategic removal of nodes. Attempts to cut off any geographic segment of the network will fail because of the random reconnections that connect the segment to the rest of the network. Summary When we build network models of people, we often do so to capture social influences, where the success, behavior, information, or beliefs of a person in the network influence the success, behavior, information, or beliefs of their friends. Behavior can be contextual or intrinsic; so too can a person’s value or contribution to a collective enterprise. A person’s value or contribution could be due to properties of that person, such as her brilliance, her effort level, or her good fortune. A person’s success could also be due to his network of friends and colleagues. This is an age-old question: Does success depend on what you know or whom you know? Imagine a group of scientists working together in a research lab. They share advice, ideas, and knowledge. The number of academic papers, patents, or scientific breakthroughs produced by a scientist depends on what she knows, but it can also be influenced by whom she knows, on her interactions with other scientists. By thinking in terms of contextual features (friendship networks) as well as intrinsic attributes (individual abilities), we can determine how much of a scientist’s success to attribute to each. Investment firms that hire away superstar fund managers based on the belief that investment success depends mostly on talent have not had very promising results. Empirical evidence shows that top investors also depend on networks of colleagues who provide them with specific types of information.15 That specific finding can be viewed within the lens of a much larger literature (some of which is model-based) showing how a person’s position in an organization influences success. Success still correlates with ability. A business idea that makes its investors millions was probably a good one. A scientist who publishes hundreds of papers and receives numerous awards has high ability. At the same time, those best positioned in the network make the largest contributions. We can measure a person’s position using betweenness and other measures of centrality. The people who occupy high-betweenness positions in a network fill what Ron Burt calls structural holes between communities, which we can identify using algorithms.16 Access to information and ideas from multiple communities gives people who fill structural holes power and influence. Filling a structural hole requires certain talents and abilities. A person cannot just jump in and fill any hole. She must build trust and understanding within each community. And she must be conversant in the knowledge base of each community. We can apply nearly identical logic to assess the value of firms and assign power to countries. We can see a firm’s value as intrinsic and take a balance sheet perspective by looking at assets and liabilities. We can also look at the context in which the firm operates, such as its position in the supply chain. Similarly, the power of a country depends on its resources and its alliances. For both firms and countries, intrinsic attributes and connectedness correlate. Those who occupy powerful positions in the network also possess important attributes. Our analysis as well as most of the literature considers the node as the unit of analysis. The edges can be critical as well. Taking an even broader perspective, the network itself may be an appropriate unit of analysis. For example, teacher networks that allow ideas and information to flow between classrooms can improve educational outcomes, and so a wellconnected administrator can effectively coordinate curriculum reform. Similarly, a second-grade teacher knows a lot of information about the students from his class who are going on to third grade; that information could help the third-grade teacher. A mathematics teacher knows what concepts students have yet to grasp; that information could help the science teacher structure her lessons. Good schools, therefore, have strong faculty networks. This is just one example of how network models can improve our thinking.17 Myerson Value and Burt’s Structural Holes People who fill structural holes connect communities in a network and have more influence. A variety of statistical measures of a network, such as betweenness, should correlate with occupying a structural hole. An alternative measure of influence in a network, Myerson value, relies on the logic of Shapley values. To compute Myerson values, we construct a cooperative game on a network but only allow coalitions that include connected components. Consider three individuals arranged in a line. Assume that their locations represent political ideologies with person B in the center, as shown below. If we restrict coalitions to immediate neighbors, then A, the left-most person, cannot connect to C, the right-most person, unless person B also belongs to the coalition. To compute each player’s Myerson value, we first assign added values to all feasible coalitions. We then compute Shapley values for each possible coalition, treating each as a distinct game. Last, we add up the Shapley values for each coalition game to obtain Myerson values. image As an example, suppose that any coalition of two players produces an output of value 10 and all three players together produce output of value 14, we obtain the following: Players 1 and 3 have Myerson values of 3, and Player 2 has a Myerson value of 8.18 Centrality measures such as betweenness are based only on the network. Myerson values depend on a value function. By having both measures we can disentangle the dependence of power on a person’s position in the network and on the functions she performs. In our example, the Myerson values for the three players, (3, 8, 3), correlate perfectly with their betweenness scores (0, 1, 0). That will not always be the case, particularly for more complicated networks and value functions.