========================== Running a chain with ReCom ========================== Our goal now is to get a handle on using GerryChain to run Markov chains with both regular and region-aware settings. Throughout this guide, we'll use the toy state of GerryMandria, which has needs to be divided into 8 districts. The legislature of the state of GerryMandria has provided us with the following districting plan: .. image:: ./images/gerrymandria.png :width: 400px :align: center which has corresponding dual graph given by: .. image:: ./images/gerrymandria_district.png :width: 400px :align: center ``gerrychain`` works primarily with the dual graph of the districting plan, so all of the pictures in this guide will use the dual graph as well. A Simple Recom Chain ==================== .. raw:: html

Download GerryMandria File

Let us start by running a simple ReCom chain on this districting plan. Of course, the first thing to do is to import the required packages: .. code-block:: python import matplotlib.pyplot as plt from gerrychain import (Partition, Graph, MarkovChain, updaters, constraints, accept) from gerrychain.proposals import recom from gerrychain.constraints import contiguous from functools import partial import pandas # Set the random seed so that the results are reproducible! import random random.seed(2024) Now we set up the initial partition: .. code-block:: python graph = Graph.from_json("./gerrymandria.json") my_updaters = { "population": updaters.Tally("TOTPOP"), "cut_edges": updaters.cut_edges } initial_partition = Partition( graph, assignment="district", updaters=my_updaters ) And we make the proposal: .. code-block:: python # This should be 8 since each district has 1 person in it. # Note that the key "population" corresponds to the population updater # that we defined above and not with the population column in the json file. ideal_population = sum(initial_partition["population"].values()) / len(initial_partition) proposal = partial( recom, pop_col="TOTPOP", pop_target=ideal_population, epsilon=0.01, node_repeats=2 ) We can now set up the chain: .. code-block:: python recom_chain = MarkovChain( proposal=proposal, constraints=[contiguous], accept=accept.always_accept, initial_state=initial_partition, total_steps=40 ) and run it with .. code-block:: python assignment_list = [] for i, item in enumerate(recom_chain): print(f"Finished step {i+1}/{len(recom_chain)}", end="\r") assignment_list.append(item.assignment) We'll go ahead an collect the assignment at each step of the chain so that we can watch the chain work in a fun animation (of course, it would be a bad idea to do this for a chain with a large number of steps). .. code-block:: python %matplotlib inline import matplotlib_inline.backend_inline import matplotlib.cm as mcm import matplotlib.pyplot as plt import networkx as nx from PIL import Image import io import ipywidgets as widgets from IPython.display import display, clear_output frames = [] for i in range(len(assignment_list)): fig, ax = plt.subplots(figsize=(8,8)) pos = {node :(data['x'],data['y']) for node, data in graph.nodes(data=True)} node_colors = [mcm.tab20(int(assignment_list[i][node]) % 20) for node in graph.nodes()] node_labels = {node: str(assignment_list[i][node]) for node in graph.nodes()} nx.draw_networkx_nodes(graph, pos, node_color=node_colors) nx.draw_networkx_edges(graph, pos) nx.draw_networkx_labels(graph, pos, labels=node_labels) plt.axis('off') buffer = io.BytesIO() plt.savefig(buffer, format='png') buffer.seek(0) image = Image.open(buffer) frames.append(image) plt.close(fig) def show_frame(idx): clear_output(wait=True) display(frames[idx]) slider = widgets.IntSlider(value=0, min=0, max=len(frames)-1, step=1, description='Frame:') slider.layout.width = '500px' widgets.interactive(show_frame, idx=slider) And this should generate a little widget that you can move through to see the chain in action! Here is a gif of what it should look like: .. image:: ./images/gerrymandria_ensemble.gif :width: 400px :align: center Region-Aware ReCom ================== Of course, in the state of GerryMandria, the legislature has decided that it would like to try to keep the municipality of Gerryville together in a single district. In fact, it would really prefer to keep all of the municipalities together if possible, and, as such any analysis that you do needs to be on a ensemble of districting plans that try to keep municipalities together. Here is a picture of the municipalities in GerryMandria: .. image:: ./images/gerrymandria_cities.png :width: 400px :align: center Fortunately, ``gerrychain`` has a built-in functionality that allows for region-aware ReCom chains which create ensembles of districting plans that try to keep particular regions of interest together. And it only takes one extra line of code: we simply update our proposal to include a ``region_surcharge`` which increases the importance of the edges within the municipalities. .. code-block:: python proposal = partial( recom, pop_col="TOTPOP", pop_target=ideal_population, epsilon=0.01, node_repeats=2, region_surcharge={"muni": 1.0}, ) And this will produce the following ensemble: .. image:: ./images/gerrymandria_region_ensemble.gif :width: 400px :align: center Now, the legislature of GerryMandria has decided that it would also like to try to keep the counties together as well. They mention to you that it would be nice to keep the municipalities together, but that it is more important to keep the water districts together. Here is a picture of the water districts in GerryMandria: .. image:: ./images/gerrymandria_water.png :width: 400px :align: center Notice that there is a river that seems to cut through the middle of the state, and so it is not going to be possible to keep all of the water districts together and all of the municipalities together in one plan. However, we can try to keep the water districts together as much as possible, and then, within those water districts, try to be sensitive to the boundaries of the municipalities. Again, this only requires for us to edit the ``region_surcharge`` parameter of the proposal .. code-block:: python proposal = partial( recom, pop_col="TOTPOP", pop_target=ideal_population, epsilon=0.01, node_repeats=2, region_surcharge={"muni": 0.2, "water": 0.8}, ) Since we are trying to be sensitive to multiple bits of information, we should probably also increase the length of our chain to make sure that we have time to mix properly. .. code-block:: python recom_chain = MarkovChain( proposal=proposal, constraints=[contiguous], accept=accept.always_accept, initial_state=initial_partition, total_steps=10000 ) Then, we can run the chain and look at the last 40 assignments in the ensemble .. image:: ./images/gerrymandria_water_muni_ensemble.gif :width: 400px :align: center Comparing the last map with the municipality and water district maps, we can see that the chain has done a pretty good job of keeping the water districts together while also being sensitive to the municipalities .. figure:: ./images/gerrymandria_water_and_muni_aware.png :width: 400px :align: center The last map in the ensemble from the 10000 step region-aware ReCom chain with surcharges of 0.2 for the municipalities and 0.8 for the water districts. .. raw:: html

How the Region Aware Implementation Works ----------------------------------------- When working with region-aware ReCom chains, it is worth knowing how the spanning tree of the dual graph is being split. Weights from the interval :math:`[0,1]` are randomly assigned to the edges of the graph and then the surcharges are applied to the edges in the graph that span different regions specified by the ``region_surcharge`` dictionary. So if we have ``region_surcharge={"muni": 0.2, "water": 0.8}``, then the edges that span different municipalities will be upweighted by 0.2 and the edges that span different water districts will be upweighted by 0.8. We then draw a minimum spanning tree using by greedily selecting the lowest-weight edges via Kruskal's algorithm. The surcharges on the edges helps ensure that the algorithm picks the edges interior to the region before it picks the edges that bridge different regions. This makes it very likely that each region is largely contained in a connected subtree attached to a bridge node. Thus, when we make a cut, the regions attached to the bridge node are more likely to be (mostly) preserved in the subtree on either side of the cut. In the implementation of :meth:`~gerrychain.tree.biparition_tree` we further bias this choice by deterministically cutting bridge edges first (when possible). In the event that multiple types of regions are specified, the surcharges are added together, and edges are selected first by the number of types of regions that they span, and then by the surcharge added to those weights. So, if we have a region surcharge dictionary of ``{"a": 1, "b": 4, "c": 2}`` then we we look for edges according to the order - ("a", "b", "c") - ("b", "c") - ("a", "b") - ("a", "c") - ("b") - ("c") - ("a") - random where the tuples indicate that a desired cut edge bridges both types of region in the tuple. In the event that this is not the desired behaviour, then the user can simply alter the ``cut_choice`` function in the constraints to be different. So, if the user would prefer the cut edge to be a random edge with no deference to bridge edges, then they might use ``random.choice()`` in the following way: .. code-block:: python proposal = partial( recom, pop_col="TOTPOP", pop_target=ideal_population, epsilon=0.01, node_repeats=1, region_surcharge={ "muni": 2.0, "water_dist": 2.0 }, method = partial( bipartition_tree, cut_choice = random.choice, ) ) **Note**: When ``region_surcharge`` is not specified, ``bipartition_tree`` will behave as if ``cut_choice`` is set to ``random.choice``. .. .. attention:: .. The ``region_surcharge`` parameter is a dictionary that assigns a surcharge to each .. edge within a particular region that is determined by the keys of the dictionary. .. In the event that multiple regions are specified, the surcharges are added together, .. and if the surcharges add to more than 1, then the following warning will be printed .. to the user: .. .. code-block:: console .. ValueWarning: .. The sum of the surcharges in the surcharge dictionary is greater than 1. .. Please consider normalizing the surcharges. .. It is generally inadvisable to set the surcharge of a region to 1 or more. When .. using :meth:`~gerrychain.proposals.recom` with a ``region_surcharge``, the proposal .. will try to draw a minimum spanning tree using Kruskal's algorithm where, .. the surcharges are in the range :math:`[0,1]`, then the surcharges from the surcharge .. dictionary are added to them. In the event that .. many edges within the tree have a surcharge above 1, then it can sometimes .. cause the bipartitioning step to stall. What to do if the Chain Gets Stuck ================================== Sometimes, either because of the constraints that you have imposed or because of the shape of the graph that you are working with, a recom chain can get stuck and will throw an error. For example, if we try to be a bit too demanding of the region-aware chain given above and ask for a plan that effectively never splits a municipality nor a water district, then the chain will get stuck and throw an error. Here is the setup: .. code-block:: python from gerrychain import (Partition, Graph, MarkovChain, updaters, constraints, accept) from gerrychain.proposals import recom from gerrychain.tree import bipartition_tree from gerrychain.constraints import contiguous from functools import partial import random random.seed(5) graph = Graph.from_json("./gerrymandria.json") my_updaters = { "population": updaters.Tally("TOTPOP"), "cut_edges": updaters.cut_edges } initial_partition = Partition( graph, assignment="district", updaters=my_updaters ) ideal_population = sum(initial_partition["population"].values()) / len(initial_partition) proposal = partial( recom, pop_col="TOTPOP", pop_target=ideal_population, epsilon=0.01, node_repeats=1, region_surcharge={ "muni": 2.0, "water_dist": 2.0 }, method = partial( bipartition_tree, max_attempts=100, ) ) recom_chain = MarkovChain( proposal=proposal, constraints=[contiguous], accept=accept.always_accept, initial_state=initial_partition, total_steps=20, ) assignment_list = [] for i, item in enumerate(recom_chain): print(f"Finished step {i + 1}/{len(recom_chain)}", end="\r") assignment_list.append(item.assignment) This will output the following sequence of warnings and errors .. code-block:: console BipartitionWarning: Failed to find a balanced cut after 50 attempts. If possible, consider enabling pair reselection within your MarkovChain proposal method to allow the algorithm to select a different pair of nodes to try an recombine. RuntimeError: Could not find a possible cut after 100 attempts. Let's break down what is happening in each of these: .. raw:: html

BipartitionWarning This is telling us that somewhere along the way, we picked a pair of districts that were difficult to bipartition underneath the constraints that we have imposed. More accurately, for the pair of districts that we have selected to recombine, we have selected a root node for a spanning tree, and we are trying to find a cut at some point along that tree that satisfies all of the conditions. We have tried to draw a tree 50 times and have failed to find a balanced cut of any of the trees starting from the selected root node. This indicates that either we have selected a difficult node to start from, or that the pair of districts we are considering is difficult to split regardless of the choice of root node. If the problem is the choice of root node, we can fix it by increasing the node_repeats parameter of the MarkovChain. However, if the problem is that the pair of districts themselves are difficult to split, then this can generally only be fixed by allowing the chain to reselect the pair of districts that it is trying to split.

RuntimeError This is telling us that we have tried to draw a tree 10000 times for each node that we have selected, and that we failed to find a valid cut in all of them. This is a pretty strong indication that the pair of districts that we are trying to split is just too difficult to split and that we need to enable reselection.

Okay, let's see if we can fix this. First, we'll try to increase the number of node repeats: .. code-block:: python random.seed(5) graph = Graph.from_json("./gerrymandria.json") my_updaters = { "population": updaters.Tally("TOTPOP"), "cut_edges": updaters.cut_edges } initial_partition = Partition( graph, assignment="district", updaters=my_updaters ) ideal_population = sum(initial_partition["population"].values()) / len(initial_partition) proposal = partial( recom, pop_col="TOTPOP", pop_target=ideal_population, epsilon=0.01, node_repeats=100, # <-- This is the only change region_surcharge={ "muni": 2.0, "water_dist": 2.0 }, method = partial( bipartition_tree, max_attempts=100, ) ) recom_chain = MarkovChain( proposal=proposal, constraints=[contiguous], accept=accept.always_accept, initial_state=initial_partition, total_steps=20, ) assignment_list = [] for i, item in enumerate(recom_chain): print(f"Finished step {i + 1}/{len(recom_chain)}", end="\r") assignment_list.append(item.assignment) Running this code, we can see that we get stuck once again, so this was not the fix. Let's try to enable reselection instead: .. code-block:: python random.seed(5) graph = Graph.from_json("./gerrymandria.json") my_updaters = { "population": updaters.Tally("TOTPOP"), "cut_edges": updaters.cut_edges } initial_partition = Partition( graph, assignment="district", updaters=my_updaters ) ideal_population = sum(initial_partition["population"].values()) / len(initial_partition) proposal = partial( recom, pop_col="TOTPOP", pop_target=ideal_population, epsilon=0.01, node_repeats=1, region_surcharge={ "muni": 2.0, "water_dist": 2.0 }, method = partial( bipartition_tree, max_attempts=100, allow_pair_reselection=True # <-- This is the only change ) ) recom_chain = MarkovChain( proposal=proposal, constraints=[contiguous], accept=accept.always_accept, initial_state=initial_partition, total_steps=20, ) assignment_list = [] for i, item in enumerate(recom_chain): print(f"Finished step {i + 1}/{len(recom_chain)}", end="\r") assignment_list.append(item.assignment) And this time it works! A Real-World Example ==================== In this example, we'll use GerryChain to analyze the 2011 districting plan for Pennsylvania's state legislative districts. We'll compare the partisan vote shares in the 2011 plan to those in an ensemble of districting plans generated by our ReCom chain. Imports ------- As always, the first step is to import everything we need .. code-block:: python import matplotlib.pyplot as plt from gerrychain import (GeographicPartition, Partition, Graph, MarkovChain, proposals, updaters, constraints, accept, Election) from gerrychain.proposals import recom from functools import partial import pandas Setting up the initial districting plan --------------------------------------- .. raw:: html

Download PA File

We'll create our graph using the example Pennsylvania json file. .. code-block:: python graph = Graph.from_json("./PA_VTDs.json") We may now configure :class:`~gerrychain.Election` objects representing some of the election data from our file. .. code-block:: python elections = [ Election("SEN10", {"Democratic": "SEN10D", "Republican": "SEN10R"}), Election("SEN12", {"Democratic": "USS12D", "Republican": "USS12R"}), Election("SEN16", {"Democratic": "T16SEND", "Republican": "T16SENR"}), Election("PRES12", {"Democratic": "PRES12D", "Republican": "PRES12R"}), Election("PRES16", {"Democratic": "T16PRESD", "Republican": "T16PRESR"}) ] Configuring our updaters ++++++++++++++++++++++++ We want to set up updaters for everything we want to compute for each plan in the ensemble. In this case, we want to keep track of the population of each district and election info for each of our previously defined elections. .. code-block:: python # Population updater, for computing how close to equality the district # populations are. "TOTPOP" is the population column from our shapefile. my_updaters = {"population": updaters.Tally("TOT_POP", alias="population")} # Election updaters, for computing election results using the vote totals # from our shapefile. election_updaters = {election.name: election for election in elections} my_updaters.update(election_updaters) Instantiating the partition +++++++++++++++++++++++++++ We can now instantiate the initial state of our Markov chain, using the 2011 districting plan .. code-block:: python initial_partition = GeographicPartition( graph, assignment="2011_PLA_1", updaters=my_updaters ) The class :class:`~gerrychain.GeographicPartition` comes with built-in ``area`` and ``perimeter`` updaters. We do not use them here since (i) the \*.json file that we are working with does not have geometric information and (ii) geometric updaters tend to slow the chain quite considerably (and this is just an example), but they would allow us to compute compactness scores like Polsby-Popper that depend on these measurements. Setting up the Markov chain --------------------------- Proposal ++++++++ First we'll set up the ReCom proposal. To do this we will need to make use of the python `functools`_ package, specifically the ``partial`` function within this package. .. admonition:: Use of ``functools.partial`` :class: note For the uninitiated, the ``functools.partial`` function allows us to create a new function from an existing function by binding the values of some of the arguments. For example, we might have a function to make a colored square: .. code-block:: python from PIL import Image def make_color_square(red_val, green_val, blue_val): img = Image.new('RGB', (100, 100), color = (red_val, green_val, blue_val)) return img And we can then use this to make a new function that always makes a blue square: .. code-block:: python make_blue_square = partial(make_color_square, red_val=0, green_val=0) make_color_square(red_val=255, green_val=0, blue_val=0).show() # Makes a red square make_blue_square(blue_val=255).show() # Makes a blue square Back to Recom, we need to fix some parameters using `functools.partial` before we can use it as our proposal function. .. code-block:: python # The ReCom proposal needs to know the ideal population for the districts so that # we can improve speed by bailing early on unbalanced partitions. ideal_population = sum(initial_partition["population"].values()) / len(initial_partition) # We use functools.partial to bind the extra parameters (pop_col, pop_target, epsilon, node_repeats) # of the recom proposal. proposal = partial( recom, pop_col="TOT_POP", pop_target=ideal_population, epsilon=0.02, node_repeats=2 ) Constraints +++++++++++ To keep districts about as compact as the original plan, we would like to constrain the number of cut edges between all of the districts (this will keep our districts from being too snake-like). We can do this using the :class:`~gerrychain.constraints.UpperBound` constraint, and, as a general heuristic, we'll bound the number of cut edges by twice the number of cut edges in the initial plan. .. code-block:: python def cut_edges_length(p): return len(p["cut_edges"]) compactness_bound = constraints.UpperBound( cut_edges_length, 2*len(initial_partition["cut_edges"]) ) pop_constraint = constraints.within_percent_of_ideal_population(initial_partition, 0.02) .. admonition:: Coding Note :class: note We can simplify the calling of this compactness bound using lambda functions. .. code-block:: python compactness_bound = constraints.UpperBound( lambda p: len(p["cut_edges"]), 2*len(initial_partition["cut_edges"]) ) The use of lambda functions tends to be a more advanced coding technique, but the benefit is that we do not need to define a new function for each constraint that we want to use, and they can make the code more readable. Configuring the Markov chain ++++++++++++++++++++++++++++ .. code-block:: python chain = MarkovChain( proposal=proposal, constraints=[ pop_constraint, compactness_bound ], accept=accept.always_accept, initial_state=initial_partition, total_steps=1000 ) Running the chain ----------------- Now we'll run the chain, putting the sorted Democratic vote percentages directly into a :mod:`pandas` :class:`~pandas.DataFrame` for analysis and plotting. The ``DataFrame`` will have a row for each state of the chain. The first column of the ``DataFrame`` will hold the lowest Democratic vote share among the districts in each partition in the chain, the second column will hold the second-lowest Democratic vote shares, and so on. .. code-block:: python # This might take a few minutes. data = pandas.DataFrame( sorted(partition["SEN12"].percents("Democratic")) for partition in chain ) If you are wondering what the ``for`` loop inside of the parentheses is doing, please see the `this note <./quickstart.html#list-comprehension>`_. If you install the ``tqdm`` package, you can see a progress bar as the chain runs by running this code instead .. code-block:: python data = pandas.DataFrame( sorted(partition["SEN12"].percents("Democratic")) for partition in chain.with_progress_bar() ) Create a plot ------------- Now we'll create a box plot to help visualize the data report. .. code-block:: python fig, ax = plt.subplots(figsize=(8, 6)) # Draw 50% line ax.axhline(0.5, color="#cccccc") # Draw boxplot data.boxplot(ax=ax, positions=range(len(data.columns))) # Draw initial plan's Democratic vote %s (.iloc[0] gives the first row) plt.plot(data.iloc[0], "ro") # Annotate ax.set_title("Comparing the 2011 plan to an ensemble") ax.set_ylabel("Democratic vote % (Senate 2012)") ax.set_xlabel("Sorted districts") ax.set_ylim(0, 1) ax.set_yticks([0, 0.25, 0.5, 0.75, 1]) plt.show() .. image:: ./images/recom_plot.svg There you go! To build on this, here are some possible next steps: * Add, remove, or tweak the constraints * Perform a similar analysis on a different districting plan for Pennsylvania * Perform a similar analysis on a different state * Compute partisan symmetry scores like Efficiency Gap or Mean-Median, and create a histogram of the scores of the ensemble. * Perform the same analysis using a different election than the 2012 Senate election * Collect Democratic vote percentages for *all* the elections we set up, instead of just the 2012 Senate election. .. _functools: https://docs.python.org/3/library/functools.html