# Running a chain with ReCom¶

This document shows how to run a chain using the ReCom proposal used in MGGG’s 2018 Virginia House of Delegates report.

Our goal is to use ReCom to generate an ensemble of districting plans for Pennsylvania, and then make a box plot comparing the Democratic vote shares for plans in our ensemble to the 2011 districting plan that the Pennsylvania Supreme Court found to be a Republican-favoring partisan gerrymander.

This code is also available as a Jupyter notebook.

You can run this example in an interactive Jupyter Notebook session in your browser, without installing anything, using Binder:

## Imports¶

The first step is to import everything we’ll need:

```
import matplotlib.pyplot as plt
from gerrychain import (GeographicPartition, Partition, Graph, MarkovChain,
proposals, updaters, constraints, accept, Election)
from gerrychain.proposals import recom
from functools import partial
import pandas
```

## Setting up the initial districting plan¶

We’ll create our graph using the Pennsylvania shapefile from MGGG-States (download the .json here if you didn’t start with the Getting started with GerryChain guide):

```
graph = Graph.from_json("./PA_VTDs.json")
```

We configure `Election`

objects representing some of the election
data from our shapefile.

```
elections = [
Election("SEN10", {"Democratic": "SEN10D", "Republican": "SEN10R"}),
Election("SEN12", {"Democratic": "USS12D", "Republican": "USS12R"}),
Election("SEN16", {"Democratic": "T16SEND", "Republican": "T16SENR"}),
Election("PRES12", {"Democratic": "PRES12D", "Republican": "PRES12R"}),
Election("PRES16", {"Democratic": "T16PRESD", "Republican": "T16PRESR"})
]
```

### Configuring our updaters¶

We want to set up updaters for everything we want to compute for each plan in the ensemble.

```
# Population updater, for computing how close to equality the district
# populations are. "TOTPOP" is the population column from our shapefile.
my_updaters = {"population": updaters.Tally("TOTPOP", alias="population")}
# Election updaters, for computing election results using the vote totals
# from our shapefile.
election_updaters = {election.name: election for election in elections}
my_updaters.update(election_updaters)
```

### Instantiating the partition¶

We can now instantiate the initial state of our Markov chain, using the 2011 districting plan:

```
initial_partition = GeographicPartition(graph, assignment="CD_2011", updaters=my_updaters)
```

`GeographicPartition`

comes with built-in `area`

and `perimeter`

updaters.
We do not use them here, but they would allow us to compute compactness scores like Polsby-Popper
that depend on these measurements.

## Setting up the Markov chain¶

### Proposal¶

First we’ll set up the ReCom proposal. We need to fix some parameters using functools.partial before we can use it as our proposal function.

```
# The ReCom proposal needs to know the ideal population for the districts so that
# we can improve speed by bailing early on unbalanced partitions.
ideal_population = sum(initial_partition["population"].values()) / len(initial_partition)
# We use functools.partial to bind the extra parameters (pop_col, pop_target, epsilon, node_repeats)
# of the recom proposal.
proposal = partial(recom,
pop_col="TOTPOP",
pop_target=ideal_population,
epsilon=0.02,
node_repeats=2
)
```

### Constraints¶

To keep districts about as compact as the original plan, we bound the number of cut edges at 2 times the number of cut edges in the initial plan.

```
compactness_bound = constraints.UpperBound(
lambda p: len(p["cut_edges"]),
2*len(initial_partition["cut_edges"])
)
pop_constraint = constraints.within_percent_of_ideal_population(initial_partition, 0.02)
```

### Configuring the Markov chain¶

```
chain = MarkovChain(
proposal=proposal,
constraints=[
pop_constraint,
compactness_bound
],
accept=accept.always_accept,
initial_state=initial_partition,
total_steps=1000
)
```

## Running the chain¶

Now we’ll run the chain, putting the sorted Democratic vote percentages directly
into a `pandas`

`DataFrame`

for analysis and plotting. The `DataFrame`

will have a row for each state of the chain. The first column of the `DataFrame`

will
hold the lowest Democratic vote share among the districts in each partition in the chain, the
second column will hold the second-lowest Democratic vote shares, and so on.

```
# This will take about 10 minutes.
data = pandas.DataFrame(
sorted(partition["SEN12"].percents("Democratic"))
for partition in chain
)
```

If you install the `tqdm`

package, you can see a progress bar
as the chain runs by running this code instead:

```
data = pandas.DataFrame(
sorted(partition["SEN12"].percents("Democratic"))
for partition in chain.with_progress_bar()
)
```

## Create a plot¶

Now we’ll create a box plot similar to those appearing the Virginia report.

```
fig, ax = plt.subplots(figsize=(8, 6))
# Draw 50% line
ax.axhline(0.5, color="#cccccc")
# Draw boxplot
data.boxplot(ax=ax, positions=range(len(data.columns)))
# Draw initial plan's Democratic vote %s (.iloc[0] gives the first row)
plt.plot(data.iloc[0], "ro")
# Annotate
ax.set_title("Comparing the 2011 plan to an ensemble")
ax.set_ylabel("Democratic vote % (Senate 2012)")
ax.set_xlabel("Sorted districts")
ax.set_ylim(0, 1)
ax.set_yticks([0, 0.25, 0.5, 0.75, 1])
plt.show()
```

There you go! To build on this, here are some possible next steps:

- Add, remove, or tweak the constraints
- Use a different proposal from GerryChain, or create your own
- Perform a similar analysis on a different districting plan for Pennsylvania
- Perform a similar analysis on a different state
- Compute partisan symmetry scores like Efficiency Gap or Mean-Median, and create a histogram of the scores of the ensemble.
- Perform the same analysis using a different election than the 2012 Senate election
- Collect Democratic vote percentages for
*all*the elections we set up, instead of just the 2012 Senate election.