API Reference

Adjacency graphs

class gerrychain.Graph(incoming_graph_data=None, **attr)[source]

Represents a graph to be partitioned. It is based on networkx.Graph.

We have added some classmethods to help construct graphs from shapefiles, and to save and load graphs as JSON files.

Initialize a graph with edges, name, or graph attributes.

incoming_graph_data : input graph (optional, default: None)
Data to initialize graph. If None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object. If the corresponding optional Python packages are installed the data can also be a NumPy matrix or 2d ndarray, a SciPy sparse matrix, or a PyGraphviz graph.
attr : keyword arguments, optional (default= no attributes)
Attributes to add to graph as key=value pairs.

convert

>>> G = nx.Graph()  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G = nx.Graph(name='my graph')
>>> e = [(1, 2), (2, 3), (3, 4)]  # list of edges
>>> G = nx.Graph(e)

Arbitrary graph attribute pairs (key=value) may be assigned

>>> G = nx.Graph(e, day="Friday")
>>> G.graph
{'day': 'Friday'}
add_data(df, columns=None)[source]

Add columns of a DataFrame to a graph as node attributes using by matching the DataFrame’s index to node ids.

Parameters:
  • df – Dataframe containing given columns.
  • columns – (optional) List of dataframe column names to add.
classmethod from_file(filename, adjacency='rook', cols_to_add=None, reproject=True, ignore_errors=False)[source]

Create a Graph from a shapefile (or GeoPackage, or GeoJSON, or any other library that geopandas can read. See from_geodataframe() for more details.

Parameters:cols_to_add – (optional) The names of the columns that you want to add to the graph as node attributes. By default, all columns are added.
classmethod from_geodataframe(dataframe, adjacency='rook', reproject=True, ignore_errors=False)[source]

Creates the adjacency Graph of geometries described by dataframe. The areas of the polygons are included as node attributes (with key area). The shared perimeter of neighboring polygons are included as edge attributes (with key shared_perim). Nodes corresponding to polygons on the boundary of the union of all the geometries (e.g., the state, if your dataframe describes VTDs) have a boundary_node attribute (set to True) and a boundary_perim attribute with the length of this “exterior” boundary.

By default, areas and lengths are computed in a UTM projection suitable for the geometries. This prevents the bizarro area and perimeter values that show up when you accidentally do computations in Longitude-Latitude coordinates. If the user specifies reproject=False, then the areas and lengths will be computed in the GeoDataFrame’s current coordinate reference system. This option is for users who have a preferred CRS they would like to use.

Parameters:
  • dataframegeopandas.GeoDataFrame
  • adjacency – (optional) The adjacency type to use (“rook” or “queen”). Default is “rook”.
  • reproject – (optional) Whether to reproject to a UTM projection before creating the graph. Default is True.
  • ignore_errors – (optional) Whether to ignore all invalid geometries and attept to create the graph anyway. Default is False.
Returns:

The adjacency graph of the geometries from dataframe.

Return type:

Graph

classmethod from_json(json_file)[source]

Load a graph from a JSON file in the NetworkX json_graph format. :param json_file: Path to JSON file. :return: Graph

islands

The set of degree-0 nodes.

issue_warnings()[source]

Issue warnings if the graph has any red flags (right now, only islands).

join(dataframe, columns=None, left_index=None, right_index=None)[source]

Add data from a dataframe to the graph, matching nodes to rows when the node’s left_index attribute equals the row’s right_index value.

Parameters:dataframe – DataFrame.
Columns:(optional) The columns whose data you wish to add to the graph. If not provided, all columns are added.
Left_index:(optional) The node attribute used to match nodes to rows. If not provided, node IDs are used.
Right_index:(optional) The DataFrame column name to use to match rows to nodes. If not provided, the DataFrame’s index is used.
to_json(json_file, *, include_geometries_as_geojson=False)[source]

Save a graph to a JSON file in the NetworkX json_graph format. :param json_file: Path to target JSON file. :param bool include_geometry_as_geojson: (optional) Whether to include any

shapely geometry objects encountered in the graph’s node attributes as GeoJSON. The default (False) behavior is to remove all geometry objects because they are not serializable. Including the GeoJSON will result in a much larger JSON file.
warn_for_islands()[source]

Issue a warning if the graph has any islands (degree-0 nodes).

Partitions

class gerrychain.partition.Partition(graph=None, assignment=None, updaters=None, parent=None, flips=None)[source]

Bases: object

Partition represents a partition of the nodes of the graph. It will perform the first layer of computations at each step in the Markov chain - basic aggregations and calculations that we want to optimize.

Parameters:
  • graph – Underlying graph; a NetworkX object.
  • assignment – Dictionary assigning nodes to districts. If None, initialized to assign all nodes to district 0.
  • updaters – Dictionary of functions to track data about the partition. The keys are stored as attributes on the partition class, which the functions compute.
crosses_parts(edge)[source]

Answers the question “Does this edge cross from one part of the partition to another?

Parameters:edge – tuple of node IDs
Return type:bool
flip(flips)[source]

Returns the new partition obtained by performing the given flips on this partition.

Parameters:flips – dictionary assigning nodes of the graph to their new districts
Returns:the new Partition
Return type:Partition
classmethod from_districtr_file(graph, districtr_file, updaters=None)[source]

Create a Partition from a districting plan created with Districtr, a free and open-source web app created by MGGG for drawing districts.

The provided graph should be created from the same shapefile as the Districtr module used to draw the districting plan. These shapefiles may be found in a repository in the mggg-states GitHub organization, or by request from MGGG.

Parameters:
  • graphGraph
  • districtr_file – the path to the .json file exported from Districtr
  • updaters – dictionary of updaters
classmethod from_file(filename, assignment, updaters=None, columns=None)[source]

Create a Partition from an ESRI Shapefile, a GeoPackage, a GeoJSON file, or any other file that the fiona library can handle.

classmethod from_json(graph_path, assignment, updaters=None)[source]

Creates a Partition from a json file containing a serialized NetworkX adjacency_data object. Files of this kind for each state are available in the @gerrymandr/vtd-adjacency-graphs GitHub repository.

Parameters:
  • graph_path – String filename for the json file
  • assignment – String key for the node attribute giving a district assignment, or a dictionary mapping node IDs to district IDs.
  • updaters – (optional) Dictionary of updater functions to attach to the partition, in addition to the default_updaters of cls.
plot(geometries, **kwargs)[source]

Plot the partition, using the provided geometries.

Parameters:
to_json(json_path, *, save_assignment_as=None, include_geometries_as_geojson=False)[source]

Save the partition to a JSON file in the NetworkX json_graph format.

Parameters:
  • json_file – Path to target JSON file.
  • save_assignment_as (str) – (optional) The string to use as a node attribute key holding the current assignment. By default, does not save the assignment as an attribute.
  • include_geometries_as_geojson (bool) – (optional) Whether to include any shapely geometry objects encountered in the graph’s node attributes as GeoJSON. The default (False) behavior is to remove all geometry objects because they are not serializable. Including the GeoJSON will result in a much larger JSON file.
class gerrychain.partition.GeographicPartition(graph=None, assignment=None, updaters=None, parent=None, flips=None)[source]

Bases: gerrychain.partition.partition.Partition

A Partition with areas, perimeters, and boundary information included. These additional data allow you to compute compactness scores like Polsby-Popper.

Parameters:
  • graph – Underlying graph; a NetworkX object.
  • assignment – Dictionary assigning nodes to districts. If None, initialized to assign all nodes to district 0.
  • updaters – Dictionary of functions to track data about the partition. The keys are stored as attributes on the partition class, which the functions compute.

Markov chains

class gerrychain.MarkovChain(proposal, constraints, accept, initial_state, total_steps)[source]

MarkovChain is an iterator that allows the user to iterate over the states of a Markov chain run.

Example usage:

chain = MarkovChain(proposal, constraints, accept, initial_state, total_steps)
for state in chain:
    # Do whatever you want - print output, compute scores, ...
Parameters:
  • proposal – Function proposing the next state from the current state.
  • constraints – A function with signature Partition -> bool determining whether the proposed next state is valid (passes all binary constraints). Usually this is a Validator class instance.
  • accept – Function accepting or rejecting the proposed state. In the most basic use case, this always returns True. But if the user wanted to use a Metropolis-Hastings acceptance rule, this is where you would implement it.
  • initial_state – Initial gerrychain.partition.Partition class.
  • total_steps – Number of steps to run.

Binary constraints

The gerrychain.constraints module provides a collection of constraint functions and helper classes for the validation step in GerryChain.

Helper classes
Validator Collection of constraints
Bounds Bounds on numeric constraints
UpperBounds Upper bounds on numeric constraints
LowerBounds Lower bounds on numeric constraints
SelfConfiguringUpperBound Automatic upper bounds on numeric constraints
SelfConfiguringLowerBound Automatic lower bounds on numeric constraints
WithinPercentRangeOfBounds Percentage bounds for numeric constraints

Binary constraint functions
no_worse_L1_reciprocal_polsby_popper Lower bounded L1-reciprocal Polsby-Popper
no_worse_L_minus_1_reciprocal_polsby_popper Lower bounded L(-1)-reciprocal Polsby-Popper
contiguous Districts are contiguous (with NetworkX methods)
contiguous_bf Districts are contiguous (with a breadth-first search)
single_flip_contiguous Districts are contiguous (optimized for propose_random_flip proposal)
no_vanishing_districts No districts may be completely consumed

Each new step proposed to the chain is passed off to the “validator” functions here to determine whether or not the step is valid. If it is invalid (breaks contiguity, for instance), then the step is immediately rejected.

A validator should take in a Partition instance, and should return whether or not the instance is valid according to their rules. Many top-level functions following this signature in this module are examples of this.

class gerrychain.constraints.Validator(constraints)[source]

A single callable for checking that a partition passes a collection of constraints. Intended to be passed as the is_valid parameter when instantiating MarkovChain.

This class is meant to be called as a function after instantiation; its return is True if all validators pass, and False if any one fails.

Example usage:

is_valid = Validator([constraint1, constraint2, constraint3])
chain = MarkovChain(proposal, is_valid, accept, initial_state, total_steps)
Parameters:constraints – List of validator functions that will check partitions.

Updaters

gerrychain.updaters.county_splits(partition_name, county_field_name)[source]

Track county splits.

Parameters:
  • partition_name – Name that the Partition instance will store.
  • county_field_name – Name of county ID field on the graph.
Returns:

The tracked data is a dictionary keyed on the county ID. The stored values are tuples of the form (split, nodes, seen). split is a CountySplit enum, nodes is a list of node IDs, and seen is a list of assignment IDs that are contained in the county.

class gerrychain.updaters.Tally(fields, alias=None, dtype=<class 'int'>)[source]

An updater for keeping a tally of one or more node attributes.

Parameters:
  • fields – the list of node attributes that you want to tally. Or a just a single attribute name as a string.
  • alias – the aliased name of this Tally (meaning, the key corresponding to this Tally in the Partition’s updaters dictionary)
  • dtype – the type (int, float, etc.) that you want the tally to have
class gerrychain.updaters.DataTally(data, alias)[source]

An updater for tallying numerical data that is not necessarily stored as node attributes

Parameters:
  • data – a dict or Series indexed by the graph’s nodes, or the string key for a node attribute containing the Tally’s data.
  • alias – the name of the tally in the Partition’s updaters dictionary
class gerrychain.updaters.CountySplit[source]

An enumeration.

class gerrychain.updaters.Election(name, parties_to_columns, alias=None)[source]

Represents the data of one election, with races conducted in each part of the partition.

As we vary the districting plan, we can use the same node-level vote totals to tabulate hypothetical elections. To do this manually with tallies, we would have to maintain tallies for each party, as well as the total number of votes, and then compute the electoral results and percentages from scratch every time. To make this simpler, this class provides an ElectionUpdater to manage these tallies. The updater returns an ElectionResults class giving a convenient view of the election results, with methods like wins() or percent() for common queries the user might make on election results.

Example usage:

# Assuming your nodes have attributes "2008_D", "2008_R"
# with (for example) 2008 senate election vote totals
election = Election(
    "2008 Senate",
    {"Democratic": "2008_D", "Republican": "2008_R"},
    alias="2008_Sen"
)

# Assuming you already have a graph and assignment:
partition = Partition(
    graph,
    assignment,
    updaters={"2008_Sen": election}
)

# The updater returns an ElectionResults instance, which
# we can use (for example) to see how many seats a given
# party would win in this partition using this election's
# vote distribution:
partition["2008_Sen"].wins("Republican")
Parameters:
  • name – The name of the election. (e.g. “2008 Presidential”)
  • parties_to_columns – A dictionary matching party names to their data columns, either as actual columns (list-like, indexed by nodes) or as string keys for the node attributes that hold the party’s vote totals. Or, a list of strings which will serve as both the party names and the node attribute keys.
  • alias – (optional) Alias that the election is registered under in the Partition’s dictionary of updaters.

Elections

class gerrychain.updaters.election.Election(name, parties_to_columns, alias=None)[source]

Represents the data of one election, with races conducted in each part of the partition.

As we vary the districting plan, we can use the same node-level vote totals to tabulate hypothetical elections. To do this manually with tallies, we would have to maintain tallies for each party, as well as the total number of votes, and then compute the electoral results and percentages from scratch every time. To make this simpler, this class provides an ElectionUpdater to manage these tallies. The updater returns an ElectionResults class giving a convenient view of the election results, with methods like wins() or percent() for common queries the user might make on election results.

Example usage:

# Assuming your nodes have attributes "2008_D", "2008_R"
# with (for example) 2008 senate election vote totals
election = Election(
    "2008 Senate",
    {"Democratic": "2008_D", "Republican": "2008_R"},
    alias="2008_Sen"
)

# Assuming you already have a graph and assignment:
partition = Partition(
    graph,
    assignment,
    updaters={"2008_Sen": election}
)

# The updater returns an ElectionResults instance, which
# we can use (for example) to see how many seats a given
# party would win in this partition using this election's
# vote distribution:
partition["2008_Sen"].wins("Republican")
Parameters:
  • name – The name of the election. (e.g. “2008 Presidential”)
  • parties_to_columns – A dictionary matching party names to their data columns, either as actual columns (list-like, indexed by nodes) or as string keys for the node attributes that hold the party’s vote totals. Or, a list of strings which will serve as both the party names and the node attribute keys.
  • alias – (optional) Alias that the election is registered under in the Partition’s dictionary of updaters.
class gerrychain.updaters.election.ElectionResults(election, counts, races)[source]

Represents the results of an election. Provides helpful methods to answer common questions you might have about an election (Who won? How many seats?, etc.).

count(party, race=None)[source]

Returns the total number of votes that party received in a given race (part of the partition). If race is omitted, returns the overall vote total of party.

Parameters:
  • party – Party ID.
  • race – ID of the part of the partition whose votes we want to tally.
counts(party)[source]
Parameters:party – Party ID
Returns:tuple of the total votes cast for party in each part of the partition
percent(party, race=None)[source]

Returns the percentage of the vote that party received in a given race (part of the partition). If race is omitted, returns the overall vote share of party.

Parameters:
  • party – Party ID.
  • race – ID of the part of the partition whose votes we want to tally.
percents(party)[source]
Parameters:party – The party
Returns:The tuple of the percentage of votes that party received in each part of the partition
seats(party)[source]

Returns the number of races that party won.

votes(party)[source]

An alias for counts().

wins(party)[source]

An alias for seats().

won(party, race)[source]

Answers “Did party win the race in part race?” with True or False.

class gerrychain.updaters.election.ElectionUpdater(election)[source]

The updater for computing the election results in each part of the partition after each step in the Markov chain. The actual results are returned to the user as an ElectionResults instance.

Grids

To make it easier to play around with GerryChain, we have provided a Grid class representing a partition of a grid graph. This is especially useful if you want to start experimenting but do not yet have a clean set of data and geometries to build your graph from.

class gerrychain.grid.Grid(dimensions=None, with_diagonals=False, assignment=None, updaters=None, parent=None, flips=None)[source]

The Grid class represents a grid partitioned into districts. It is useful for running little experiments with GerryChain without needing to do any data processing or cleaning to get started.

Example usage:

grid = Grid((10,10))

The nodes of grid.graph are labelled by tuples (i,j), for 0 <= i <= 10 and 0 <= j <= 10. Each node has an area of 1 and each edge has shared_perim 1.

Parameters:
  • dimensions – tuple (m,n) of the desired dimensions of the grid.
  • with_diagonals – (optional, defaults to False) whether to include diagonals as edges of the graph (i.e., whether to use ‘queen’ adjacency rather than ‘rook’ adjacency).
  • assignment – (optional) dict matching nodes to their districts. If not provided, partitions the grid into 4 quarters of roughly equal size.
  • updaters – (optional) dict matching names of attributes of the Partition to functions that compute their values. If not provided, the Grid configures the cut_edges updater for convenience.
as_list_of_lists()[source]

Returns the grid as a list of lists (like a matrix), where the (i,j)th entry is the assigned district of the node in position (i,j) on the grid.

Spanning tree methods

The recom() proposal function operates on spanning trees of the adjacency graph in order to generate new contiguous districting plans with balanced population.

The gerrychain.tree submodule exposes some helpful functions for partitioning graphs using spanning trees methods. These may be used to implement proposal functions or to generate initial plans for running Markov chains, as described in MGGG’s 2018 Virginia House of Delegates report.

gerrychain.tree.bipartition_tree(graph, pop_col, pop_target, epsilon, node_repeats, spanning_tree=None, choice=<bound method Random.choice of <random.Random object>>)[source]

This function finds a balanced 2 partition of a graph by drawing a spanning tree and finding an edge to cut that leaves at most an epsilon imbalance between the populations of the parts. If a root fails, new roots are tried until node_repeats in which case a new tree is drawn.

Builds up a connected subgraph with a connected complement whose population is epsilon * pop_target away from pop_target.

Returns a subset of nodes of graph (whose induced subgraph is connected). The other part of the partition is the complement of this subset.

Parameters:
  • graph – The graph to partition
  • pop_col – The node attribute holding the population of each node
  • pop_target – The target population for the returned subset of nodes
  • epsilon – The allowable deviation from pop_target (as a percentage of pop_target) for the subgraph’s population
  • node_repeats – A parameter for the algorithm: how many different choices of root to use before drawing a new spanning tree.
  • spanning_tree – The spanning tree for the algorithm to use (used when the algorithm chooses a new root and for testing)
  • choicerandom.choice(). Can be substituted for testing.
gerrychain.tree.recursive_tree_part(graph, parts, pop_target, pop_col, epsilon, node_repeats=None)[source]

Uses bipartition_tree() recursively to partition a tree into len(parts) parts of population pop_target (within epsilon). Can be used to generate initial seed plans or to implement ReCom-like “merge walk” proposals.

Parameters:
  • graph – The graph
  • parts – Iterable of part labels (like [0,1,2] or range(4)
  • pop_target – Target population for each part of the partition
  • pop_col – Node attribute key holding population data
  • epsilon – How far (as a percentage of pop_target) from pop_target the parts of the partition can be
  • node_repeats – Parameter for bipartition_tree() to use.
Returns:

New assignments for the nodes of graph.

Return type:

dict