Dataset

Workers

Employer-employee matched data with anonymized firm and worker identifiers plus job-title and year fields.

Bipartite incidence pattern for Workers

Graph Summary

A large, realistic two-way firm-worker benchmark.

The primary benchmark graph is built from unique (id1, id2) pairs. The figure shows a binned sparsity pattern of the bipartite incidence block, with both partitions relabeled to contiguous integer identifiers. Up to sign, this block is the off-diagonal part of the corresponding graph Laplacian: if B is the incidence block, the Laplacian's off-diagonal blocks are -B and its transpose.
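The construction described above can be sketched on a toy input. This is a minimal illustration, not the code used to build the benchmark: the function names build_incidence and laplacian, the dense list-of-lists layout, and the toy identifiers are all assumptions made for readability. A sparse format would be needed at full scale.

```python
def build_incidence(pairs):
    """Return (B, n1, n2) for the unique (id1, id2) pairs.

    B is a dense 0/1 list-of-lists with n1 rows and n2 columns.
    Hypothetical helper: dense storage only suits toy inputs.
    """
    unique_pairs = sorted(set(pairs))
    # Relabel each partition to contiguous integers 0..n-1.
    id1_index = {v: i for i, v in enumerate(sorted({p[0] for p in unique_pairs}))}
    id2_index = {v: j for j, v in enumerate(sorted({p[1] for p in unique_pairs}))}
    n1, n2 = len(id1_index), len(id2_index)
    B = [[0] * n2 for _ in range(n1)]
    for a, b in unique_pairs:
        B[id1_index[a]][id2_index[b]] = 1
    return B, n1, n2


def laplacian(B, n1, n2):
    """Dense Laplacian L = D - A, where A = [[0, B], [B^T, 0]]."""
    n = n1 + n2
    L = [[0] * n for _ in range(n)]
    for i in range(n1):
        for j in range(n2):
            if B[i][j]:
                # Off-diagonal blocks carry -B and its transpose.
                L[i][n1 + j] = -1
                L[n1 + j][i] = -1
                # Diagonal entries accumulate node degrees.
                L[i][i] += 1
                L[n1 + j][n1 + j] += 1
    return L


# Toy rows (hypothetical values); the repeated pair collapses to one edge.
rows = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a1", "b1")]
B, n1, n2 = build_incidence(rows)
L = laplacian(B, n1, n2)
```

Every row of L sums to zero, which is a quick sanity check that the degrees on the diagonal match the edges in the off-diagonal blocks.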

504,315 rows
253,929 unique edges
28,864 id1 levels
218,390 id2 levels
19,994 components
247,254 nodes
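The node count is the sum of the two partition sizes (28,864 + 218,390 = 247,254). Summary figures of this kind could be recomputed from the edge list with a union-find pass, as in the sketch below. The function name count_components and the toy pairs are hypothetical, and the side tagging assumes that id1 and id2 values live in separate namespaces.

```python
def count_components(pairs):
    """Return (nodes, components) of the bipartite graph on unique pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in set(pairs):
        # Tag each side so equal raw values in id1 and id2 stay distinct nodes.
        root_a, root_b = find(("id1", a)), find(("id2", b))
        if root_a != root_b:
            parent[root_a] = root_b

    nodes = len(parent)
    components = sum(1 for x in list(parent) if find(x) == x)
    return nodes, components


# Toy pairs (hypothetical): two connected groups, one duplicated row.
toy = [(1, 1), (1, 2), (2, 2), (3, 3), (1, 1)]
nodes, components = count_components(toy)
```

On the toy input the three id1 values and three id2 values give six nodes, split into two components.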

Variables

Columns in the clean CSV:

id1 id2 id3 id4 x1 x2 y

The v1 graph uses id1 and id2. Additional identifier-like columns available for richer specifications: id3, id4.
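As a rough sketch of how the v1 edge set could be extracted from the clean CSV, the snippet below reads rows with the standard csv module and keeps the unique (id1, id2) pairs. It assumes a comma-delimited file with a header row matching the column names above. The sample rows are made up for illustration, and unique_edges is a hypothetical helper name.

```python
import csv
import io

# Stand-in for the clean CSV; the values are invented for this example.
sample = io.StringIO(
    "id1,id2,id3,id4,x1,x2,y\n"
    "10,501,7,2001,0.3,1.2,4.5\n"
    "10,501,7,2002,0.1,0.9,4.1\n"
    "11,502,8,2001,0.7,0.4,3.9\n"
)


def unique_edges(handle, left="id1", right="id2"):
    """Collect the distinct (left, right) identifier pairs from a CSV handle."""
    reader = csv.DictReader(handle)
    return {(row[left], row[right]) for row in reader}


edges = unique_edges(sample)
```

Passing different column names for left and right (for example id3 or id4) would give the edge set for one of the richer specifications mentioned above.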

Source Notes

Anonymized employer-employee matched panel collected by the author.

Regressors and outcome are synthetic benchmark variables.

Historical Benchmark

2017 SEC benchmark timings for this dataset, in seconds:

Method        Citation            Seconds
MAP-Aitken    (Guimaraes 2012)    484.1
MAP-SD        (Gaure 2013)        146.0
MAP-CG-Sym    (Correia 2016)      169.6
MAP+Prune     (Correia 2016)      646.1
LSMR          (Gomez 2016)        356.3