Grace Guo, Lifu Deng, et al.
FAccT 2024
The US Decennial Census provides valuable data for researchers across a wild range of disciplines. In 2020, the Census Bureau changed its disclosure avoidance system to use differential privacy. However, research on the impacts of this change on both respondent privacy and data accuracy are limited because privacy-preserving mechanisms operate on the underlying data ("microdata") as opposed to the published aggregate statistics: Because the microdata are kept secret, outside researchers cannot directly run the algorithm.
In this work, we provide tools to sample synthetic microdata from a distribution given published Census statistics. We formulate synthetic data generation in this context as a combinatorial optimization problem and develop new algorithms that improve upon existing techniques. While the problem we study is provably hard, we show empirically that our methods work well in practice, and we offer theoretical arguments to explain our performance.
Grace Guo, Lifu Deng, et al.
FAccT 2024
Natalia Martinez Gil, Kanthi Sarpatwar, et al.
NeurIPS 2023
Binchi Zhang, Yushun Dong, et al.
ICLR 2024
Kristjan Greenewald, Yuancheng Yu, et al.
NeurIPS 2024