Authors: Drew Prinster, Samuel Stanton, Anqi Liu, Suchi Saria

The growing adoption of machine learning (ML) has created a pressing need for practitioners to quantify and control the risks these systems pose. The challenge is especially acute when an ML system has the autonomy to collect its own data, as in black-box optimization and active learning, where the system's own actions induce sequential, feedback-loop shifts in the data distribution.

Conformal prediction has emerged as a promising approach to uncertainty and risk quantification, but existing variants either fail to accommodate sequences of data-dependent shifts or do not fully exploit the fact that agent-induced shift is under the agent's own control. This work demonstrates that conformal prediction can, in principle, be extended to any joint data distribution, although computing the resulting prediction sets is intractable in the most general case.
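As background for what is being extended, standard split conformal prediction under exchangeability can be sketched in a few lines. The toy regression setup, absolute-residual score, and least-squares model below are illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise.
x = rng.uniform(-1, 1, size=400)
y = 2 * x + rng.normal(scale=0.3, size=400)

# Split into a proper training set and a calibration set.
x_train, y_train = x[:200], y[:200]
x_cal, y_cal = x[200:], y[200:]

# Fit any black-box model; here, least squares through the origin.
slope = np.sum(x_train * y_train) / np.sum(x_train**2)
predict = lambda x_new: slope * x_new

# Nonconformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - predict(x_cal))

# For miscoverage level alpha, take the ceil((n+1)(1-alpha))/n
# empirical quantile of the calibration scores.
alpha = 0.1
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction interval for a new point: prediction +/- q.
x_new = 0.5
interval = (predict(x_new) - q, predict(x_new) + q)
```

Under exchangeability, this interval covers the true label with probability at least 1 - alpha; the feedback-loop shifts discussed above break exactly that exchangeability assumption.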

For practical applications, the work outlines a procedure for deriving specific conformal algorithms for any given data distribution, and uses it to derive tractable algorithms for a series of agent-induced covariate shifts. The proposed algorithms are evaluated empirically on synthetic black-box optimization and active learning tasks, yielding a framework for risk quantification in ML systems that collect their own data.
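To give a sense of what a tractable algorithm for covariate shift involves: weighted conformal methods replace the ordinary empirical quantile with a weighted quantile of calibration scores, where each calibration point carries a likelihood-ratio weight between the shifted (test) and original covariate distributions, and the test point contributes a point mass at infinity. A minimal sketch, with the function name and inputs as illustrative assumptions rather than the paper's exact algorithm:

```python
import numpy as np

def weighted_conformal_quantile(scores, w_cal, w_test, alpha):
    """Weighted (1 - alpha) quantile of calibration scores, with the
    test point's weight placed on a point mass at +infinity."""
    order = np.argsort(scores)
    s = np.asarray(scores, dtype=float)[order]
    w = np.asarray(w_cal, dtype=float)[order]
    # Normalize calibration weights jointly with the test weight.
    cum = np.cumsum(w) / (w.sum() + w_test)
    # Smallest score whose cumulative weight reaches 1 - alpha;
    # if none does, the quantile is +infinity.
    idx = np.searchsorted(cum, 1 - alpha)
    return s[idx] if idx < len(s) else np.inf

# With equal weights, this recovers the standard conformal quantile.
scores = [0.1, 0.5, 0.9, 1.3]
q = weighted_conformal_quantile(scores, [1, 1, 1, 1], 1.0, alpha=0.2)
# q == 1.3, matching the ceil((n+1)(1-alpha))/n unweighted quantile.
```

When the shift is agent-induced, the likelihood-ratio weights are computable because the agent knows its own data-collection policy, which is what makes this setting tractable.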