Worst Case Attack

Run a worst case attack based on the predicted probabilities of the target model.

class sacroml.attacks.worst_case_attack.WorstCaseAttack(output_dir: str = 'outputs', write_report: bool = True, n_reps: int = 10, reproduce_split: int | Iterable[int] | None = 5, p_thresh: float = 0.05, n_dummy_reps: int = 1, train_beta: int = 1, test_beta: int = 1, test_prop: float = 0.2, include_model_correct_feature: bool = False, sort_probs: bool = True, attack_model: str = 'sklearn.ensemble.RandomForestClassifier', attack_model_params: dict | None = None)

Worst case attack.

Methods

attack(target)

Run worst case attack.

attack_from_preds(proba_train, proba_test[, ...])

Run attack based upon the predictions in proba_train and proba_test.

generate_arrays(n_rows_in, n_rows_out[, ...])

Generate train and test prediction arrays, used when computing the baseline.

get_params()

Get parameters for this attack.

run_attack_reps(proba_train, proba_test[, ...])

Run the attack repetitions on the train and test predictions.

__init__(output_dir: str = 'outputs', write_report: bool = True, n_reps: int = 10, reproduce_split: int | Iterable[int] | None = 5, p_thresh: float = 0.05, n_dummy_reps: int = 1, train_beta: int = 1, test_beta: int = 1, test_prop: float = 0.2, include_model_correct_feature: bool = False, sort_probs: bool = True, attack_model: str = 'sklearn.ensemble.RandomForestClassifier', attack_model_params: dict | None = None) -> None

Construct an object to execute a worst case attack.

Parameters:
output_dir : str

Name of the directory where outputs are stored.

write_report : bool

Whether to generate a JSON and PDF report.

n_reps : int

Number of attacks to run – in each iteration an attack model is trained on a different subset of the data.

reproduce_split : int or Iterable[int] or None

Controls the reproducibility of the data split. It can be an integer or a list of integers of length n_reps. Default: 5.

p_thresh : float

Significance threshold, applied for instance to auc_p_value and pdif_vals.

n_dummy_reps : int

Number of baseline (dummy) experiments to run.

train_beta : int

Value of b for the beta distribution used to sample the in-sample (training) probabilities.

test_beta : int

Value of b for the beta distribution used to sample the out-of-sample (test) probabilities.

test_prop : float

Proportion of data to use as a test set for the attack model.

include_model_correct_feature : bool

Whether to include an additional feature recording, for each example, whether the target model's prediction was correct.

sort_probs : bool

Whether to sort the combined predictions (from training and test) so that the highest probabilities are in the first column.

attack_model : str

Fully qualified class name of the attack model (for example, the default 'sklearn.ensemble.RandomForestClassifier').

attack_model_params : dict or None

Dictionary of hyperparameters for attack_model, such as min_samples_split, min_samples_leaf, etc.
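The attack_model and attack_model_params options suggest that the attack model is built from a dotted class path plus keyword arguments, and reproduce_split supplies one split seed per repetition. The following is a minimal sketch under those assumptions, not sacroml's implementation: load_class, build_attack_model and split_seeds are hypothetical helpers, and the way a single integer is expanded into n_reps seeds is an invented scheme chosen only for illustration.

```python
from __future__ import annotations

import importlib
from collections.abc import Iterable
from typing import Any


def load_class(dotted_path: str) -> type:
    """Resolve a dotted path such as 'sklearn.ensemble.RandomForestClassifier'."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'package.module.Class', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_attack_model(dotted_path: str, params: dict[str, Any] | None = None) -> Any:
    """Instantiate the attack model; None means 'use the class defaults'."""
    return load_class(dotted_path)(**(params or {}))


def split_seeds(reproduce_split: int | Iterable[int] | None, n_reps: int) -> list[int | None]:
    """Return one split seed per repetition (hypothetical expansion scheme)."""
    if reproduce_split is None:
        # No seed: every split is freshly random and therefore not reproducible.
        return [None] * n_reps
    if isinstance(reproduce_split, int):
        # ASSUMPTION: derive n_reps distinct seeds from the single integer.
        return [reproduce_split + i for i in range(n_reps)]
    seeds = list(reproduce_split)
    if len(seeds) != n_reps:
        raise ValueError(f"need {n_reps} seeds, got {len(seeds)}")
    return seeds


# Example (requires scikit-learn):
# model = build_attack_model(
#     "sklearn.ensemble.RandomForestClassifier",
#     {"min_samples_split": 20, "min_samples_leaf": 10},
# )
```

Resolving the class lazily from a string keeps the constructor serialisable (the path and parameter dict can be written straight into a JSON report) and avoids importing the attack model's library until it is needed.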

attack(target: Target) -> dict

Run worst case attack.

Parameters:
target : attacks.target.Target

The target, as a Target object.

Returns:
dict

Attack report.

attack_from_preds(proba_train: ndarray, proba_test: ndarray, train_correct: ndarray | None = None, test_correct: ndarray | None = None) -> None

Run attack based upon the predictions in proba_train and proba_test.

Parameters:
proba_train : np.ndarray

Array of train predictions. One row per example, one column per class.

proba_test : np.ndarray

Array of test predictions. One row per example, one column per class.
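attack_from_preds receives one row of class probabilities per example. A worst case attack of this kind typically stacks the in-sample and out-of-sample rows and labels them as members and non-members, so that an attack model can be trained to tell them apart. The sketch below illustrates that step under stated assumptions: build_attack_dataset is a hypothetical helper, labelling members as 1 is an assumed convention, and the sorting and correctness-column logic mirror the documented sort_probs and include_model_correct_feature options rather than sacroml's actual code.

```python
from __future__ import annotations

import numpy as np


def build_attack_dataset(
    proba_train: np.ndarray,
    proba_test: np.ndarray,
    train_correct: np.ndarray | None = None,
    test_correct: np.ndarray | None = None,
    sort_probs: bool = True,
) -> tuple[np.ndarray, np.ndarray]:
    """Stack target-model predictions into features X and membership labels y."""
    X = np.vstack((proba_train, proba_test))
    if sort_probs:
        # Sort each row in descending order so column 0 holds the top probability.
        X = -np.sort(-X, axis=1)
    if train_correct is not None and test_correct is not None:
        # Optional extra column: did the target model predict this example correctly?
        correct = np.concatenate((train_correct, test_correct)).reshape(-1, 1)
        X = np.hstack((X, correct))
    # ASSUMPTION: label 1 = member (training data), 0 = non-member (test data).
    y = np.concatenate((np.ones(len(proba_train)), np.zeros(len(proba_test))))
    return X, y
```

The resulting X and y would then be split according to test_prop before the attack model is fitted. Sorting each row makes the features independent of which class was predicted, so the attack model sees only how confident the target model was.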

generate_arrays(n_rows_in: int, n_rows_out: int, train_beta: float = 2, test_beta: float = 2) -> tuple[ndarray, ndarray]

Generate train and test prediction arrays, used when computing the baseline.

Parameters:
n_rows_in : int

Number of rows of in-sample (training) probabilities.

n_rows_out : int

Number of rows of out-of-sample (testing) probabilities.

train_beta : float

Beta value for generating train probabilities.

test_beta : float

Beta value for generating test probabilities.

Returns:
proba_train : np.ndarray

Array of train predictions (n_rows_in rows x 2 columns).

proba_test : np.ndarray

Array of test predictions (n_rows_out rows x 2 columns).
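The docstring states only that the baseline arrays are two-column and generated from beta values. A minimal sketch, assuming that the first column is sampled from Beta(1, b) and the second column is its complement; both the choice a = 1 and the extra seed argument are assumptions for illustration, not documented behavior.

```python
from __future__ import annotations

import numpy as np


def generate_arrays(
    n_rows_in: int,
    n_rows_out: int,
    train_beta: float = 2,
    test_beta: float = 2,
    seed: int | None = None,
) -> tuple[np.ndarray, np.ndarray]:
    """Sample two-class baseline prediction arrays from beta distributions."""
    rng = np.random.default_rng(seed)  # hypothetical seed argument for repeatability
    # ASSUMPTION: column 0 ~ Beta(1, b) and column 1 is its complement,
    # so every row is a valid probability vector that sums to 1.
    train_col = rng.beta(1, train_beta, size=n_rows_in)
    test_col = rng.beta(1, test_beta, size=n_rows_out)
    proba_train = np.column_stack((train_col, 1 - train_col))
    proba_test = np.column_stack((test_col, 1 - test_col))
    return proba_train, proba_test
```

With equal train_beta and test_beta, the in-sample and out-of-sample rows come from the same distribution, so an attack model trained on them should do no better than chance; this is presumably what makes such arrays useful as a dummy baseline.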

get_params() -> dict

Get parameters for this attack.

Returns:
params : dict

Parameter names mapped to their values.
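A common way to implement a get_params method, used by scikit-learn's BaseEstimator, is to read the constructor's signature with inspect and look up each argument as an attribute of the same name. The sketch below assumes that convention; ParamsMixin and ToyAttack are hypothetical names, and whether WorstCaseAttack works exactly this way is not stated here.

```python
import inspect


class ParamsMixin:
    """Hypothetical mixin: report constructor arguments as a dict."""

    def get_params(self) -> dict:
        # ASSUMPTION: every __init__ argument is stored as an attribute of the same name.
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        return {name: getattr(self, name) for name in names}


class ToyAttack(ParamsMixin):
    """Stand-in for an attack class with two configurable parameters."""

    def __init__(self, n_reps: int = 10, p_thresh: float = 0.05) -> None:
        self.n_reps = n_reps
        self.p_thresh = p_thresh


print(ToyAttack(n_reps=3).get_params())  # {'n_reps': 3, 'p_thresh': 0.05}
```

Deriving the names from the signature means a new constructor argument shows up in the returned dict (and hence in any report built from it) without further changes.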

run_attack_reps(proba_train: ndarray, proba_test: ndarray, train_correct: ndarray | None = None, test_correct: ndarray | None = None) -> dict

Run the attack repetitions on the train and test predictions.

Parameters:
proba_train : np.ndarray

Predictions from the model on training (in-sample) data.

proba_test : np.ndarray

Predictions from the model on testing (out-of-sample) data.

Returns:
dict

Dictionary containing mia_metrics (a list of metrics, one entry per repetition).
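The p_thresh parameter is described as the significance threshold for values such as auc_p_value. One standard way to obtain such a p-value is the Mann-Whitney normal approximation: under the null hypothesis that the attack cannot separate members from non-members, the AUC has mean 0.5 and variance (n_pos + n_neg + 1) / (12 n_pos n_neg). The sketch below assumes that test (one-sided, no tie correction); sacroml may compute its p-values differently, and the helper names and the summary keys other than auc_p_value are hypothetical.

```python
from statistics import NormalDist


def auc_from_scores(scores: list[float], labels: list[int]) -> float:
    """AUC as the fraction of (member, non-member) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_p_value(auc: float, n_pos: int, n_neg: int) -> float:
    """One-sided p-value for H0: AUC = 0.5 (Mann-Whitney normal approximation)."""
    # Under H0, AUC = U / (n_pos * n_neg) has mean 0.5 and this variance.
    var = (n_pos + n_neg + 1) / (12 * n_pos * n_neg)
    z = (auc - 0.5) / var**0.5
    return 1 - NormalDist().cdf(z)


def summarize_reps(aucs: list[float], n_pos: int, n_neg: int, p_thresh: float = 0.05) -> dict:
    """Collect per-repetition AUCs and p-values, counting significant repetitions."""
    p_values = [auc_p_value(a, n_pos, n_neg) for a in aucs]
    return {
        "auc": aucs,
        "auc_p_value": p_values,
        # ASSUMPTION: a repetition is significant when its p-value is strictly below p_thresh.
        "n_significant": sum(p < p_thresh for p in p_values),
    }
```

Here n_pos and n_neg would be the numbers of member and non-member rows in the attack model's held-out split (the test_prop fraction), since that is where each repetition's AUC is measured.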