Worst Case Attack

A membership inference attack (MIA) estimates the likelihood that specific data points were included in the data used to train an ML model. If that likelihood is high, the model is at risk of a data breach; if low, a breach is unlikely.

The worst-case attack is a white-box MIA scenario and does not require a shadow model. It is described by Rezaei as the easiest possible scenario for the attacker. To perform the MIA, a new binary-labelled dataset (member / non-member of the target's training data) is created containing the probabilities predicted by the target model for its training and test data. A Random Forest classifier is then fitted on half of this new dataset, as sketched below.
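
The following minimal sketch illustrates this procedure independently of the sacroml API (the helper name, the 50/50 evaluation split, and the AUC scoring are illustrative assumptions, not the package's implementation):

    # Minimal sketch of the worst-case MIA idea (illustrative, not the
    # sacroml implementation): label the target model's predicted
    # probabilities as member (1) / non-member (0) and fit an attack
    # classifier on half of the labelled set.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def worst_case_mia(proba_train, proba_test, seed=0):
        """proba_train / proba_test are the target model's predict_proba
        outputs for its training (member) and test (non-member) examples."""
        X = np.vstack([proba_train, proba_test])
        y = np.hstack([np.ones(len(proba_train)), np.zeros(len(proba_test))])
        # Fit the attack model on half of the labelled set, evaluate on the rest.
        X_fit, X_eval, y_fit, y_eval = train_test_split(
            X, y, test_size=0.5, stratify=y, random_state=seed)
        attack_model = RandomForestClassifier(random_state=seed)
        attack_model.fit(X_fit, y_fit)
        # An AUC near 0.5 means members cannot be separated from non-members.
        return roc_auc_score(y_eval, attack_model.predict_proba(X_eval)[:, 1])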

This scenario is not intended to simulate a realistic attack (if the attacker already has access to the data, they have no need to attack) but rather to assess whether the model has potential vulnerabilities that an attacker could leverage. It gives a good estimate of the maximum capability of an attacker to succeed.

Breaches of sensitive and personal data must be avoided. In some cases this approach may overestimate the risk of data leakage, but it guarantees (as much as possible) that any ML model allowed out of a Trusted Research Environment (TRE) or the host environment is safe. At the same time, it is easy to implement (see the sketch above).

This attack is, however, evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. For instance, imagine the dataset includes two people with a rare disease who are the only cases in a given region. These two individuals could be at higher risk of being identified even if the overall estimated risk of disclosure is low. For a more comprehensive check, a likelihood-scenario attack (see attacks/likelihood) is recommended.

Run a worst case attack based upon predictive probabilities.

class sacroml.attacks.worst_case_attack.WorstCaseAttack(output_dir: str = 'outputs', write_report: bool = True, n_reps: int = 10, reproduce_split: int | Iterable[int] | None = 5, p_thresh: float = 0.05, n_dummy_reps: int = 1, train_beta: int = 1, test_beta: int = 1, test_prop: float = 0.2, include_model_correct_feature: bool = False, sort_probs: bool = True, attack_model: str = 'sklearn.ensemble.RandomForestClassifier', attack_model_params: dict | None = None)

Worst case attack.
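
A hedged end-to-end usage sketch (the Target keyword arguments below are assumptions; consult the sacroml Target documentation for the exact interface):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    from sacroml.attacks.target import Target
    from sacroml.attacks.worst_case_attack import WorstCaseAttack

    # Train a target model on synthetic data.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Wrap the model and data in a Target. NOTE: these keyword names are an
    # assumption; check the Target documentation for the current interface.
    target = Target(model=model,
                    X_train=X_train, y_train=y_train,
                    X_test=X_test, y_test=y_test)

    attack = WorstCaseAttack(n_reps=10, output_dir="outputs")
    if WorstCaseAttack.attackable(target):
        results = attack.attack(target)  # dict of metrics; report in output_dir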

Methods

attack(target)
    Check whether an attack can be performed and run the attack.

attack_from_preds(proba_train, proba_test[, ...])
    Run attack based upon the predictions in proba_train and proba_test.

attackable(target)
    Return whether a target can be assessed with WorstCaseAttack.

generate_arrays(n_rows_in, n_rows_out[, ...])
    Generate train and test prediction arrays, used when computing baseline.

get_params()
    Get parameters for this attack.

run_attack_reps(proba_train, proba_test[, ...])
    Run actual attack reps from train and test predictions.

classmethod attackable(target: Target) -> bool

Return whether a target can be assessed with WorstCaseAttack.

__init__(output_dir: str = 'outputs', write_report: bool = True, n_reps: int = 10, reproduce_split: int | Iterable[int] | None = 5, p_thresh: float = 0.05, n_dummy_reps: int = 1, train_beta: int = 1, test_beta: int = 1, test_prop: float = 0.2, include_model_correct_feature: bool = False, sort_probs: bool = True, attack_model: str = 'sklearn.ensemble.RandomForestClassifier', attack_model_params: dict | None = None) -> None

Construct an object to execute a worst case attack.

Parameters:
output_dir : str
    Name of the directory where outputs are stored.

write_report : bool
    Whether to generate a JSON and PDF report.

n_reps : int
    Number of attacks to run; in each repetition an attack model is trained on a different subset of the data.

reproduce_split : int or Iterable[int] or None
    Controls the reproducibility of the data split. It can be an integer or a list of integers of length n_reps. Default: 5.

p_thresh : float
    Threshold used to determine statistical significance, for instance of auc_p_value and pdif_vals.

n_dummy_reps : int
    Number of baseline (dummy) experiments to run.

train_beta : int
    Value of b for the beta distribution used to sample the in-sample (training) probabilities.

test_beta : int
    Value of b for the beta distribution used to sample the out-of-sample (test) probabilities.

test_prop : float
    Proportion of data to use as a test set for the attack model.

include_model_correct_feature : bool
    Whether to include an additional feature recording whether the target model made a correct prediction for each example.

sort_probs : bool
    Whether to sort the combined predictions (from training and test) so that the highest probabilities are in the first column.

attack_model : str
    Class name of the attack model.

attack_model_params : dict or None
    Dictionary of hyperparameters for the attack_model, such as min_samples_split, min_samples_leaf, etc.
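
For example, a constructor call overriding some of these defaults might look like this (the hyperparameter values are illustrative only):

    from sacroml.attacks.worst_case_attack import WorstCaseAttack

    attack = WorstCaseAttack(
        n_reps=20,
        p_thresh=0.01,
        include_model_correct_feature=True,
        attack_model="sklearn.ensemble.RandomForestClassifier",
        attack_model_params={"min_samples_split": 20, "min_samples_leaf": 10},
    )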

attack(target: Target) -> dict

Check whether an attack can be performed and run the attack.

attack_from_preds(proba_train: ndarray, proba_test: ndarray, train_correct: ndarray | None = None, test_correct: ndarray | None = None) -> None

Run attack based upon the predictions in proba_train and proba_test.

Parameters:
proba_train : np.ndarray
    Array of train predictions. One row per example, one column per class.

proba_test : np.ndarray
    Array of test predictions. One row per example, one column per class.
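
A brief sketch of calling this method directly; here model, X_train, and X_test are placeholders carried over from the usage example above:

    # Probabilities predicted by the target model for members / non-members.
    proba_train = model.predict_proba(X_train)  # one row per example
    proba_test = model.predict_proba(X_test)

    attack = WorstCaseAttack(n_reps=10)
    attack.attack_from_preds(proba_train, proba_test)  # runs the attack reps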

generate_arrays(n_rows_in: int, n_rows_out: int, train_beta: float = 2, test_beta: float = 2) -> tuple[ndarray, ndarray]

Generate train and test prediction arrays, used when computing baseline.

Parameters:
n_rows_in : int
    Number of rows of in-sample (training) probabilities.

n_rows_out : int
    Number of rows of out-of-sample (testing) probabilities.

train_beta : float
    Beta value for generating train probabilities.

test_beta : float
    Beta value for generating test probabilities.

Returns:
proba_train : np.ndarray
    Array of train predictions (n_rows x 2 columns).

proba_test : np.ndarray
    Array of test predictions (n_rows x 2 columns).
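
A sketch of generating the baseline arrays; the shapes follow the documented return values:

    attack = WorstCaseAttack()
    proba_train, proba_test = attack.generate_arrays(
        n_rows_in=100, n_rows_out=50, train_beta=2, test_beta=2)
    print(proba_train.shape, proba_test.shape)  # (100, 2) (50, 2)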

get_params() -> dict

Get parameters for this attack.

Returns:
params : dict
    Parameter names mapped to their values.

run_attack_reps(proba_train: ndarray, proba_test: ndarray, train_correct: ndarray = None, test_correct: ndarray = None) -> dict

Run actual attack reps from train and test predictions.

Parameters:
proba_train : np.ndarray
    Predictions from the model on training (in-sample) data.

proba_test : np.ndarray
    Predictions from the model on testing (out-of-sample) data.

Returns:
dict
    Dictionary of mia_metrics (a list of metrics, one per repetition).
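
A sketch of calling this method and reading the result; the dictionary key and metric names below are assumptions based on the description above, and proba_train / proba_test are placeholders as in the earlier examples:

    attack = WorstCaseAttack(n_reps=10)
    output = attack.run_attack_reps(proba_train, proba_test)
    for metrics in output["mia_metrics"]:   # assumed key: one entry per rep
        print(metrics.get("AUC"))           # metric name is an assumption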