JSON Output for MIA attacks
We standardised the JSON output for both the worst-case and LIRA attacks where possible. A generic JSON output structure is presented below:
General Structure
The key components of the JSON output across attacks are:
log_id: Log identifier - a random unique id for each entry
log_time: the time when the log was created
metadata: standardised variables related to a specific attack type
attack_experiment_logger: Attack experiment logger - maintains instances of metrics computed across iterations
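The overall shape of one log entry can be sketched as a small JSON document. All values below are invented for illustration; only the key names follow the structure described above.

```python
import json

# Hypothetical top-level log entry; key names follow the general structure,
# the values are made up for illustration only.
entry = {
    "log_id": "024d8e4a",                 # random unique id for this entry
    "log_time": "01/01/2024 12:00:00",    # when the log was created
    "metadata": {},                       # attack-specific parameters and metrics
    "attack_experiment_logger": {},       # per-iteration metric instances
}

# The report round-trips through JSON like any other document.
report = json.loads(json.dumps(entry))
print(sorted(report))
```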
Worst-Case Attack
A worst-case attack will have the following components in the metadata component of the JSON output.
metadata:
experiment_details: this will have attack type parameters
n_reps: number of attacks to run -- in each iteration an attack model is trained on a different subset of the data
p_thresh: threshold used to determine the statistical significance of reported metrics such as auc_p_value and pdif_vals
n_dummy_reps: number of baseline (dummy) experiments to do
train_beta: value of b for beta distribution used to sample the in-sample (training) probabilities
test_beta: value of b for beta distribution used to sample the out-of-sample (test) probabilities
test_prop: proportion of data to use as a test set for the attack model
n_rows_in: number of rows for in-sample (training data)
n_rows_out: number of rows for out-of-sample (test data)
training_preds_filename: name of the file to keep predictions of the training data (in-sample)
test_preds_filename: name of the file to keep predictions of the test data (out-of-sample)
report_name: name of the JSON report
include_model_correct_feature: whether to include an additional feature recording whether or not the target model made a correct prediction for each example
sort_probs: if true, the combined predictions (from training and test data) are sorted so that the highest probabilities appear in the first column
mia_attack_model: name of the attack model, such as RandomForestClassifier
mia_attack_model_hyp: list of hyperparameters for the mia_attack_model, such as min_samples_split, min_samples_leaf and max_depth
attack_metric_success_name: name of the metric used to determine whether the attack was successful
attack_metric_success_thresh: threshold against which the chosen metric is compared to decide whether the attack was successful
attack_metric_success_comp_type: threshold comparison operator (i.e., gte: greater than or equal to, gt: greater than, lte: less than or equal to, lt: less than, eq: equal to and not_eq: not equal to)
attack_metric_success_count_thresh: the number of times the success criterion must be met before repetitions are stopped early (when fail-fast is enabled)
attack_fail_fast: if true, stops the repetitions early once the chosen metric (attack_metric_success_name), compared using attack_metric_success_comp_type against attack_metric_success_thresh, has satisfied the success criterion attack_metric_success_count_thresh times
attack: name of the attack type ('WorstCase attack')
global_metric: the following global metrics are computed across the attack repetitions
null_auc_3sd_range: a three standard deviation range around the AUC expected under the null hypothesis (no attack advantage)
n_sig_auc_p_vals: number of significant AUC p values given the p_thresh value
n_sig_auc_p_vals_corrected: number of significant AUC p values given the p_thresh value, after applying multiple-testing corrections
n_sig_pdif_vals: number of significant pdif values given the p_thresh value
n_sig_pdif_vals_corrected: number of significant pdif values given the p_thresh value, after applying multiple-testing corrections
baseline_global_metric: the same global metrics, computed across all baseline (dummy) experiments
null_auc_3sd_range: a three standard deviation range around the AUC expected under the null hypothesis (no attack advantage)
n_sig_auc_p_vals: number of significant AUC p values given the p_thresh value
n_sig_auc_p_vals_corrected: number of significant AUC p values given the p_thresh value, after applying multiple-testing corrections
n_sig_pdif_vals: number of significant pdif values given the p_thresh value
n_sig_pdif_vals_corrected: number of significant pdif values given the p_thresh value, after applying multiple-testing corrections
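Putting the pieces above together, a worst-case metadata block might look like the following sketch. Only a subset of the keys is shown, and every value (including the hyperparameter and metric values) is invented for illustration.

```python
# Hypothetical worst-case metadata block; key names follow the description
# above, values are illustrative only and several keys are omitted for brevity.
metadata = {
    "experiment_details": {
        "n_reps": 10,
        "p_thresh": 0.05,
        "n_dummy_reps": 1,
        "test_prop": 0.3,
        "mia_attack_model": "RandomForestClassifier",
        "mia_attack_model_hyp": {"min_samples_split": 20, "min_samples_leaf": 10},
        "attack_metric_success_name": "P_HIGHER_AUC",   # assumed metric name
        "attack_metric_success_thresh": 0.05,
        "attack_metric_success_comp_type": "lte",
        "attack_metric_success_count_thresh": 2,
        "attack_fail_fast": True,
    },
    "attack": "WorstCase attack",
    "global_metric": {
        "null_auc_3sd_range": "0.38 -> 0.62",
        "n_sig_auc_p_vals": 1,
        "n_sig_auc_p_vals_corrected": 0,
        "n_sig_pdif_vals": 1,
        "n_sig_pdif_vals_corrected": 0,
    },
}
print(metadata["global_metric"]["n_sig_auc_p_vals"])
```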
A worst-case attack will have an experiment logger and a baseline (dummy) experiments logger; the latter is unique to the worst-case attack.
attack_experiment_logger:
attack_instance_logger: stores the metrics computed in each attack iteration (one instance per repetition, n_reps in total)
instance_0:
TPR: value of true positive rate
FPR: value of false positive rate
...
...
n_pos_test_examples: number of positive test examples
n_neg_test_examples: number of negative test examples
instance_1:
... all metrics computed as for instance_0
instance_n:
... n will be n_reps - 1, i.e. one instance per attack repetition
attack_metric_failfast_summary:
success_count: number of attacks that were successful given the success criteria specified in the metadata
fail_count: number of attacks that were not successful
dummy_attack_experiments_logger:
dummy_attack_metrics_experiment_0:
attack_instance_logger: stores the metrics computed in each attack iteration (one instance per repetition, n_reps in total)
instance_0:
TPR: value of true positive rate
FPR: value of false positive rate
...
...
n_pos_test_examples: number of positive test examples
n_neg_test_examples: number of negative test examples
instance_1:
... all metrics computed as for instance_0
instance_n: n will be n_reps - 1, i.e. one instance per attack repetition
...
attack_metric_failfast_summary:
success_count: number of attacks that were successful given the success criteria specified in the metadata
fail_count: number of attacks that were not successful
dummy_attack_metrics_experiment_1:
...
...
dummy_attack_metrics_experiment_n: n will be n_dummy_reps - 1, i.e. one logger per baseline (dummy) experiment
...
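As a sketch, the loggers above can be navigated with ordinary dictionary access. The metric values and counts below are made up; real reports contain many more metrics per instance.

```python
# Hypothetical worst-case experiment logger with invented metric values.
attack_experiment_logger = {
    "attack_instance_logger": {
        "instance_0": {"TPR": 0.62, "FPR": 0.35, "AUC": 0.64},
        "instance_1": {"TPR": 0.58, "FPR": 0.33, "AUC": 0.61},
    },
    "attack_metric_failfast_summary": {"success_count": 1, "fail_count": 1},
}

# Collect one metric across all attack repetitions.
aucs = [
    instance["AUC"]
    for instance in attack_experiment_logger["attack_instance_logger"].values()
]
print(aucs)
```

The same access pattern applies to each dummy_attack_metrics_experiment_n entry inside dummy_attack_experiments_logger.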
An example JSON output for the worst-case attack is accessible from link
LIRA Attack
A LIRA attack will have the following components in the metadata component of the JSON output.
metadata:
experiment_details: this will have attack type parameters
n_shadow_models: number of shadow models to be trained
p_thresh: threshold used to determine the statistical significance of reported metrics such as auc_p_value and pdif_vals
report_name: name of the JSON report
training_data_filename: name of the data file for the training data (in-sample)
test_data_filename: name of the file for the test data (out-of-sample)
training_preds_filename: name of the file to keep predictions of the training data (in-sample)
test_preds_filename: name of the file to keep predictions of the test data (out-of-sample)
target_model: name of the target model, such as RandomForestClassifier
target_model_hyp: list of hyperparameters for the target_model, such as min_samples_split and min_samples_leaf
n_shadow_rows_confidences_min: minimum number of confidences calculated for each row in the test data (out-of-sample)
attack_fail_fast: if true, stops training shadow models early once the given minimum number of confidences has been computed for each row in the test data
attack: name of the attack type
global_metric: the following global metrics are computed for the attack
null_auc_3sd_range: a three standard deviation range around the AUC expected under the null hypothesis (no attack advantage)
AUC_sig: whether the AUC is significant at the given p value
PDIF_sig: whether the PDIF is significant at the given p value
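A LIRA metadata block might then look like the following sketch. As before, only the key names follow the description above; every value is invented for illustration.

```python
# Hypothetical LIRA metadata block; key names follow the description above,
# all values are invented for illustration.
metadata = {
    "experiment_details": {
        "n_shadow_models": 100,
        "p_thresh": 0.05,
        "report_name": "lira_example_report",
        "target_model": "RandomForestClassifier",
        "target_model_hyp": {"min_samples_split": 20, "min_samples_leaf": 10},
        "n_shadow_rows_confidences_min": 10,
        "attack_fail_fast": True,
    },
    "global_metric": {
        "null_auc_3sd_range": "0.40 -> 0.60",
        "AUC_sig": "Significant at p=0.05",   # assumed string format
        "PDIF_sig": "Significant at p=0.05",  # assumed string format
    },
}
print(metadata["experiment_details"]["n_shadow_models"])
```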
A LIRA attack will have an experiment logger with only one instance.
attack_experiment_logger:
attack_instance_logger: stores the computed metrics; a LIRA attack produces a single instance
instance_0: for a LIRA attack, this will be the only instance
TPR: value of true positive rate
FPR: value of false positive rate
...
...
n_pos_test_examples: number of positive test examples
n_neg_test_examples: number of negative test examples
n_shadow_models_trained: the number of shadow models actually trained. When attack_fail_fast is true and the minimum number of confidences has been computed for each row in the test data, fewer shadow models than requested may be trained while still satisfying the given criteria
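Because there is only one instance, pulling the LIRA metrics out of the report is a single lookup. The values below are again hypothetical.

```python
# Hypothetical LIRA experiment logger with a single instance and made-up values.
attack_experiment_logger = {
    "attack_instance_logger": {
        "instance_0": {
            "TPR": 0.55,
            "FPR": 0.30,
            "n_pos_test_examples": 500,
            "n_neg_test_examples": 500,
            # May be less than n_shadow_models when attack_fail_fast is true.
            "n_shadow_models_trained": 42,
        }
    }
}

instance = attack_experiment_logger["attack_instance_logger"]["instance_0"]
print(instance["n_shadow_models_trained"])
```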
An example JSON output for the LIRA attack is accessible from link
Running MIA Attacks from Config File
For both the worst-case and LIRA attacks, the examples worst_case_attack_example and lira_attack_example in AI-SDC demonstrate most of the possible uses of configuration files.
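As an illustration only, a configuration file is a plain JSON document whose keys mirror the experiment_details parameters described above; the exact set of supported keys and the command-line entry point are shown in the linked examples, and the file name and key values below are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical worst-case attack configuration; the keys mirror the
# experiment_details parameters described above, values are illustrative.
config = {
    "n_reps": 10,
    "p_thresh": 0.05,
    "n_dummy_reps": 1,
    "test_prop": 0.3,
    "report_name": "worst_case_example_report",
}

# Write the configuration to disk so it can be passed to the attack runner.
path = os.path.join(tempfile.gettempdir(), "wc_attack_config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=4)

# Reading it back confirms the file is valid JSON.
with open(path) as f:
    loaded = json.load(f)
print(loaded["n_reps"])
```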