JSON Output for MIA attacks

We standardised the JSON output for both worst_case and LIRA attacks where possible. The generic JSON output structure is described below:

General Structure

Key components of the JSON output across attacks are:

log_id: Log identifier - a random unique id for each entry
log_time: the time when the log was created
metadata: standardised variables related to a specific attack type
attack_experiment_logger: Attack experiment logger - maintains instances of metrics computed across iterations
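
As an illustration of the four top-level components listed above, the following minimal sketch assembles a report skeleton. The helper name make_report_skeleton, the use of a UUID for log_id and the ISO timestamp format for log_time are assumptions made for this example, not a description of how AI-SDC builds its output.

```python
import json
import uuid
from datetime import datetime


def make_report_skeleton(metadata: dict) -> dict:
    """Assemble the four top-level components (hypothetical helper)."""
    return {
        # Assumption: a UUID4 string serves as the random unique id
        "log_id": str(uuid.uuid4()),
        # Assumption: ISO 8601 timestamp; the real format may differ
        "log_time": datetime.now().isoformat(timespec="seconds"),
        "metadata": metadata,
        # Filled in later with per-instance metrics
        "attack_experiment_logger": {},
    }


report = make_report_skeleton({"attack": "WorstCase attack"})
print(json.dumps(report, indent=4))
```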

Worst-Case Attack

A worst case attack will have the following components in the metadata component of the JSON output.


experiment_details: this will have attack type parameters
    n_reps: number of attacks to run -- in each iteration an attack model is trained on a different subset of the data
    p_thresh: p-value threshold used to determine statistical significance, for instance for auc_p_value and pdif_vals
    n_dummy_reps: number of baseline (dummy) experiments to do
    train_beta: value of b for beta distribution used to sample the in-sample (training) probabilities
    test_beta: value of b for beta distribution used to sample the out-of-sample (test) probabilities
    test_prop: proportion of data to use as a test set for the attack model
    n_rows_in: number of rows for in-sample (training data)
    n_rows_out: number of rows for out-of-sample (test data)
    training_preds_filename: name of the file to keep predictions of the training data (in-sample)
    test_preds_filename: name of the file to keep predictions of the test data (out-of-sample)
    report_name: name of the JSON report
    include_model_correct_feature: whether to include an additional feature recording whether or not the target model made a correct prediction for each example
    sort_probs: if true, sorts the combined predictions (from training and test) so that the highest probabilities are in the first column
    mia_attack_model: name of the attack model, such as RandomForestClassifier
    mia_attack_model_hyp: list of hyperparameters for the mia_attack_model, such as min_samples_split, min_samples_leaf, max_depth etc.
    attack_metric_success_name: name of the metric used to decide whether an attack was successful
    attack_metric_success_thresh: threshold on the given metric used to decide whether an attack was successful
    attack_metric_success_comp_type: threshold comparison operator (gte: greater than or equal to, gt: greater than, lte: less than or equal to, lt: less than, eq: equal to, not_eq: not equal to)
    attack_metric_success_count_thresh: number of times the success criterion (metric, threshold and comparison type) must be met before attack_fail_fast stops the repetitions
    attack_fail_fast: if true, stops the repetitions early once the given attack metric (attack_metric_success_name) has satisfied the threshold (attack_metric_success_thresh) under the comparison type (attack_metric_success_comp_type) n times, where n is attack_metric_success_count_thresh
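
The interaction between the success-metric parameters and attack_fail_fast can be sketched as follows. This is a minimal illustration under stated assumptions: the mapping of comparison-type strings to Python operators, the function name run_with_fail_fast and the AUC values are all hypothetical, and the real implementation may differ.

```python
import operator

# Assumed mapping from the comp_type strings listed above to Python operators
COMPARATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "eq": operator.eq,
    "not_eq": operator.ne,
}


def run_with_fail_fast(metric_values, thresh, comp_type, count_thresh,
                       fail_fast=True):
    """Count successes per repetition; stop early if fail_fast is set."""
    compare = COMPARATORS[comp_type]
    success_count = 0
    fail_count = 0
    for value in metric_values:  # one metric value per repetition
        if compare(value, thresh):
            success_count += 1
        else:
            fail_count += 1
        # Stop as soon as enough repetitions met the success criterion
        if fail_fast and success_count >= count_thresh:
            break
    return {"success_count": success_count, "fail_count": fail_count}


# Illustrative AUC values for five repetitions (made up for this example)
aucs = [0.55, 0.81, 0.62, 0.90, 0.95]
summary = run_with_fail_fast(aucs, thresh=0.8, comp_type="gte", count_thresh=2)
print(summary)
```

With these illustrative values the loop stops after the fourth repetition, so the fifth value is never evaluated.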

attack: name of the attack type ('WorstCase attack')

global_metric: the following global metrics are computed for attack repetitions
    null_auc_3sd_range: a three standard deviation range from the mean for the observed p_value
    n_sig_auc_p_vals: number of significant p values given a p_thresh value
    n_sig_auc_p_vals_corrected: number of significant p values given a p_thresh value, after applying multiple-testing correction
    n_sig_pdif_vals: number of significant pdif values given a p_thresh value
    n_sig_pdif_vals_corrected: number of significant pdif values given a p_thresh value, after applying multiple-testing correction

baseline_global_metric: the following global metrics are computed for attack repetitions across all baseline (dummy) experiments
    null_auc_3sd_range: a three standard deviation range from the mean for the observed p_value
    n_sig_auc_p_vals: number of significant p values given a p_thresh value
    n_sig_auc_p_vals_corrected: number of significant p values given a p_thresh value, after applying multiple-testing correction
    n_sig_pdif_vals: number of significant pdif values given a p_thresh value
    n_sig_pdif_vals_corrected: number of significant pdif values given a p_thresh value, after applying multiple-testing correction
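
To make the difference between the plain and corrected counts concrete, here is a minimal sketch of counting significant p values with and without a correction. The text above does not name the correction method, so the Benjamini-Hochberg procedure used here is an assumption, as are the function name count_significant and the example p values.

```python
def count_significant(p_values, p_thresh, correct=False):
    """Count significant p values, optionally with Benjamini-Hochberg.

    Assumption: the correction is Benjamini-Hochberg; the document only
    says that a testing correction is applied.
    """
    if not correct:
        return sum(p <= p_thresh for p in p_values)
    m = len(p_values)
    n_sig = 0
    # Find the largest rank k such that p_(k) <= (k / m) * p_thresh;
    # the k smallest p values are then declared significant.
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= p_thresh * rank / m:
            n_sig = rank
    return n_sig


# Illustrative p values from five repetitions (made up for this example)
p_vals = [0.001, 0.015, 0.035, 0.045, 0.5]
n_sig = count_significant(p_vals, p_thresh=0.05)
n_sig_corrected = count_significant(p_vals, p_thresh=0.05, correct=True)
print(n_sig, n_sig_corrected)
```

The corrected count is never larger than the uncorrected one, which is why both are reported side by side.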

A worst case attack will have an experiment logger and a baseline (dummy) experiments logger; the latter is unique to the worst case attack.


attack_instance_logger: stores metrics computed across all iterations of attacks (i.e. n_reps)
    instance_0: metrics for the first iteration
        TPR: value of true positive rate
        FPR: value of false positive rate
        ... remaining metric values
    ... further instances hold the same metric values as instance_0
    instance_n: n will be n_reps-1 representing iterations of attacks
    success_count: number of successful attacks given the attack success criteria set in metadata
    fail_count: number of unsuccessful attacks

dummy_attack_metrics_experiment_0: first baseline (dummy) experiment
    attack_instance_logger: stores metrics computed across all iterations of attacks (i.e. n_reps)
        instance_0: metrics for the first iteration
            TPR: value of true positive rate
            FPR: value of false positive rate
            ... remaining metric values
        ... further instances hold the same metric values as instance_0
        instance_n: n will be n_reps-1 representing iterations of attacks
        success_count: number of successful attacks given the attack success criteria set in metadata
        fail_count: number of unsuccessful attacks
dummy_attack_metrics_experiment_n: n will be n_dummy_reps-1 representing iterations of baseline (dummy) experiments
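
Reading per-iteration metrics back out of this structure might look like the sketch below. The dictionary is a hypothetical fragment shaped like the description above, with made-up values, and the helper metric_across_instances is an illustrative name rather than part of AI-SDC.

```python
from statistics import mean

# Hypothetical fragment shaped like the structure described above;
# the metric values are made up for illustration.
logger = {
    "attack_instance_logger": {
        "instance_0": {"TPR": 0.9, "FPR": 0.1},
        "instance_1": {"TPR": 0.8, "FPR": 0.2},
        "instance_2": {"TPR": 0.7, "FPR": 0.3},
    }
}


def metric_across_instances(logger, name):
    """Collect one metric from every instance_<i> entry, in order."""
    instances = logger["attack_instance_logger"]
    # Keep only instance_<i> keys and sort by the numeric suffix, so
    # instance_10 comes after instance_9 and other keys are skipped.
    keys = sorted(
        (k for k in instances if k.startswith("instance_")),
        key=lambda k: int(k.split("_")[1]),
    )
    return [instances[k][name] for k in keys]


tprs = metric_across_instances(logger, "TPR")
print(tprs, mean(tprs))
```

The same helper could be applied to each dummy_attack_metrics_experiment_<i> entry to compare baseline runs against the real attack.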

Example JSON output for worst case attack is accessible from link

LIRA Attack

A LIRA attack will have the following components in the metadata component of the JSON output.


experiment_details: this will have attack type parameters
    n_shadow_models: number of shadow models to be trained
    p_thresh: p-value threshold used to determine statistical significance, for instance for auc_p_value and pdif_vals
    report_name: name of the JSON report
    training_data_filename: name of the data file for the training data (in-sample)
    test_data_filename: name of the file for the test data (out-of-sample)
    training_preds_filename: name of the file to keep predictions of the training data (in-sample)
    test_preds_filename: name of the file to keep predictions of the test data (out-of-sample)
    target_model: name of the target model, such as RandomForestClassifier
    target_model_hyp: list of hyperparameters for the target_model, such as min_samples_split, min_samples_leaf etc.
    n_shadow_rows_confidences_min: minimum number of confidences to be calculated for each row in the test data (out-of-sample)
    attack_fail_fast: if true, stops the repetitions early once the given minimum number of confidences has been calculated for each row in the test data
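
The relationship between n_shadow_models, n_shadow_rows_confidences_min and attack_fail_fast can be sketched as below. This is an illustration under assumptions: it supposes that each shadow model yields a confidence only for the rows held out of its training sample, and the function name and mask layout are hypothetical rather than taken from the LIRA implementation.

```python
def shadow_models_needed(held_out_masks, n_confidences_min, fail_fast=True):
    """Return how many shadow models get trained before stopping.

    held_out_masks[i][r] is True when shadow model i produces a
    confidence for row r (assumed layout, for illustration only).
    """
    n_rows = len(held_out_masks[0])
    counts = [0] * n_rows  # confidences gathered so far, per row
    n_trained = 0
    for mask in held_out_masks:
        n_trained += 1
        for row, held_out in enumerate(mask):
            if held_out:
                counts[row] += 1
        # Stop once every row has the required number of confidences
        if fail_fast and min(counts) >= n_confidences_min:
            break
    return n_trained


# Four candidate shadow models over three rows (illustrative masks)
masks = [
    [True, False, True],
    [False, True, True],
    [True, True, False],
    [True, True, True],
]
n_trained = shadow_models_needed(masks, n_confidences_min=2)
print(n_trained)
```

In this toy case every row reaches two confidences after the third model, so the fourth is never trained; with fail_fast disabled all four would be.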

attack: name of the attack type ('LIRA Attack')

global_metric: the following global metrics are computed for attack repetitions
    null_auc_3sd_range: a three standard deviation range from the mean for the observed p_value
    AUC_sig: significant AUC at given p value
    PDIF_sig: significant PDIF at given p value

A LIRA attack will have an experiment logger with only one instance.


attack_instance_logger: stores the metrics computed for the attack
    instance_0: a LIRA attack has only this single instance
        TPR: value of true positive rate
        FPR: value of false positive rate
        n_shadow_models_trained: number of shadow models actually trained. When attack_fail_fast is true, training stops once the minimum number of confidences has been computed for each row in the test data, so fewer than n_shadow_models models may be trained

Example JSON output for LIRA attack is accessible from link

Running MIA Attacks from Config File

For both worst case and LIRA attacks, the examples worst_case_attack_example and lira_attack_example in AI-SDC demonstrate most of the possible uses of configuration files.