JSON Output for Attacks
JSON output has been standardised where possible. A generic JSON output structure is presented as under:
General Structure
Key components of JSON output across attacks will be:
log_id: Log identifier - a random unique id for each entry
log_time: the time when the log was created
metadata: standardised variables related to a specific attack type
attack_experiment_logger: Attack experiment logger - maintains instances of metrics computed across iterations
Worst-Case Attack
A worst case attack will have the following components in a metadata component of JSON output.
metadata:
attack_name: Name of the attack
attack_params: Attack parameters
target_model: Name of the target model
target_model_params: Target model parameters
global_metrics: The following global metrics are computed for attack repetitions
null_auc_3sd_range: A three standard deviation range from the mean for the observed p_value
n_sig_auc_p_vals: Number of significant p values given a p_thresh value
n_sig_auc_p_vals_corrected: Number of significant p values given a p_thresh value given applying testing corrections
n_sig_pdif_vals: Number of significant pdif given a p_thresh value
n_sig_pdif_vals_corrected: Number of significant p values given a p_thresh value given applying testing corrections
baseline_global_metric: The following global metrics are computed for attack repetitions across all experiments of baseline (dummy) experiments
null_auc_3sd_range: A three standard deviation range from the mean for the observed p_value
n_sig_auc_p_vals: Number of significant p values given a p_thresh value
n_sig_auc_p_vals_corrected: Number of significant p values given a p_thresh value given applying testing corrections
n_sig_pdif_vals: Number of significant pdif given a p_thresh value
n_sig_pdif_vals_corrected: Number of significant p values given a p_thresh value given applying testing corrections
A worst case attack will have experiment logger and baseline (dummy) experiments logger which is unique to worst case attack only.
attack_experiment_logger:
attack_instance_logger: Stores metrics computed across all iteration of attacks (i.e. n_reps)
instance_0:
TPR: value of true positive rate
FPR: value of false positive rate
...
...
n_pos_test_examples:
n_neg_test_examples:
instance_1:
... all metric values computed similar to instance_0
instance_n:
... n will be n_reps-1 representing iterations of attacks
dummy_attack_experiments_logger:
dummy_attack_metrics_experiment_0:
attack_instance_logger: stores metrics computed across all iteration of attacks (i.e. n_reps)
instance_0:
TPR: value of true positive rate
FPR: value of false positive rate
...
...
n_pos_test_examples:
n_neg_test_examples:
instance_1:
... all metric values computed similar to instance_0
instance_n: n will be n_reps-1 representing iterations of attacks
...
dummy_attack_metrics_experiment_1:
...
...
dummy_attack_metrics_experiment_n: n will be n_dummy_reps-1 representing iterations of attacks
...
Example JSON output for worst case attack is accessible from link
LiRA Attack
A LiRA attack will have the following components in a metadata component of JSON output.
metadata:
attack_name: Name of the attack
attack_params: Attack parameters
target_model: Name of the target model
target_model_params: Target model parameters
global_metric: The following global metrics are computed for attack repetitions
null_auc_3sd_range: A three standard deviation range from the mean for the observed p_value
AUC_sig: Significant AUC at given p value
PDIF_sig: Significant PDIF at given p value
A LiRA attack will have experiment logger with only one instance.
attack_experiment_logger:
attack_instance_logger: stores metrics computed across all iteration of attacks (i.e. n_reps)
instance_0: For a lira attack type, this will have a single instance
TPR: value of true positive rate
FPR: value of false positive rate
...
...
n_pos_test_examples:
n_neg_test_examples:
Example JSON output for LiRA attack is accessible from link