Metrics

Calculate metrics.

sacroml.metrics.auc_p_val(auc: float, n_pos: int, n_neg: int) → tuple[float, float]

Compute the p-value for a given AUC.

Parameters:
auc : float

Observed AUC value

n_pos : int

Number of positive examples

n_neg : int

Number of negative examples

Returns:
auc_p : float

p-value of observing an AUC > auc by chance

auc_std : float

standard deviation of the null AUC distribution (mean = 0.5)
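
Examples

A usage sketch. As an illustration of the underlying idea (an assumption about the exact implementation, which may differ in detail), the p-value can be obtained from a Gaussian approximation to the null (AUC = 0.5) distribution of the Mann-Whitney statistic, whose standard deviation is sqrt((n_pos + n_neg + 1) / (12 * n_pos * n_neg)):

>>> from sacroml.metrics import auc_p_val
>>> auc_p, auc_std = auc_p_val(auc=0.65, n_pos=100, n_neg=100)
>>> # Hand computation under the assumed Gaussian null (may differ from the packaged result):
>>> import numpy as np
>>> from scipy.stats import norm
>>> null_std = np.sqrt((100 + 100 + 1) / (12 * 100 * 100))
>>> p_val = 1 - norm.cdf(0.65, loc=0.5, scale=null_std)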

sacroml.metrics.get_metrics(y_pred_proba: ndarray, y_test: ndarray, permute_rows: bool = True) → dict

Calculate metrics, including attacker advantage, for a binary membership inference attack (MIA).

Advantage is implemented as Definition 4 of https://arxiv.org/pdf/1709.01604.pdf, which is also implemented in TensorFlow Privacy (https://github.com/tensorflow/privacy).

Parameters:
y_test : np.ndarray

Test data labels.

y_pred_proba : np.ndarray of shape [x, 2] and type float

Predicted probabilities.

permute_rows : bool, default True

Whether to permute arrays, see: https://github.com/AI-SDC/SACRO-ML/issues/106

Returns:
metrics : dict

dictionary of metric values

Notes

Includes the following metrics:

  • True positive rate or recall (TPR).

  • False positive rate (FPR), proportion of negative examples incorrectly classified as positives.

  • False alarm rate (FAR), proportion of objects classified as positives that are incorrect, also known as false discovery rate.

  • True negative rate (TNR).

  • Positive predictive value or precision (PPV).

  • Negative predictive value (NPV).

  • False negative rate (FNR).

  • Accuracy (ACC).

  • F1 Score - harmonic mean of precision and recall.

  • Advantage.
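
Examples

A usage sketch with randomly generated data, assuming the returned dictionary is keyed by the abbreviations listed above (e.g. "TPR", "FPR", "ACC"); check the keys returned by your installed version:

>>> import numpy as np
>>> from sacroml.metrics import get_metrics
>>> rng = np.random.default_rng(0)
>>> y_test = rng.integers(0, 2, size=200)                    # true membership labels
>>> p_member = rng.random(200)                               # predicted P(member)
>>> y_pred_proba = np.column_stack((1 - p_member, p_member)) # shape [n, 2], as required
>>> metrics = get_metrics(y_pred_proba, y_test)
>>> sorted(metrics)                                          # inspect the available keys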

sacroml.metrics.min_max_disc(y_true: ndarray, pred_probs: ndarray, x_prop: float = 0.1, log_p: bool = True) → tuple[float, float, float, float]

Return non-average-case metrics for membership inference attacks (MIA).

Considers the actual frequency of membership amongst the samples with the highest- and lowest-assessed probability of membership. If an MIA method confidently asserts that 5% of samples are members and 5% of samples are not, but cannot tell for the remaining 90% of samples, then these metrics will flag this behaviour, but AUC/advantage may not. Since the difference may be noisy, a p-value against a null of independence of true membership and assessed membership probability (that is, membership probabilities are essentially random) is also used as a metric (using the usual Gaussian approximation to the binomial). If the p-value is low and the frequency difference is high (>0.5) then the MIA attack is successful for some samples. A worked sketch of this computation follows the example below.

Parameters:
y_true : np.ndarray

true labels

pred_probs : np.ndarray

probabilities of labels, or monotonic transformation of probabilities

x_prop : float

proportion of samples with the highest and lowest probability of membership to be considered

log_p : bool

convert p-values to log(p).

Returns:
maxd : float

frequency of y_true = 1 amongst the proportion x_prop of individuals with the highest assessed membership probability

mind : float

frequency of y_true = 1 amongst the proportion x_prop of individuals with the lowest assessed membership probability

mmd : float

difference between maxd and mind

pval : float

p-value or log-p value corresponding to mmd against the null hypothesis that the random variables corresponding to y_true and pred_probs are independent.

Examples

>>> import numpy as np
>>> from sacroml.metrics import min_max_disc
>>> y_true = np.random.choice(2, 100)
>>> pred_probs = np.random.rand(100)
>>> maxd, mind, mmd, pval = min_max_disc(y_true, pred_probs, x_prop=0.2, log_p=True)
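
Continuing the example above, and as referenced in the description, a worked sketch of the computation these metrics describe, under two stated assumptions: ties in pred_probs are broken arbitrarily, and the p-value uses a Gaussian approximation to the binomial under the null that membership is independent of the assessed probabilities (the packaged implementation may differ in detail):

>>> from scipy.stats import norm
>>> n = int(0.2 * len(y_true))                         # size of each tail (x_prop = 0.2)
>>> order = np.argsort(pred_probs)                     # ascending assessed probability
>>> mind = y_true[order[:n]].mean()                    # membership frequency, lowest tail
>>> maxd = y_true[order[-n:]].mean()                   # membership frequency, highest tail
>>> mmd = maxd - mind
>>> base = y_true.mean()                               # overall membership frequency
>>> null_std = np.sqrt(2 * base * (1 - base) / n)      # assumed null std of the difference
>>> log_pval = norm.logsf(mmd, loc=0, scale=null_std)  # log p-value, as with log_p=True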