Extending SafeModel

Modular Design

The safemodel package is an open source wrapper for common machine learning models. It is designed to be modular and can be extended for use with other models. Code comments should be in the numpydoc format so that they are rendered by the automatic Sphinx documentation.

The main steps needed to implement a new model are:

  1. Copy the new_model_template.py

  2. Define a safer class inheriting from SafeModel and the base (scikit-learn) model

  3. Update the rules.json file

  4. Update the __init__ method with paramnames, ignore_items, and examine_separately items

  5. Add checks for any unusual data structures

  6. Override the fit() function

  7. Update Sphinx documentation

  8. Write pytests to confirm core functionality

  9. Include any optional helper functions

Copy The Template

cp new_model_template.py xgboost.py

Define the Safer Class

class SafeGradientBoosting(SafeModel, GradientBoostingClassifier):
        """Privacy protected GradientBoostingClassifier."""

Update rules.json file

The rules.json file is used to define safe limits for parameters. The file is written in JSON (JavaScript Object Notation) and can be extended to define safe limits for the parameters of newly implemented models.
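
As a rough illustration of how such a limit might be expressed and checked, the sketch below parses a small JSON snippet and tests a parameter value against it. The layout used here (a "rules" list with "keyword", "operator" and "value" keys) and the helper check_rule are assumptions made for this example only; consult the rules.json file shipped with the package for the actual schema.

```python
import json

# Hypothetical rules snippet: the key names and layout are assumptions,
# not necessarily the schema used by the package's rules.json.
EXAMPLE_RULES = """
{
    "ModelToMakeSafer": {
        "rules": [
            {"keyword": "min_samples_leaf", "operator": "min", "value": 5}
        ]
    }
}
"""


def check_rule(params: dict, rule: dict) -> bool:
    """Return True if params satisfies a single (hypothetical) rule."""
    current = params[rule["keyword"]]
    if rule["operator"] == "min":
        return current >= rule["value"]
    if rule["operator"] == "max":
        return current <= rule["value"]
    if rule["operator"] == "equals":
        return current == rule["value"]
    raise ValueError(f"unknown operator: {rule['operator']}")


rules = json.loads(EXAMPLE_RULES)["ModelToMakeSafer"]["rules"]
safe_params = {"min_samples_leaf": 10}
unsafe_params = {"min_samples_leaf": 1}
print(all(check_rule(safe_params, r) for r in rules))
print(all(check_rule(unsafe_params, r) for r in rules))
```

Keeping the limits in a data file rather than in code means a new model can be given its own entry without touching the checking logic.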

Update the __init__ method with paramnames, ignore_items, and examine_separately items

The main thing that needs to change in the code for a new class is the contents of the list self.basemodel_paramnames.

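class SafeModelToMakeSafer(SafeModel, ModelToMakeSafer):
    """Privacy protected ModelToMakeSafer."""

    def __init__(self, **kwargs: Any) -> None:
        """Creates model and applies constraints to params"""
        SafeModel.__init__(self)
        # Placeholder names: list the parameters accepted by the underlying model.
        self.basemodel_paramnames = ['you ', 'are', 'creating', 'a']

        # Pass only the recognised parameters on to the underlying model.
        the_kwds = {}
        for key, val in kwargs.items():
            if key in self.basemodel_paramnames:
                the_kwds[key] = val
        ModelToMakeSafer.__init__(self, **the_kwds)
        self.model_type: str = "ModelToMakeSafer"
        super().preliminary_check(apply_constraints=True, verbose=True)
        self.ignore_items = [
            # names of attributes to skip when the model is checked
        ]
        self.examine_seperately_items = ["base_estimator", "estimators_"]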

For scikit-learn models, this list can be taken from the scikit-learn documentation page for the new model. For example, saferandomforest.py defines the valid paramnames as:

def __init__(self, **kwargs: Any) -> None:
        """Creates model and applies constraints to params"""

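If the documentation is not to hand, the parameter names can usually also be read from the constructor signature. The sketch below uses the standard inspect module on a stand-in class; ToyModel and init_paramnames are hypothetical names used only for this example and are not part of the package.

```python
import inspect


class ToyModel:
    """Stand-in for the underlying model class (hypothetical)."""

    def __init__(self, n_estimators=100, max_depth=None, random_state=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.random_state = random_state


def init_paramnames(cls: type) -> list:
    """List the named constructor parameters of cls, excluding self."""
    signature = inspect.signature(cls.__init__)
    skipped_kinds = (
        inspect.Parameter.VAR_POSITIONAL,
        inspect.Parameter.VAR_KEYWORD,
    )
    return [
        name
        for name, param in signature.parameters.items()
        if name != "self" and param.kind not in skipped_kinds
    ]


print(init_paramnames(ToyModel))
```

For scikit-learn estimators, the keys of the dictionary returned by get_params() give the same names.
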
Add checks for any unusual data structures

Some models may have unusual data structures. Care should be taken to ensure that these are not changed after the fit() method is called.

Examples of unusual data structures that are already covered include lists, which are handled in the safemodel base class, and decision trees, which are handled in safedecisiontree.py and saferandomforest.py.

class SafeGradientBoosting(SafeModel, GradientBoostingClassifier):
        """Privacy protected GradientBoostingClassifier."""

Override the fit() function

import copy
import numpy as np

def fit(self, x: np.ndarray, y: np.ndarray) -> None:
    """Do fit and then store model dict"""
    super().fit(x, y)
    self.k_anonymity = self.get_k_anonymity(x)
    # Snapshot the fitted state so that later changes can be detected.
    self.saved_model = copy.deepcopy(self.__dict__)
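
The copy stored by fit() is what later makes it possible to detect changes to the fitted model. The sketch below shows the idea using plain dictionaries; changed_attributes is a hypothetical helper, not part of the package, and the toy dictionary merely stands in for a model's __dict__.

```python
import copy


def changed_attributes(saved: dict, current: dict, ignore_items: list) -> list:
    """Return names of attributes whose values differ from the saved copy."""
    changed = []
    for name in sorted(set(saved) | set(current)):
        if name in ignore_items:
            continue
        if name not in saved or name not in current:
            changed.append(name)  # attribute added or removed since fit()
        elif saved[name] != current[name]:
            changed.append(name)  # attribute value modified since fit()
    return changed


# Toy stand-in for a fitted model's __dict__.
model_state = {"max_depth": 3, "classes_": [0, 1], "model_save_file": "None"}
saved_model = copy.deepcopy(model_state)

# Simulate changes made after fit().
model_state["classes_"].append(2)
model_state["model_save_file"] = "model.pkl"

print(changed_attributes(saved_model, model_state, ["model_save_file"]))
```

A plain != comparison like this works for lists and scalars but not for NumPy arrays, where it is evaluated element by element; that is one reason unusual data structures need their own checks.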

Update Sphinx documentation

In the Sphinx docs/source directory, make a copy of an existing .rst file and edit the copy to reflect the newly implemented class. Then update the index.rst file to include the new .rst file; the .rst extension is not required in the index entry. For example, the entry saferandomforest links to saferandomforest.rst.

cd docs/source
cp saferandomforest.rst xgboost.rst
edit xgboost.rst
edit index.rst

Write pytests to confirm core functionality

Write pytests to confirm the core functionality. Example test suites can be found in SACRO-ML/tests/
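
A minimal sketch of what such tests might look like follows. ToySafeModel and its methods are hypothetical stand-ins so that the example is self-contained; real tests would import the newly implemented class and exercise the package's own checks.

```python
import copy


class ToySafeModel:
    """Hypothetical stand-in for a newly implemented safe model."""

    def __init__(self, min_samples_leaf: int = 5) -> None:
        self.min_samples_leaf = min_samples_leaf
        self.saved_model = None

    def fit(self, x: list, y: list) -> None:
        """Pretend to fit, then store a snapshot of the model state."""
        self.n_samples_ = len(x)
        self.saved_model = copy.deepcopy(self.__dict__)

    def unchanged_since_fit(self) -> bool:
        """Compare the current state with the snapshot taken in fit()."""
        current = {k: v for k, v in self.__dict__.items() if k != "saved_model"}
        saved = {k: v for k, v in self.saved_model.items() if k != "saved_model"}
        return current == saved


def test_fit_stores_snapshot():
    model = ToySafeModel()
    model.fit([[0], [1]], [0, 1])
    assert model.saved_model is not None
    assert model.unchanged_since_fit()


def test_tampering_is_detected():
    model = ToySafeModel()
    model.fit([[0], [1]], [0, 1])
    model.min_samples_leaf = 1  # change a parameter after fitting
    assert not model.unchanged_since_fit()
```

Because the test functions follow the test_ naming convention, pytest collects them automatically when run on the file that contains them.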

Include any optional helper functions

Depending on the model being implemented, one or more helper functions or methods may be required. For example, there are many helper functions in safekeras.py that deal with the specifics of neural networks.

from typing import Any, Tuple

import numpy as np


def same_weights(m1: Any, m2: Any) -> Tuple[bool, str]:
    """Check whether two models have identical layer weights."""
    if len(m1.layers) != len(m2.layers):
        return False, "different numbers of layers"
    numlayers = len(m1.layers)
    for layer in range(numlayers):
        m1layer = m1.layers[layer].get_weights()
        m2layer = m2.layers[layer].get_weights()
        if len(m1layer) != len(m2layer):
            return False, f"layer {layer} not the same size."
        # Compare each weight array of this layer in turn.
        for dim in range(len(m1layer)):
            m1d = m1layer[dim]
            m2d = m2layer[dim]
            if not np.array_equal(m1d, m2d):
                return False, f"dimension {dim} of layer {layer} differs"
    return True, "weights match"