Extending SafeModel
Modular Design
The safemodel package is an open-source wrapper for common machine learning models. It is designed to be modular and can be extended for use with other models. Code comments should be written in the numpydoc format so that they are rendered by the automatic Sphinx documentation.
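For example, a docstring in numpydoc format looks like the following (an illustrative sketch for one of the package's helper methods; the wording is not taken from the package source):

def get_k_anonymity(self, x):
    """Return the k-anonymity of the training data.

    Parameters
    ----------
    x : np.ndarray
        The training features.

    Returns
    -------
    int
        The smallest group size k found in x.
    """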
The main steps needed to implement a new model are:
Copy the new_model_template.py
Define a safer class inheriting SafeModel and the Basic (SkLearn) model
Update the rules.json file
Update the __init__ method with paramnames, ignore_items, and examine_separately items
Add checks for any unusual data structures
Override the fit() function
Update Sphinx documentation
Write pytests to confirm core functionality
Include any optional helper functions
Copy The Template
cp new_model_template.py xgboost.py
Define the Safer Class
class SafeGradientBoosting(SafeModel, GradientBoostingClassifier):
    """Privacy protected GradientBoostingClassifier."""
Update the rules.json file
The rules.json file defines safe limits for parameters. The file is written in JSON (JavaScript Object Notation) and can be extended to define safe limits for the parameters of newly implemented models.
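As a sketch, an entry for a new model might look like the following; the exact schema (operator names, keyword spellings, and so on) should be copied from an existing entry in rules.json rather than from this illustration:

{
    "ModelToMakeSafer": {
        "rules": [
            { "keyword": "min_samples_leaf", "operator": "min", "value": 5 }
        ]
    }
}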
Update the __init__ method with paramnames, ignore_items, and examine_separately items
Code for a new class needs to reflect the contents of the list self.basemodel_paramnames.
class SafeModelToMakeSafer(SafeModel, ModelToMakeSafer):
    """Privacy protected ModelToMakeSafer."""

    def __init__(self, **kwargs: Any) -> None:
        """Creates model and applies constraints to params."""
        SafeModel.__init__(self)
        self.basemodel_paramnames = [
            'edit', 'this', 'list', 'to',
            'contain', 'just', 'the', 'valid', 'parameters',
            'for', 'the', 'class',
            'you', 'are', 'creating', 'a',
            'safe', 'wrapper', 'version', 'of']
        the_kwds = {}
        for key, val in kwargs.items():
            if key in self.basemodel_paramnames:
                the_kwds[key] = val
        ModelToMakeSafer.__init__(self, **the_kwds)
        self.model_type: str = "ModelToMakeSafer"
        super().preliminary_check(apply_constraints=True, verbose=True)
        self.ignore_items = [
            "model_save_file",
            "ignore_items",
            "base_estimator_",
            "timestamp",
        ]
        self.examine_seperately_items = ["base_estimator", "estimators_"]
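A hypothetical usage example (assuming the wrapped model accepts an n_estimators parameter): any keyword argument not listed in basemodel_paramnames is silently dropped before the base model's __init__ is called.

# n_estimators is passed through; not_a_real_param is filtered out
safe = SafeModelToMakeSafer(n_estimators=10, not_a_real_param=42)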
For sklearn models this list can be extracted from the sklearn documentation page for the new model. For example, SafeRandomForest defines the valid paramnames as:
def __init__(self, **kwargs: Any) -> None:
    """Creates model and applies constraints to params."""
    SafeModel.__init__(self)
    self.basemodel_paramnames = [
        'n_estimators', 'criterion', 'max_depth', 'min_samples_split',
        'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features',
        'max_leaf_nodes', 'min_impurity_decrease', 'bootstrap',
        'oob_score', 'n_jobs', 'random_state', 'verbose',
        'warm_start', 'class_weight', 'ccp_alpha', 'max_samples']
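Alternatively, the valid parameter names can be listed programmatically with sklearn's get_params(), which avoids transcription errors:

from sklearn.ensemble import RandomForestClassifier

# the keys of get_params() are exactly the constructor parameters
print(sorted(RandomForestClassifier().get_params(deep=False).keys()))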
Add checks for any unusual data structures
Some models may have unusual data structures. Care should be taken to ensure that these are not changed after the fit() method is called.
Examples of unusual data structures are lists, which are handled in the safemodel base class, and decision trees, which are handled in safedecisiontree.py and saferandomforest.py.
Override the fit() function
def fit(self, x: np.ndarray, y: np.ndarray) -> None:
    """Do fit and then store model dict."""
    super().fit(x, y)
    self.k_anonymity = self.get_k_anonymity(x)
    self.saved_model = copy.deepcopy(self.__dict__)
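A hypothetical call sequence (the data names are placeholders) showing that the snapshot is taken at fit time:

model = SafeModelToMakeSafer(min_samples_leaf=5)
model.fit(x_train, y_train)   # stores self.saved_model
# the safemodel base class can later compare the live model against this
# snapshot in its post-hoc checks, to detect tampering after fitting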
Update Sphinx documentation
In the Sphinx docs/source directory, make a copy of an existing .rst file and edit it to reflect the newly implemented class. Then update the index.rst file to include the new .rst file; the extension is not required. E.g. saferandomforest links in saferandomforest.rst.
cd docs/source
cp saferandomforest.rst xgboost.rst
edit xgboost.rst
edit index.rst
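For illustration, the toctree in index.rst would then gain an entry such as the following (the surrounding entries are placeholders):

.. toctree::

   saferandomforest
   xgboost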
Write pytests to confirm core functionality
Write pytests to confirm the core functionality. Example test suites can be found in SACRO-ML/tests/
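A minimal sketch of such a test is shown below; the import path is an assumption to be adjusted to the package layout, and preliminary_check is assumed to return a (msg, disclosive) pair as in the existing safe classes:

from safemodel.classifiers import SafeGradientBoosting  # hypothetical import path

def test_safe_defaults():
    """A freshly constructed model should satisfy the rules.json constraints."""
    model = SafeGradientBoosting()
    msg, disclosive = model.preliminary_check(verbose=False, apply_constraints=False)
    assert not disclosive, msg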
Include any optional helper functions
Depending on the model being implemented, one or more helper functions or methods may be required. For example, there are many helper functions in safekeras.py that help with the specifics of neural networks.
from typing import Any, Tuple

import numpy as np


def same_weights(m1: Any, m2: Any) -> Tuple[bool, str]:
    """Check whether two Keras models have identical weights.

    Returns a (match, message) pair.
    """
    if len(m1.layers) != len(m2.layers):
        return False, "different numbers of layers"
    numlayers = len(m1.layers)
    for layer in range(numlayers):
        m1layer = m1.layers[layer].get_weights()
        m2layer = m2.layers[layer].get_weights()
        if len(m1layer) != len(m2layer):
            return False, f"layer {layer} not the same size."
        for dim in range(len(m1layer)):
            m1d = m1layer[dim]
            m2d = m2layer[dim]
            if not np.array_equal(m1d, m2d):
                return False, f"dimension {dim} of layer {layer} differs"
    return True, "weights match"