Extending SafeModel
===================

Modular Design
--------------

The safemodel package is an open source wrapper for common machine learning models. It is designed to be modular and can be extended for use with other models.

Code comments should be in the numpydoc format so that they are rendered by the automatic Sphinx documentation.

The main steps needed to implement a new model are:

#. Copy the new_model_template.py
#. Define a safer class inheriting SafeModel and the Basic (SkLearn) model
#. Update the rules.json file
#. Update the __init__ method with paramnames, ignore_items, and examine_separately items
#. Add checks for any unusual data structures
#. Override the fit() function
#. Update Sphinx documentation
#. Write pytests to confirm core functionality
#. Include any optional helper functions

Copy The Template
-----------------

.. code-block:: shell

    cp new_model_template.py xgboost.py

Define the Safer Class
----------------------

.. code-block:: python

    class SafeGradientBoosting(SafeModel, GradientBoostingClassifier):
        """Privacy protected GradientBoostingClassifier."""

Update rules.json file
----------------------

The rules.json file is used to define safe limits for parameters. The file is written in JSON (JavaScript Object Notation) and can be extended to define safe limits for the parameters of newly implemented models.

Update the __init__ method with paramnames, ignore_items, and examine_separately items
--------------------------------------------------------------------------------------

The __init__ method of a new class needs to set the contents of the list self.basemodel_paramnames so that it holds only the parameter names accepted by the underlying model.

.. code-block:: python

    from typing import Any

    # ModelToMakeSafer is a placeholder for the base model class being wrapped.
    class SafeModelToMakeSafe(SafeModel, ModelToMakeSafer):
        """Privacy protected ModelToMakeSafer."""

        def __init__(self, **kwargs: Any) -> None:
            """Creates model and applies constraints to params"""
            SafeModel.__init__(self)
            # Placeholder entries: replace with the valid parameter names.
            self.basemodel_paramnames = [
                'edit', 'this', 'list', 'to', 'contain', 'just', 'the',
                'valid', 'parameters', 'for', 'the', 'class', 'you', 'are',
                'creating', 'a', 'safe', 'wrapper', 'version', 'of']
            # Pass on only the keyword arguments the base model accepts.
            the_kwds = dict()
            for key, val in kwargs.items():
                if key in self.basemodel_paramnames:
                    the_kwds[key] = val
            ModelToMakeSafer.__init__(self, **the_kwds)
            self.model_type: str = "ModelToMakeSafer"
            super().preliminary_check(apply_constraints=True, verbose=True)
            self.ignore_items = [
                "model_save_file",
                "ignore_items",
                "base_estimator_",
                "timestamp",
            ]
            self.examine_seperately_items = ["base_estimator", "estimators_"]

For sklearn models this list can be extracted from the sklearn documentation page for the new model. For example, saferandomforest.py defines the valid parameter names as:

.. code-block:: python

    def __init__(self, **kwargs: Any) -> None:
        """Creates model and applies constraints to params"""
        SafeModel.__init__(self)
        self.basemodel_paramnames = [
            'n_estimators', 'criterion', 'max_depth', 'min_samples_split',
            'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features',
            'max_leaf_nodes', 'min_impurity_decrease', 'bootstrap',
            'oob_score', 'n_jobs', 'random_state', 'verbose',
            'warm_start', 'class_weight', 'ccp_alpha', 'max_samples']

Add checks for any unusual data structures
------------------------------------------

Some models may have unusual data structures. Care should be taken to ensure that these are not changed after the fit() method is called. Examples of unusual data structures are lists, which are handled in the safemodel base class, and decision trees, which are handled in safedecisiontree.py and saferandomforest.py.

.. code-block:: python

    class SafeGradientBoosting(SafeModel, GradientBoostingClassifier):
        """Privacy protected GradientBoostingClassifier."""

Override the fit() function
---------------------------

.. code-block:: python

    # Requires "import copy" and "import numpy as np" at module level.
    def fit(self, x: np.ndarray, y: np.ndarray) -> None:
        """Do fit and then store model dict"""
        super().fit(x, y)
        self.k_anonymity = self.get_k_anonymity(x)
        # Snapshot the fitted state so that later changes can be detected.
        self.saved_model = copy.deepcopy(self.__dict__)

Update Sphinx documentation
----------------------------

In the Sphinx docs/source directory, make a copy of an existing .rst file and edit the copy to reflect the newly implemented class. Then update the index.rst file to include the new .rst file; the .rst extension is not required in the index entry. For example, the entry saferandomforest links in saferandomforest.rst.

.. code-block:: shell

    cd docs/source
    cp saferandomforest.rst xgboost.rst
    edit xgboost.rst
    edit index.rst

Write pytests to confirm core functionality
--------------------------------------------

Write pytests to confirm the core functionality. Example test suites can be found in SACRO-ML/tests/

Include any optional helper functions
-------------------------------------

Depending on the model being implemented, one or more helper functions or methods may be required.

.. code-block:: python

    from typing import Any, Tuple

    import numpy as np


    def same_weights(m1: Any, m2: Any) -> Tuple[bool, str]:
        """Compare the layer weights of two models."""
        if len(m1.layers) != len(m2.layers):
            return False, "different numbers of layers"
        numlayers = len(m1.layers)
        for layer in range(numlayers):
            m1layer = m1.layers[layer].get_weights()
            m2layer = m2.layers[layer].get_weights()
            if len(m1layer) != len(m2layer):
                return False, f"layer {layer} not the same size."
            for dim in range(len(m1layer)):
                # Compare the matching arrays taken from each model.
                m1d = m1layer[dim]
                m2d = m2layer[dim]
                if not np.array_equal(m1d, m2d):
                    return False, f"dimension {dim} of layer {layer} differs"
        return True, "weights match"
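
Small helpers of this kind are also a natural target for the pytests mentioned earlier. The following is a minimal sketch only, isolating the keyword filtering step performed in __init__ so that it can be tested without constructing a model. The names filter_kwargs, VALID_PARAMNAMES and test_filter_kwargs_drops_unknown are hypothetical and are not part of safemodel, and the parameter list is a shortened stand-in for self.basemodel_paramnames.

```python
from typing import Any, Dict, List

# Hypothetical, shortened stand-in for self.basemodel_paramnames.
VALID_PARAMNAMES: List[str] = ["n_estimators", "max_depth", "min_samples_leaf"]


def filter_kwargs(kwargs: Dict[str, Any], valid: List[str]) -> Dict[str, Any]:
    """Return only the keyword arguments whose names appear in valid."""
    return {key: val for key, val in kwargs.items() if key in valid}


def test_filter_kwargs_drops_unknown() -> None:
    """Pytest-style check: unknown keys are dropped, valid ones are kept."""
    supplied = {"n_estimators": 10, "max_depth": 3, "not_a_param": True}
    kept = filter_kwargs(supplied, VALID_PARAMNAMES)
    assert kept == {"n_estimators": 10, "max_depth": 3}
```

If saved in a file whose name starts with test\_, a function like this would be collected by pytest automatically. A real test suite would additionally construct the new safe class and check its behaviour after fit().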