Extending and contributing to mofdscribe
Implementing a new featurizer
To implement a new featurizer, you typically create a new class that inherits from MOFBaseFeaturizer. In this class, you need to implement three methods: featurize(), feature_labels(), and citation().
The main featurization logic happens in featurize(). Your method should accept a Structure object as input and return the feature values computed for that structure. The mofdscribe.featurizers.base.MOFBaseFeaturizer.feature_labels() method should return a list of strings that describe the features returned by featurize(). The number of feature labels must match the number of features returned by featurize() (i.e., the number of columns in the feature matrix). The citation() method should return a list of BibTeX citation strings for the featurizer.
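The three methods can be sketched as follows. This is a minimal illustration and not code from mofdscribe: BaseFeaturizerStandIn is a hypothetical stand-in so that the example runs without mofdscribe installed (a real featurizer would inherit from MOFBaseFeaturizer instead), and SiteCountAndVolume and ToyStructure are made-up names. The method names follow the text above; check the base class for the exact names and return types it expects.

```python
from typing import List


class BaseFeaturizerStandIn:
    """Hypothetical stand-in for MOFBaseFeaturizer (assumption, not the real class)."""

    def featurize(self, structure) -> List[float]:
        raise NotImplementedError

    def feature_labels(self) -> List[str]:
        raise NotImplementedError

    def citation(self) -> List[str]:
        raise NotImplementedError


class SiteCountAndVolume(BaseFeaturizerStandIn):
    """Toy featurizer: number of sites and volume per site."""

    def featurize(self, structure) -> List[float]:
        # Only len(structure) and structure.volume are used here;
        # pymatgen's Structure provides both.
        n_sites = len(structure)
        return [float(n_sites), structure.volume / n_sites]

    def feature_labels(self) -> List[str]:
        # One label per value returned by featurize(), in the same order.
        return ["n_sites", "volume_per_site"]

    def citation(self) -> List[str]:
        # Placeholder entry; a real featurizer returns the BibTeX of its method.
        return ["@misc{placeholder_key, title={Placeholder citation}}"]


class ToyStructure:
    """Hypothetical structure-like object used only to exercise the sketch."""

    volume = 100.0

    def __len__(self) -> int:
        return 4


featurizer = SiteCountAndVolume()
features = featurizer.featurize(ToyStructure())
# The number of labels must match the number of features.
assert len(features) == len(featurizer.feature_labels())
```

The final assertion is the consistency check described above: every value returned by featurize() needs exactly one label.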
Generally, you also want to decorate your featurizer class with the
operates_on_istructure() decorator. This is relevant for featurizers that operate on the building blocks and must be passed the input in the right form.
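How operates_on_istructure() works internally is not described here. As a generic illustration of the idea, a class decorator can record which input types a featurizer accepts, so that calling code can check an input before passing it on. Every name in this sketch (accepts_input_type, can_handle, ToyStructure, ToyMolecule, StructureOnlyFeaturizer) is hypothetical and not part of mofdscribe.

```python
from typing import Callable, List, Type


def accepts_input_type(input_type: Type) -> Callable[[type], type]:
    """Hypothetical decorator factory: record an input type a class accepts."""

    def decorator(cls: type) -> type:
        # Copy the existing list so a subclass does not mutate its parent's list.
        accepted: List[Type] = list(getattr(cls, "_accepted_types", []))
        accepted.append(input_type)
        cls._accepted_types = accepted
        return cls

    return decorator


class ToyStructure:
    """Hypothetical stand-in for a structure type."""


class ToyMolecule:
    """Hypothetical stand-in for a molecule type."""


@accepts_input_type(ToyStructure)
class StructureOnlyFeaturizer:
    """Toy featurizer class that declares it works on ToyStructure inputs."""


def can_handle(featurizer_cls: type, obj: object) -> bool:
    """Return True if obj is an instance of a type recorded on the class."""
    accepted = tuple(getattr(featurizer_cls, "_accepted_types", ()))
    return isinstance(obj, accepted)
```

With this pattern, a wrapper that feeds building blocks to featurizers can call can_handle() first and skip or convert inputs the featurizer did not declare.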
Implementing a new dataset
Often, you may want to use the utilities of a
StructureDataset and its integration with the splitters, but with your own structures and labels rather than the ones shipped with mofdscribe.
Contribute your dataset
Once you have wrapped your dataset in a
StructureDataset, you can contribute it to mofdscribe by opening a pull request on the mofdscribe repository. We will be happy to include it in the next release.
This will make it easier for other researchers to build on top of your work and to compare their results with yours. We can then also use it to create benchmark tasks.
To build a custom StructureDataset, you only need a folder with
CIF files (or files in any other format supported by pymatgen) and, optionally, a
pandas.DataFrame with labels, features, and additional information. For instance, you can provide pre-computed densities and hashes (if you do not provide them, they are computed on first use). In the simplest use case, you provide only the filenames.
You can also build it from a folder.
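The inputs described above can be prepared with the standard library alone. The sketch below collects the CIF filenames from a folder and pairs each file with a label. The helper names (collect_cif_files, build_label_records) and the column names "filename" and "label" are assumptions made for illustration; the StructureDataset call itself is not shown because its signature and expected columns are not given here.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def collect_cif_files(folder: str) -> List[str]:
    """Return the sorted paths of all .cif files directly inside folder."""
    return sorted(str(path) for path in Path(folder).glob("*.cif"))


def build_label_records(
    filenames: List[str], labels: Dict[str, float]
) -> List[Dict[str, Optional[object]]]:
    """Pair each file with its label, looked up by file stem (None if missing)."""
    return [
        {"filename": name, "label": labels.get(Path(name).stem)}
        for name in filenames
    ]


# Usage with a temporary folder that holds empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("mof_b.cif", "mof_a.cif", "notes.txt"):
        (Path(tmp) / name).touch()
    files = collect_cif_files(tmp)  # only the two .cif files, sorted
    records = build_label_records(files, {"mof_a": 1.5})
```

If pandas is available, pandas.DataFrame(records) turns the list of records into a DataFrame with one row per structure.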