Extending and contributing to mofdscribe

Implementing a new featurizer

To implement a new featurizer, you typically create a new class that inherits from MOFBaseFeaturizer. In this class, you need to implement three methods: featurize(), feature_labels(), and citation().

The main featurization logic happens in featurize(). This method should accept a Structure object as input and return a numpy.array. The feature_labels() method should return a list of strings that describe the features returned by featurize(). The number of labels must match the number of features returned by featurize() (i.e., the number of columns in the feature matrix). The citation() method should return a list of strings with the BibTeX citations for the featurizer.
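As a rough illustration of this three-method contract, here is a minimal sketch. To keep it runnable without mofdscribe or pymatgen installed, it does not inherit from MOFBaseFeaturizer and uses a hypothetical stand-in class, FakeStructure, in place of a real Structure. The featurizer name, its two features, and the BibTeX entry are made up for illustration.

```python
import numpy as np


class FakeStructure:
    """Hypothetical stand-in for a pymatgen Structure (illustration only)."""

    def __init__(self, species, volume):
        self.species = species  # list of element symbols
        self.volume = volume  # cell volume in cubic angstrom


class AtomCountFeaturizer:
    """Toy featurizer following the featurize/feature_labels/citation contract.

    In real code this class would inherit from MOFBaseFeaturizer; the base
    class is omitted here so that the sketch has no mofdscribe dependency.
    """

    def featurize(self, structure):
        # Main featurization logic: one structure in, one 1D array out.
        n_atoms = len(structure.species)
        return np.array([n_atoms, n_atoms / structure.volume])

    def feature_labels(self):
        # One label per entry of the array returned by featurize().
        return ["n_atoms", "number_density"]

    def citation(self):
        # List of BibTeX strings; this entry is a placeholder, not a real reference.
        return ["@misc{placeholder_key, note = {hypothetical entry}}"]


structure = FakeStructure(species=["Zn", "O", "C", "C"], volume=8.0)
featurizer = AtomCountFeaturizer()
features = featurizer.featurize(structure)

# The number of labels must match the number of features.
assert len(featurizer.feature_labels()) == len(features)
```

The final assertion is a cheap sanity check that is worth keeping in the unit tests of any new featurizer.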

Generally, you also want to decorate your featurizer class with one or more of the decorators operates_on_imolecule(), operates_on_molecule(), operates_on_structure(), and operates_on_istructure(). They declare which input types the featurizer accepts. This matters for featurizers that operate on the building blocks, because the input must be passed to them in the right form.
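To show the general idea of declaring accepted input types with class decorators, here is a generic sketch of the pattern. It is not mofdscribe's implementation: the decorator factory accepts(), the attribute accepted_types, and the stand-in Structure and Molecule classes are all hypothetical names chosen for this example.

```python
class Structure:
    """Hypothetical stand-in for a periodic structure."""

    def __init__(self, sites):
        self.sites = sites


class Molecule:
    """Hypothetical stand-in for a non-periodic building block."""

    def __init__(self, sites):
        self.sites = sites


def accepts(input_type):
    """Hypothetical class-decorator factory that records an accepted input type."""

    def decorator(cls):
        # Build a new tuple so that a parent class's tuple is never mutated.
        cls.accepted_types = getattr(cls, "accepted_types", ()) + (input_type,)
        return cls

    return decorator


@accepts(Structure)
@accepts(Molecule)
class ToyFeaturizer:
    def featurize(self, obj):
        # Calling code (or the featurizer itself) can check the declared types.
        if not isinstance(obj, self.accepted_types):
            raise TypeError(f"Unsupported input type: {type(obj).__name__}")
        return [len(obj.sites)]


toy = ToyFeaturizer()
n_sites = toy.featurize(Structure(["Zn", "O"]))
```

Decorators are applied from the bottom up, so in this sketch Molecule is recorded before Structure.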

Implementing a new dataset

Often, you may want to use the utilities of a StructureDataset and the integration with the splitters, but with your custom structures and labels and not the ones shipped with mofdscribe.

Note

Contribute your dataset

Once you have wrapped your dataset in a StructureDataset, you can contribute it to mofdscribe by opening a pull request on the mofdscribe repository. We will be happy to include it in the next release.

This will make it easier for other researchers to build on top of your work and to compare their results with yours. We can then also use it to create benchmark tasks.

To build a custom StructureDataset, you only need a folder with CIF files (or files in any other format supported by pymatgen) and, optionally, a pandas.DataFrame with labels, features, and additional information. For instance, you can provide pre-computed densities and hashes (if you do not provide them, we will compute them on first use). In the simplest use case, you simply provide the filenames:
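One way to assemble such a list of filenames with the standard library is sketched below. The helper collect_cif_files() is a hypothetical name and not part of mofdscribe, and the temporary folder with empty files only serves to make the example self-contained. How the resulting list is passed to StructureDataset is not shown here; consult the API documentation for the exact signature.

```python
import tempfile
from pathlib import Path


def collect_cif_files(folder):
    """Return the sorted paths of all .cif files directly inside folder."""
    return sorted(str(path) for path in Path(folder).glob("*.cif"))


# Demonstration on a throwaway folder containing two CIF files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("mof_b.cif", "mof_a.cif", "notes.txt"):
        (Path(tmp) / name).touch()

    files = collect_cif_files(tmp)
    names = [Path(f).name for f in files]
```

Sorting the paths keeps the order of the structures reproducible across runs, which helps when labels in a DataFrame must line up with the files.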

You can also build it from a folder