Featurizers#

Pore geometry featurization#

Computing pore properties using Zeo++.

class AccessibleVolume(probe_radius=0.1, num_samples=100, channel_radius=None, ha=True, primitive=True)[source]#

Initialize the AccessibleVolume featurizer.

Parameters:
  • probe_radius (Union[str, float]) – Radius of the probe. Defaults to 0.1.

  • num_samples (int) – Number of samples. Defaults to 100.

  • channel_radius (Union[str, float, None]) – Channel radius. Should be equal to probe_radius. Defaults to None.

  • ha (bool) – If True, run zeo++ with the “high accuracy” -ha flag. It has been reported that this can lead to issues for some structures. Defaults to True.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') AccessibleVolume#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

class PoreDiameters(ha=True, primitive=True)[source]#

Calculate the pore diameters of a framework.

Initialize the featurizer.

Parameters:
  • ha (bool) – If True, run zeo++ with the “high accuracy” -ha flag. It has been reported that this can lead to issues for some structures. Defaults to True.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.

feature_labels()[source]#

Generate attribute names.

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') PoreDiameters#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

class PoreSizeDistribution(probe_radius=0.0, num_samples=5000, channel_radius=None, hist_type='derivative', ha=False, primitive=True)[source]#

Describe structures using histograms of pore sizes.

The pore size distribution describes how much of the void space corresponds to certain pore sizes.

Pinheiro et al. (2013) concluded that they are “sensitive to small changes in pore diameter” but do “not reflect subtle changes in features such as the surface texture of a pore”.

We use the implementation in zeo++ to calculate the pore size distribution.

The pore size distribution has been used by the group of Gómez-Gualdrón as pore size standard deviation (PSSD) in, for example, 10.1021/acs.jctc.9b00940 and 10.1063/5.0048736.

Currently, the histogram is hard-coded to be of length 1000 between 0 and 100 Angstrom (in zeo++ itself).

Initialize the PoreSizeDistribution featurizer.

Parameters:
  • probe_radius (Union[str, float]) – Used to estimate the accessible volume. Only the accessible volume is then considered for the histogram. Defaults to 0.0.

  • num_samples (int) – Number of rays that are placed through sample. Original publication used 1,000,000 sample points for IZA zeolites and 100,000 sample points for hypothetical zeolites. Larger numbers increase the runtime. Defaults to 5000.

  • channel_radius (Union[str, float, None]) – Radius of a probe used to determine accessibility of the void space. Should typically equal probe_radius. If set to None, we will use the probe_radius. Defaults to None.

  • hist_type (str) – Type of the histogram. Available options: count, cumulative, derivative. (The derivative distribution describes the change in the cumulative distribution with respect to pore size.) Defaults to “derivative”.

  • ha (bool) – If True, run zeo++ with the “high accuracy” -ha flag. It has been reported that this can lead to issues for some structures. Defaults to False.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.

Raises:

ValueError – If hist_type is not one of ‘count’, ‘cumulative’, ‘derivative’.
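The relationship between the three hist_type options can be illustrated with a minimal, standard-library sketch. This is not the zeo++ implementation: the function name `pore_size_histograms`, the input list of sampled pore diameters, and the normalization and sign conventions are assumptions made for illustration, and zeo++ may define the cumulative and derivative curves differently.

```python
import math
from itertools import accumulate


def pore_size_histograms(diameters, n_bins=1000, max_diameter=100.0):
    """Return count, cumulative, and derivative histograms (hypothetical helper)."""
    width = max_diameter / n_bins
    counts = [0] * n_bins
    for d in diameters:
        index = math.floor(d / width)
        if 0 <= index < n_bins:  # samples outside the range are ignored
            counts[index] += 1

    total = sum(counts) or 1  # avoid division by zero for empty input
    # Cumulative (assumed convention): fraction of samples up to each bin.
    cumulative = [c / total for c in accumulate(counts)]
    # Derivative: finite difference of the cumulative curve per unit pore size.
    previous = [0.0] + cumulative[:-1]
    derivative = [(c - p) / width for c, p in zip(cumulative, previous)]
    return counts, cumulative, derivative


# Four sampled pore diameters (in Angstrom), binned into 10 bins of width 1.
counts, cumulative, derivative = pore_size_histograms(
    [0.5, 0.5, 1.5, 2.5], n_bins=10, max_diameter=10.0
)
```

With the defaults (1000 bins between 0 and 100 Angstrom) the sketch mirrors the hard-coded histogram length mentioned above.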

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') PoreSizeDistribution#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

class RayTracingHistogram(probe_radius=0.0, num_samples=50000, channel_radius=None, ha=True, primitive=True)[source]#

Describe pore structures using histograms of ray lengths.

The algorithm (implemented in zeo++) shoots random rays through the accessible volume of the cell until each ray hits an atom, and it records the ray lengths to build the corresponding histogram.

Such ray histograms are supposed to encode the shape, topology, distribution and size of voids.

Currently, the histogram is hard-coded to be of length 1000 (in zeo++ itself).
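The idea of a ray-length histogram can be sketched with the standard library alone. The following is a simplified illustration, not the zeo++ algorithm: it assumes atoms are hard spheres, ignores periodic boundary conditions, and shoots all rays from a single origin in the void. All function names are hypothetical.

```python
import math
import random


def ray_sphere_distance(origin, direction, center, radius):
    """Distance along a unit-length ray to its first hit on a sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None  # the ray misses the sphere
    t = -b - math.sqrt(disc)
    return t if t > 0 else None  # ignore hits behind the origin


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def ray_length_histogram(origin, atoms, num_samples, n_bins, max_length, seed=0):
    """Histogram of the lengths of random rays that end on an atom."""
    rng = random.Random(seed)
    width = max_length / n_bins
    histogram = [0] * n_bins
    for _ in range(num_samples):
        direction = random_unit_vector(rng)
        hits = [ray_sphere_distance(origin, direction, c, r) for c, r in atoms]
        hits = [h for h in hits if h is not None]
        if not hits:
            continue  # the ray escapes without touching any atom
        index = int(min(hits) / width)
        if index < n_bins:
            histogram[index] += 1
    return histogram


# Six atoms of radius 1 placed 5 Angstrom from the origin along each axis.
atoms = [
    ((5.0, 0.0, 0.0), 1.0), ((-5.0, 0.0, 0.0), 1.0),
    ((0.0, 5.0, 0.0), 1.0), ((0.0, -5.0, 0.0), 1.0),
    ((0.0, 0.0, 5.0), 1.0), ((0.0, 0.0, -5.0), 1.0),
]
histogram = ray_length_histogram(
    (0.0, 0.0, 0.0), atoms, num_samples=1000, n_bins=10, max_length=10.0
)
```

In this toy geometry every ray that hits an atom travels between 4 and about 4.9 Angstrom, so all counts fall into a single bin.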

Initialize the RayTracingHistogram featurizer.

Parameters:
  • probe_radius (Union[str, float]) – Used to estimate the accessible volume. Only the accessible volume is then considered for the histogram. Defaults to 0.0.

  • num_samples (int) – Number of rays that are placed through sample. Original publication used 1,000,000 sample points for IZA zeolites and 100,000 sample points for hypothetical zeolites. Larger numbers increase the runtime. Defaults to 50000.

  • channel_radius (Union[str, float, None]) – Radius of a probe used to determine accessibility of the void space. Should typically equal probe_radius. If set to None, we will use the probe_radius. Defaults to None.

  • ha (bool) – If True, run zeo++ with the “high accuracy” -ha flag. It has been reported that this can lead to issues for some structures. Defaults to True.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') RayTracingHistogram#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

class SurfaceArea(probe_radius=0.1, num_samples=100, channel_radius=None, ha=True, primitive=True)[source]#

Initialize the SurfaceArea featurizer.

Parameters:
  • probe_radius (Union[str, float]) – Radius of the probe. Defaults to 0.1.

  • num_samples (int) – Number of samples. Defaults to 100.

  • channel_radius (Union[str, float, None]) – Channel radius. Should be equal to probe_radius. Defaults to None.

  • ha (bool) – If True, run zeo++ with the “high accuracy” -ha flag. It has been reported that this can lead to issues for some structures. Defaults to True.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') SurfaceArea#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Voxelgrid#

Featurizer that computes 3D voxelgrids.

make_supercell(coords, elements, lattice, size, min_size=-5)[source]#

Generate cubic supercell of a given size.

Parameters:
  • coords (np.ndarray) – matrix of xyz coordinates of the system

  • elements (List[int]) – atomic numbers of every site

  • lattice (Tuple[float, float, float]) – lattice constants of the system

  • size (float) – dimension size of cubic cell, e.g., 10x10x10

  • min_size (float) – minimum axes size to keep negative xyz coordinates from the original cell. Defaults to -5.

Returns:

supercell array

Return type:

new_cell
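As a rough illustration of what generating a cubic supercell involves, the sketch below tiles an orthorhombic cell and keeps the sites that fall inside a cube of edge length size. It is not the implementation of make_supercell: the function name `tile_orthorhombic_cell` is hypothetical, the cell is assumed to be orthorhombic with Cartesian coordinates, and the min_size argument is ignored.

```python
import math
from itertools import product


def tile_orthorhombic_cell(coords, elements, lattice, size):
    """Replicate a cell and keep the sites inside a cube of edge `size`."""
    # Number of repetitions needed along each axis to cover the cube.
    repeats = [math.ceil(size / length) for length in lattice]
    new_coords, new_elements = [], []
    for shift in product(*(range(n) for n in repeats)):
        for xyz, element in zip(coords, elements):
            moved = tuple(
                x + s * length for x, s, length in zip(xyz, shift, lattice)
            )
            if all(0.0 <= x < size for x in moved):
                new_coords.append(moved)
                new_elements.append(element)
    return new_coords, new_elements


# One carbon atom (atomic number 6) in a 5 x 5 x 5 Angstrom cell, tiled to 10 Angstrom.
new_coords, new_elements = tile_orthorhombic_cell(
    [(0.0, 0.0, 0.0)], [6], (5.0, 5.0, 5.0), size=10.0
)
```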

class VoxelGrid(min_size=30, n_x=25, n_y=25, n_z=25, geometry_aggregations=('binary',), properties=('X', 'electron_affinity'), flatten=True, regular_bounding_box=True, primitive=False)[source]#

Describe the structure using a voxel grid.

For this, we first compute a supercell, then “voxelize” the point cloud.

For setting the value of the voxels, different options are available:

Geometry Aggregations:
  • binary: 1 if the voxel is occupied, 0 otherwise.

  • density: the number of atoms in the voxel / total number of atoms.

  • TDF: truncated distance function. Value between 0 and 1 indicating the distance between the voxel’s center and the closest point. 1 on the surface, 0 on voxels further than 2 * voxel side.

Properties: Alternatively/additionally one can use the average of any available properties of pymatgen Element objects.
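The binary, density, and property-average encodings can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the VoxelGrid implementation: the helper names are hypothetical, the bounding box is a cube starting at the origin, all points are assumed to lie inside it, and the grid is stored sparsely (voxels missing from a dictionary are empty).

```python
import math
from collections import defaultdict


def voxel_index(point, n_bins, box_length):
    """Integer voxel index of a point inside a cubic box starting at the origin."""
    side = box_length / n_bins
    # Points exactly on the upper face are assigned to the last voxel.
    return tuple(min(math.floor(x / side), n_bins - 1) for x in point)


def voxel_grids(points, values, n_bins, box_length):
    """Return sparse binary, density, and property-average grids."""
    members = defaultdict(list)
    for point, value in zip(points, values):
        members[voxel_index(point, n_bins, box_length)].append(value)
    binary = {voxel: 1 for voxel in members}
    density = {voxel: len(v) / len(points) for voxel, v in members.items()}
    average = {voxel: sum(v) / len(v) for voxel, v in members.items()}
    return binary, density, average


# Three atoms with their Pauling electronegativities (O, C, Zn) as the property.
points = [(0.5, 0.5, 0.5), (0.6, 0.4, 0.5), (2.5, 2.5, 2.5)]
values = [3.44, 2.55, 1.65]
binary, density, average = voxel_grids(points, values, n_bins=3, box_length=3.0)
```

A dense array, optionally flattened to 1D as with flatten=True, can be obtained by filling a zero-initialized grid from these dictionaries.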

Initialize a VoxelGrid featurizer.

Parameters:
  • min_size (float) – Minimum supercell size in Angstrom. Defaults to 30.

  • n_x (int) – Number of bins in x direction (Hung et al. used 30 and 60 at a cell size of 60). Defaults to 25.

  • n_y (int) – Number of bins in y direction (Hung et al. used 30 and 60 at a cell size of 60). Defaults to 25.

  • n_z (int) – Number of bins in z direction (Hung et al. used 30 and 60 at a cell size of 60). Defaults to 25.

  • geometry_aggregations (Union[Tuple["density" | "binary" | "TDF"], None]) – Mode for encoding the occupation of voxels. Available modes are binary (0 for empty voxels, 1 for occupied), density (number of points inside voxel / total number of points), and TDF (truncated distance function: value between 0 and 1 indicating the distance between the voxel’s center and the closest point; 1 on the surface, 0 on voxels further than 2 * voxel side). Defaults to (“binary”,).

  • properties (Union[Tuple[str, int], None]) – Properties whose averages are used to set the voxel values. All properties of pymatgen.core.Species are available. Defaults to (“X”, “electron_affinity”).

  • flatten (bool) – If True, flatten the 3D voxelgrid to a 1D array. Defaults to True.

  • regular_bounding_box (bool) – If True, the bounding box of the point cloud will be adjusted in order to have all the dimensions of equal length. Defaults to True.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to False.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') VoxelGrid#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Topological featurization (persistent homology)#

Note that “topology” here refers to the topology of the embedding, not to that of the underlying structure graph (i.e., the connectivity).

Featurizers using persistent homology – vectorized using Gaussian mixture models.

class PHVect(atom_types=('C-H-N-O', 'F-Cl-Br-I', 'Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu'), compute_for_all_elements=True, dimensions=(1, 2), min_size=20, n_components=20, apply_umap=False, umap_n_components=2, umap_metric='hellinger', p=1, random_state=None, periodic=False, no_supercell=False, primitive=False, alpha_weight=None)[source]#

Vectorizer for Persistence Diagrams (PDs) using Gaussian mixture models.

The vectorization of a diagram is then the weighted maximum likelihood estimate of the mixture weights for the learned components given the diagram.

Importantly, the vectorizations can still be used to compute approximate Wasserstein distances.
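To make the phrase “weighted maximum likelihood estimate of the mixture weights” concrete, here is a minimal standard-library sketch. It assumes the mixture components have already been learned and are kept fixed, that they are isotropic 2D Gaussians with a shared width, and that each diagram point carries a weight. The function names are hypothetical, and the library's actual vectorization routine may differ.

```python
import math


def gaussian_pdf(point, mean, sigma):
    """Density of an isotropic 2D Gaussian."""
    sq = sum((p - m) ** 2 for p, m in zip(point, mean))
    return math.exp(-sq / (2.0 * sigma**2)) / (2.0 * math.pi * sigma**2)


def mixture_weight_vector(points, point_weights, means, sigma, n_iter=100):
    """Estimate mixture weights for fixed components with weighted EM updates."""
    k = len(means)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        accumulated = [0.0] * k
        for point, alpha in zip(points, point_weights):
            # E-step: responsibility of each component for this point.
            likelihoods = [
                w * gaussian_pdf(point, m, sigma) for w, m in zip(weights, means)
            ]
            norm = sum(likelihoods)
            if norm == 0.0:
                continue  # point is too far from every component
            for j, value in enumerate(likelihoods):
                accumulated[j] += alpha * value / norm
        total = sum(accumulated)
        if total == 0.0:
            break
        # M-step: only the mixture weights are updated; components stay fixed.
        weights = [value / total for value in accumulated]
    return weights


# A toy diagram: three points near the first component, one near the second.
means = [(1.0, 1.0), (4.0, 2.0)]
diagram = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (4.0, 2.1)]
vector = mixture_weight_vector(diagram, [1.0] * 4, means, sigma=0.5)
```

The resulting vector has one entry per component (n_components in the featurizer) and sums to one.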

Construct a PHVect instance.

Parameters:
  • atom_types (tuple) – Atoms that are used to create substructures that are analysed using persistent homology. If multiple atom types separated by hyphens are provided, e.g. “C-H-N-O”, then the substructure consists of all atoms of type C, H, N, or O. Defaults to (“C-H-N-O”, “F-Cl-Br-I”, “Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu”).

  • compute_for_all_elements (bool) – If true, compute persistence images for full structure (i.e. with all elements). If false, it will only do it for the substructures specified with atom_types. Defaults to True.

  • dimensions (Tuple[int]) – Dimensions of topological features to consider for persistence images. Defaults to (1, 2).

  • min_size (int) – Minimum supercell size (in Angstrom). Defaults to 20.

  • n_components (int) – The number of components or dimensions to use in the vectorized representation. Defaults to 20.

  • apply_umap (bool) – Whether to apply UMAP to the results to generate a low dimensional Euclidean space representation of the diagrams. Defaults to False.

  • umap_n_components (int) – The number of dimensions of euclidean space to use when representing the diagrams via UMAP. Defaults to 2.

  • umap_metric (str) – What metric to use for the UMAP embedding if apply_umap is enabled (this option will be ignored if apply_umap is False). Should be one of "wasserstein" or "hellinger". Note that if "wasserstein" is used then transforming new data will not be possible. Defaults to “hellinger”.

  • p (int) – The p in the p-Wasserstein distance to compute. Defaults to 1.

  • random_state (_type_) – random state propagated to the Gaussian mixture models (and UMAP). Defaults to None.

  • periodic (bool) – If true, then periodic Euclidean is used in the analysis (experimental!). Defaults to False.

  • no_supercell (bool) – If true, then the supercell is not created. Defaults to False.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to False.

  • alpha_weight (Optional[str]) – If specified, use weighted alpha shapes, i.e., replace the points with balls of varying radii. For instance, atomic_radius_calculated or van_der_waals_radius. Defaults to None.
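How a hyphen-separated atom_types entry selects a substructure can be sketched as follows. The helper `substructure_indices` and the plain list of element symbols are assumptions for illustration; the featurizer itself operates on structure objects.

```python
def substructure_indices(species, atom_type_group):
    """Indices of sites whose element appears in a group such as "C-H-N-O"."""
    wanted = set(atom_type_group.split("-"))
    # Exact symbol matching keeps "C" from also selecting "Cl" or "Cu".
    return [i for i, symbol in enumerate(species) if symbol in wanted]


species = ["Zn", "O", "C", "H", "Cl", "C"]
groups = ("C-H-N-O", "F-Cl-Br-I")
selection = {group: substructure_indices(species, group) for group in groups}
```

Each group yields one substructure, and persistent homology is then computed per substructure (plus the full structure when compute_for_all_elements is True).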

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

fit(structures)[source]#

Update the parameters of this featurizer based on available data.

Parameters:
  • structures (list of tuples) – Training data (named X in the inherited docstring).

Return type:

PHVect

Returns:

self

fit_transform(structures)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

citations()[source]#

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') PHVect#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Implements persistent homology images.

class PHImage(atom_types=('C-H-N-O', 'F-Cl-Br-I', 'Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu'), dimensions=(0, 1, 2), compute_for_all_elements=True, min_size=20, image_size=(20, 20), spread=0.2, weight='identity', max_b=18, max_p=18, max_fit_tolerence=0.1, periodic=False, no_supercell=False, primitive=False, alpha_weight=None)[source]#

Vectorize persistence diagrams as images.

Adams et al. (2017) introduced a stable vector representation of persistent homology.

In persistence images, one replaces birth–persistence pairs (b, d – b) by Gaussians (to spread their influence across the neighborhood, since nearby points represent features of similar size). Additionally, one multiplies with a weighting function such that

\[f(x, y)=w(y) \sum_{(b, d) \in D_{g}(P)} N((b, d-b), \sigma)\]

A common weighting function is the linear function \(w(y) = y\).
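The formula above can be evaluated directly on a pixel grid. The sketch below is an illustration and not the PHImage implementation: it takes the formula as written (the weight w(y) = y is applied at each pixel's persistence coordinate), interprets spread as the standard deviation of an isotropic Gaussian, and evaluates at pixel centres. The function name and these conventions are assumptions.

```python
import math


def persistence_image(diagram, image_size=(20, 20), max_b=18.0, max_p=18.0, spread=0.2):
    """Evaluate f(x, y) = w(y) * sum of Gaussians at pixel centres, with w(y) = y."""
    n_x, n_y = image_size
    # Convert (birth, death) pairs to (birth, persistence) coordinates.
    pairs = [(b, d - b) for b, d in diagram]
    norm = 1.0 / (2.0 * math.pi * spread**2)
    image = []
    for j in range(n_y):
        y = (j + 0.5) * max_p / n_y  # persistence at the pixel centre
        row = []
        for i in range(n_x):
            x = (i + 0.5) * max_b / n_x  # birth at the pixel centre
            total = sum(
                norm * math.exp(-((x - b) ** 2 + (y - p) ** 2) / (2.0 * spread**2))
                for b, p in pairs
            )
            row.append(y * total)  # linear weighting w(y) = y
        image.append(row)
    return image


# One feature born at 4.5 and dying at 13.0, i.e. persistence 8.5.
image = persistence_image([(4.5, 13.0)], image_size=(18, 18))
```

Flattening the nested list row by row gives a fixed-length feature vector of n_x * n_y entries per homology dimension.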

One application for porous materials has been reported by Aditi S. Krishnapriyan et al. (2017).

Typically, persistent images are computed for all atoms in the structure. However, one can also compute persistent images for a subset of atoms. This can be done by specifying the atom types in the constructor.

Construct a PHImage object.

Parameters:
  • atom_types (Tuple[str], optional) – Atoms that are used to create substructures that are analysed using persistent homology. If multiple atom types separated by hyphens are provided, e.g. “C-H-N-O”, then the substructure consists of all atoms of type C, H, N, or O. Defaults to (“C-H-N-O”, “F-Cl-Br-I”, “Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu”).

  • dimensions (Tuple[int]) – Dimensions of topological features to consider for persistence images. Defaults to (0, 1, 2).

  • compute_for_all_elements (bool) – If true, compute persistence images for full structure (i.e. with all elements). If false, it will only do it for the substructures specified with atom_types. Defaults to True.

  • min_size (int) – Minimum supercell size (in Angstrom). Defaults to 20.

  • image_size (Tuple[int]) – Size of persistent image in pixel. Defaults to (20, 20).

  • spread (float) – “Smearing factor” for the Gaussians. Defaults to 0.2.

  • weight (str) – Weighting function for calculation of the persistence images. Defaults to “identity”.

  • max_b (Union[int, List[int]]) – Maximum birth time. Defaults to 18.

  • max_p (Union[int, List[int]]) – Maximum persistence. Defaults to 18.

  • max_fit_tolerence (float) – If the fit method is used to find the limits of the persistence images, one can apply a tolerance to the limits found. The maximum will then be max + max_fit_tolerence * max. Defaults to 0.1.

  • periodic (bool) – If true, then periodic Euclidean is used in the analysis (experimental!). Defaults to False.

  • no_supercell (bool) – If true, then the supercell is not created. Defaults to False.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to False.

  • alpha_weight (Optional[str]) – If specified, use weighted alpha shapes, i.e., replace the points with balls of varying radii. For instance, atomic_radius_calculated or van_der_waals_radius. Defaults to None.

Raises:

AssertionError – If the length of the max_b and max_p is not equal to the number of dimensions.

get_birth_persistance_death_from_pixel(dimension, x, y)[source]#

Get birth, persistence, and death from pixel coordinates.

Parameters:
  • dimension (int) – Dimension of the topological feature.

  • x (int) – x coordinate.

  • y (int) – y coordinate.

Returns:

Birth, persistence, and death.

Return type:

Tuple[float, float, float]
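The mapping from a pixel back to birth, persistence, and death can be sketched as below. Only the relation death = birth + persistence is certain; the linear scaling, the pixel-centre convention, and the axis orientation (x for birth, y for persistence) are assumptions, and the function is a hypothetical stand-in rather than the method's implementation. The real method additionally takes a dimension argument, since max_b and max_p may be given per dimension.

```python
def birth_persistence_death_from_pixel(x, y, image_size=(20, 20), max_b=18.0, max_p=18.0):
    """Map pixel indices to (birth, persistence, death) under assumed conventions."""
    birth = (x + 0.5) * max_b / image_size[0]  # centre of the pixel along x
    persistence = (y + 0.5) * max_p / image_size[1]  # centre of the pixel along y
    death = birth + persistence
    return birth, persistence, death


values = birth_persistence_death_from_pixel(4, 8, image_size=(18, 18))
```

Such a mapping is useful for interpreting which length scales an important pixel of a persistence image corresponds to.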

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') PHImage#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Featurizers using persistent homology – applied in an atom-centred manner.

class AtomCenteredPHSite(aggregation_functions=('min', 'max', 'mean', 'std'), cutoff=12, dimensions=(1, 2), alpha_weight=None)[source]#

Site featurizer for atom-centered statistics of persistence diagrams.

This featurizer is an abstraction of the one described in the work of Jiang et al. (2021) [Jiang2021]. It computes the persistence diagrams for the neighborhood of every site and then aggregates the diagrams by computing certain statistics.

To use this featurizer on a complete structure without resolving the features by chemistry, you can wrap it in a SiteStatsFingerprint.

Examples

>>> from matminer.featurizers.structure import SiteStatsFingerprint
>>> from mofdscribe.topology.atom_centered_ph import AtomCenteredPHSite
>>> featurizer = SiteStatsFingerprint(AtomCenteredPHSite())
>>> featurizer.featurize(structure)
np.array([2,...]) # np.array with features

However, if you want the additional chemical dimension, you can use the AtomCenteredPH.

Construct a new AtomCenteredPHSite featurizer.

Parameters:
  • aggregation_functions (Tuple[str]) – Aggregations to compute on the persistence diagrams (over birth/death time and persistence). Defaults to (“min”, “max”, “mean”, “std”).

  • cutoff (float) – Consider neighbors of site within this radius (in Angstrom). Defaults to 12.

  • dimensions (Tuple[int]) – Betti numbers to consider. 0 describes isolated components, 1 cycles and 2 cavities. Defaults to (1, 2).

  • alpha_weight (Optional[str]) – If specified, use weighted alpha shapes, i.e., replace the points with balls of varying radii. For instance, atomic_radius_calculated or van_der_waals_radius.

featurize(s, idx)[source]#

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Parameters:

x – input data to featurize (type depends on featurizer).

Return type:

ndarray

Returns:

(list) one or more features.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

class AtomCenteredPH(atom_types=('C-H-N-O', 'F-Cl-Br-I', 'Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu'), compute_for_all_elements=True, aggregation_functions=('min', 'max', 'mean', 'std'), species_aggregation_functions=('min', 'max', 'mean', 'std'), cutoff=12, dimensions=(1, 2), primitive=False, alpha_weight=None)[source]#

Atom-centered featurizer for persistence diagrams.

It runs AtomCenteredPHSite for every site.

It aggregates the results over atom types that are specified in the constructor via aggregation functions specified in the constructor.

Construct a new AtomCenteredPH featurizer.

Parameters:
  • atom_types (tuple) – Atoms that are used to create substructures that are analysed using persistent homology. If multiple atom types separated by hyphens are provided, e.g. “C-H-N-O”, then the substructure consists of all atoms of type C, H, N, or O. Defaults to (“C-H-N-O”, “F-Cl-Br-I”, “Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu”).

  • compute_for_all_elements (bool) – Compute descriptor for original structure with all atoms. Defaults to True.

  • aggregation_functions (Tuple[str]) – Aggregations to compute on the persistence diagrams (over birth/death time and persistence). Defaults to (“min”, “max”, “mean”, “std”).

  • species_aggregation_functions (Tuple[str]) – Aggregations to use to combine features derived for sites of a specific atom type, e.g., the site features of all C-H-N-O. Defaults to (“min”, “max”, “mean”, “std”).

  • cutoff (float) – Consider neighbors of site within this radius (in Angstrom). Defaults to 12.

  • dimensions (Tuple[int]) – Betti numbers to consider. 0 describes isolated components, 1 cycles and 2 cavities. Defaults to (1, 2).

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to False.

  • alpha_weight (Optional[str]) – If specified, use weighted alpha shapes, i.e., replace the points with balls of varying radii. For instance, atomic_radius_calculated or van_der_waals_radius.
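The aggregation of per-site features over atom-type groups can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper name `aggregate_by_atom_type`, the toy feature values, the “Zn-Cu” group, and the choice to return None for groups without matching sites are all hypothetical, and only the mean is used as the species aggregation.

```python
from statistics import mean


def aggregate_by_atom_type(symbols, site_features, atom_types):
    """Average per-site feature vectors over each hyphen-separated atom-type group.

    Hypothetical helper for illustration; not part of mofdscribe.
    """
    result = {}
    for group in atom_types:
        members = set(group.split("-"))  # "C-H-N-O" -> {"C", "H", "N", "O"}
        rows = [feat for sym, feat in zip(symbols, site_features) if sym in members]
        if rows:
            # Column-wise mean over all sites that belong to this group.
            result[group] = [mean(column) for column in zip(*rows)]
        else:
            # Assumption: mark groups that do not occur in the structure.
            result[group] = None
    return result


# Toy data: one (made-up) two-component feature vector per site.
symbols = ["C", "O", "Zn", "C"]
site_features = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [5.0, 6.0]]
aggregated = aggregate_by_atom_type(
    symbols, site_features, ("C-H-N-O", "F-Cl-Br-I", "Zn-Cu")
)
```

Each group yields one fixed-length vector regardless of how many sites it contains, which is what keeps the final descriptor length independent of the structure size.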

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') AtomCenteredPH#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Compute statistics of persistent images for MOFs.

class PHStats(atom_types=('C-H-N-O', 'F-Cl-Br-I', 'Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu'), compute_for_all_elements=True, dimensions=(1, 2), min_size=20, aggregation_functions=('min', 'max', 'mean', 'std'), periodic=False, no_supercell=False, primitive=False, alpha_weight=None)[source]#

Featurizer that computes statistics of persistent images.

Compute a fixed-length vector of topological descriptors for a structure by summarizing the persistence diagrams of the structure (or substructure) using aggregations such as min, max, mean, and std.

The descriptors can be computed over the full structure or substructures of certain atom types.

Initialize the PHStats object.

Parameters:
  • atom_types (tuple) – Atoms that are used to create substructures for which the persistent homology statistics are computed. Defaults to (“C-H-N-O”, “F-Cl-Br-I”, “Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu”).

  • compute_for_all_elements (bool) – Compute descriptor for original structure with all atoms. Defaults to True.

  • dimensions (Tuple[int]) – Dimensions of topological features to consider. Defaults to (1, 2).

  • min_size (int) – Minimum supercell size (in Angstrom). Defaults to 20.

  • aggregation_functions (Tuple[str]) – Methods used to combine the properties. See mofdscribe.featurizers.utils.aggregators.ARRAY_AGGREGATORS for available options. Defaults to (“min”, “max”, “mean”, “std”).

  • periodic (bool) – If true, then periodic Euclidean is used in the analysis (experimental!). Defaults to False.

  • no_supercell (bool) – If true, then the supercell is not created. Defaults to False.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to False.

  • alpha_weight (Optional[str]) – If specified, use weighted alpha shapes, i.e., replace the points with balls of varying radii. For instance, atomic_radius_calculated or van_der_waals_radius.
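To make the idea of summarizing a persistence diagram concrete, here is a minimal sketch in plain Python. It assumes that a diagram is given as a list of (birth, death) pairs, uses the population standard deviation for “std”, and invents the label format; the function name `diagram_stats` is hypothetical and the library may order or name its features differently.

```python
from statistics import mean, pstdev


def diagram_stats(diagram):
    """Summarize (birth, death) pairs with min, max, mean and std.

    Illustrative sketch only; the label format is an assumption.
    """
    births = [birth for birth, _ in diagram]
    deaths = [death for _, death in diagram]
    persistences = [death - birth for birth, death in diagram]
    # Assumption: "std" is the population standard deviation.
    aggregations = {"min": min, "max": max, "mean": mean, "std": pstdev}

    features = {}
    for name, values in (
        ("birth", births),
        ("death", deaths),
        ("persistence", persistences),
    ):
        for agg_name, agg in aggregations.items():
            features[f"{name}_{agg_name}"] = agg(values)
    return features


# Toy diagram for one homology dimension.
toy_diagram = [(0.0, 1.0), (0.5, 2.5), (1.0, 2.0)]
stats = diagram_stats(toy_diagram)
```

With four aggregations over three quantities, every diagram contributes twelve numbers, no matter how many points it contains.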

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') PHStats#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Compute histograms of persistent images for MOFs.

class PHHist(atom_types=('C-H-N-O', 'F-Cl-Br-I', 'Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu'), compute_for_all_elements=True, dimensions=(1, 2), min_size=20, nx=10, ny=10, periodic=False, y_axis_label='persistence', normed=True, no_supercell=False, primitive=False, alpha_weight=None)[source]#

Featurizer that computes 2D histogram of persistent images.

Compute a fixed-length vector of topological descriptors for a structure by summarizing the persistence diagrams of the structure (or substructure) using a 2D histogram.

The descriptors can be computed over the full structure or substructures of certain atom types.

Initialize the PHHist object.

Parameters:
  • atom_types (tuple) – Atoms that are used to create substructures for which the persistent homology statistics are computed. Defaults to (“C-H-N-O”, “F-Cl-Br-I”, “Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu”).

  • compute_for_all_elements (bool) – Compute descriptor for original structure with all atoms. Defaults to True.

  • dimensions (Tuple[int]) – Dimensions of topological features to consider. Defaults to (1, 2).

  • min_size (int) – Minimum supercell size (in Angstrom). Defaults to 20.

  • nx (int) – Number of points in the x-direction. Defaults to 10.

  • ny (int) – Number of points in the y-direction. Defaults to 10.

  • periodic (bool) – If true, then periodic Euclidean is used in the analysis (experimental!). Defaults to False.

  • y_axis_label (str) – Label for the y-axis. Can be “death” or “persistence”. Defaults to “persistence”.

  • normed (bool) – If true, then the histogram is normalized. Defaults to True.

  • no_supercell (bool) – If true, then the supercell is not created. Defaults to False.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to False.

  • alpha_weight (Optional[str]) – If specified, use weighted alpha shapes, i.e., replace the points with balls of varying radii. For instance, atomic_radius_calculated or van_der_waals_radius.
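The roles of nx, ny, y_axis_label and normed can be illustrated with a small, self-contained sketch. It is not the library's code: the function name `diagram_histogram`, the explicit x_range and y_range arguments, and the normalization (counts divided by the number of binned points) are assumptions made for this example.

```python
def diagram_histogram(diagram, nx, ny, x_range, y_range,
                      y_axis_label="persistence", normed=True):
    """Bin (birth, death) pairs on an nx-by-ny grid and flatten to a vector.

    Hypothetical helper; grid ranges and normalization are assumptions.
    """
    if y_axis_label not in ("persistence", "death"):
        raise ValueError("y_axis_label must be 'persistence' or 'death'")

    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = [[0.0] * ny for _ in range(nx)]
    total = 0
    for birth, death in diagram:
        y = death - birth if y_axis_label == "persistence" else death
        if not (x_min <= birth <= x_max and y_min <= y <= y_max):
            continue  # points outside the grid are ignored
        # The upper edge of each axis is counted in the last bin.
        ix = min(int((birth - x_min) / (x_max - x_min) * nx), nx - 1)
        iy = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[ix][iy] += 1.0
        total += 1

    if normed and total:
        counts = [[c / total for c in row] for row in counts]
    # Row-major flattening gives a vector of length nx * ny.
    return [c for row in counts for c in row]


toy_diagram = [(0.0, 1.0), (0.5, 2.5), (1.0, 2.0)]
vector = diagram_histogram(
    toy_diagram, nx=2, ny=2, x_range=(0.0, 2.0), y_range=(0.0, 2.0)
)
```

The resulting vector always has nx * ny entries, so larger grids give finer resolution at the cost of a longer and sparser descriptor.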

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') PHHist#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Chemistry-centred featurization#

Revised autocorrelation functions (RACs) for MOFs.

class RACS(attributes=('X', 'mod_pettifor', 'I', 'T'), scopes=(1, 2, 3), prop_agg=('product', 'diff'), corr_agg=('sum', 'avg'), bb_agg=('avg', 'sum'), bond_heuristic='vesta', bbs=('linker_all', 'linker_connecting', 'linker_functional', 'linker_scaffold', 'nodes'), primitive=True)[source]#

Modified version of the revised autocorrelation functions (RACs) for MOFs.

Originally proposed by Moosavi et al. (10.1038/s41467-020-17755-8). In the original paper, RACs were computed as

\[{}_{\mathrm{scope}}^{\mathrm{start}}P_{d}^{\mathrm{diff}}=\sum_{i}^{\mathrm{start}}\sum_{j}^{\mathrm{scope}}(P_{i}-P_{j})\,\delta(d_{i,j},d).\]

Here, the double sum can be replaced by other aggregations, which are selected via corr_agg. The sum aggregation is equivalent to the original RACs. Moreover, the implementation here keeps track of the different linker/node molecules and allows computing and aggregating the RACs for each molecule separately. The bb_agg option then determines how the RACs of the individual building blocks (BBs) are aggregated. The sum is equivalent to the original RACs (i.e., all applicable linker atoms would be added to the start/scope lists).

Note that the “bbs” define the start atoms. However, we still consider the whole neighborhood of each start atom (i.e., including atoms that are not part of this “bb”).

Furthermore, any of the element-coder properties can be used as property \(P_{i}\).

To use the original implementation, see molSimplify.

Initialize the RACS featurizer.

Parameters:
  • attributes (Tuple[Union[int, str]]) – Properties that are correlated. Defaults to (“X”, “mod_pettifor”, “I”, “T”).

  • scopes (Tuple[int]) – Number of edges to traverse. Defaults to (1, 2, 3).

  • prop_agg (Tuple[str]) – Function for aggregating the properties. Defaults to (“product”, “diff”).

  • corr_agg (Tuple[str]) – Function to aggregate the properties across different start/scopes. Defaults to (“sum”, “avg”).

  • bb_agg (Tuple[str]) – Function used to aggregate the properties across different building blocks. Defaults to (“avg”, “sum”).

  • bond_heuristic (str) – Method used to guess bonds. Defaults to “vesta”.

  • bbs (Tuple[str]) – Building blocks to use. Defaults to (“linker_all”, “linker_connecting”, “linker_functional”, “linker_scaffold”, “nodes”).

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.
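The RAC definition and the prop_agg/corr_agg options can be sketched on a toy bond graph. This is an illustration only, assuming a graph given as an adjacency dictionary and properties given per atom index; the function names `graph_distances` and `rac` are hypothetical, the scope is the whole graph, and neither building-block detection nor bb_agg is covered.

```python
from collections import deque


def graph_distances(adjacency, start):
    """Breadth-first search: number of bonds from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rac(adjacency, prop, start_atoms, depth, prop_agg="diff", corr_agg="sum"):
    """Correlate a property over all (start, scope) pairs exactly depth bonds apart."""
    if prop_agg not in ("product", "diff"):
        raise ValueError("prop_agg must be 'product' or 'diff'")
    if corr_agg not in ("sum", "avg"):
        raise ValueError("corr_agg must be 'sum' or 'avg'")

    terms = []
    for i in start_atoms:
        for j, d in graph_distances(adjacency, i).items():
            if d == depth:  # plays the role of delta(d_ij, d)
                if prop_agg == "product":
                    terms.append(prop[i] * prop[j])
                else:
                    terms.append(prop[i] - prop[j])
    if not terms:
        return 0.0  # assumption: no pair at this depth gives zero
    total = sum(terms)
    return total if corr_agg == "sum" else total / len(terms)


# Toy chain Zn(0)-O(1)-C(2)-O(3) with Pauling electronegativities.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
electronegativity = {0: 1.65, 1: 3.44, 2: 2.55, 3: 3.44}

node_diff_d1 = rac(adjacency, electronegativity, [0], depth=1)
oxygen_product_avg_d1 = rac(
    adjacency, electronegativity, [1, 3], depth=1,
    prop_agg="product", corr_agg="avg",
)
```

Because the neighbors are collected from the whole graph, a start atom on one building block can correlate with atoms outside that building block.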

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') RACS#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Atomic-property weighted autocorrelation function.

See alternative implementation https://github.com/tomdburns/AP-RDF (likely faster as it also has a lower-level implementation)

class APRDF(cutoff=20.0, lower_lim=2.0, bin_size=0.25, b_smear=10, properties=('X', 'electron_affinity'), aggregations=('avg', 'product', 'diff'), normalize=False, primitive=True)[source]#

Generalization of the descriptor described by Fernandez et al.

In the article, they describe the product of atomic properties as a weighting of a “conventional” radial distribution function (RDF).

\[\operatorname{RDF}^{p}(R)=f \sum_{i, j}^{\text{all atom pairs}} P_{i} P_{j} \mathrm{e}^{-B\left(r_{ij}-R\right)^{2}}\]

Here, we allow for a wider choice of options for aggregating the properties \(P_{i}\) and \(P_{j}\) (not only the product).

Examples

>>> from mofdscribe.chemistry.aprdf import APRDF
>>> from pymatgen.core.structure import Structure
>>> s = Structure.from_file("tests/test_files/LiTiO3.cif")
>>> aprdf = APRDF()
>>> features = aprdf.featurize(s)

Set up an atomic property (AP) weighted radial distribution function.

Parameters:
  • cutoff (float) – Consider neighbors up to this value (in Angstrom). Defaults to 20.0.

  • lower_lim (float) – Lowest distance (in Angstrom) to consider. Defaults to 2.0.

  • bin_size (float) – Bin size for binning. Defaults to 0.25.

  • b_smear (Union[float, None]) – Band width for Gaussian smearing. If None, the unsmeared histogram is used. Defaults to 10.

  • properties (Tuple[str, int]) – Properties used for calculation of the AP-RDF. All properties of pymatgen.core.Species are available in addition to the integer 1 that will set P_i=P_j=1. Defaults to (“X”, “electron_affinity”).

  • aggregations (Tuple[str]) – Methods used to combine the properties. See mofdscribe.featurizers.utils.aggregators.AGGREGATORS for available options. Defaults to (“avg”, “product”, “diff”).

  • normalize (bool) – If True, the histogram is normalized by dividing by the number of atoms. Defaults to False.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.
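The Gaussian-smeared formula can be evaluated directly for a handful of atoms. The following sketch assumes a non-periodic set of Cartesian coordinates, the product as the only aggregation, and a distance grid that starts at lower_lim and advances in steps of bin_size; the function name `ap_rdf` and the scaling factor argument `f` are hypothetical, and the library's grid and periodic handling may differ.

```python
import math


def ap_rdf(positions, props, cutoff=20.0, lower_lim=2.0, bin_size=0.25,
           b_smear=10.0, f=1.0):
    """Property-weighted RDF on a regular distance grid (non-periodic sketch)."""
    n_bins = int(round((cutoff - lower_lim) / bin_size))
    grid = [lower_lim + k * bin_size for k in range(n_bins)]
    rdf = [0.0] * n_bins

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):  # each atom pair once
            r_ij = math.dist(positions[i], positions[j])
            if r_ij > cutoff:
                continue
            weight = props[i] * props[j]  # product aggregation P_i * P_j
            for k, big_r in enumerate(grid):
                rdf[k] += f * weight * math.exp(-b_smear * (r_ij - big_r) ** 2)
    return grid, rdf


# Two atoms 3.0 Angstrom apart with made-up property values 2.0 and 3.0.
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
props = [2.0, 3.0]
grid, rdf = ap_rdf(positions, props, cutoff=5.0)
```

The descriptor peaks at the grid point closest to the pair distance, and a larger b_smear narrows the Gaussian around that point.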

precheck()[source]#

Precheck (provide an estimate of whether a featurizer will work or not) for a single entry (e.g., a single composition). If the entry fails the precheck, it will most likely fail featurization; if it passes, it is likely (but not guaranteed) to featurize correctly.

Prechecks should be:
  • accurate (but can be good estimates rather than ground truth)

  • fast to evaluate

  • unlikely to be obsolete via changes in the featurizer in the near future

This method should be overridden by any featurizer requiring its use, as by default all entries will pass prechecking. Also, precheck is a good opportunity to throw warnings about long runtimes (e.g., doing nearest neighbors computations on a structure with many thousand sites).

See the documentation for precheck_dataframe for more information.

Parameters:

*x (Composition, Structure, etc.) – Input to-be-featurized. Can be a single input or multiple inputs.

Returns:

True, if passes the precheck. False, if fails.

Return type:

(bool)

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') APRDF#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Partial charge statistics featurizer.

class PartialChargeStats(aggregations=('max', 'min', 'std'), primitive=True)[source]#

Compute partial charges using the EqEq charge equilibration method [Ongari2019].

Then derive a fixed-length feature vector from the partial charges using aggregative statistics.

Such statistics have, for instance, been used as “maximum positive charge” and “minimum negative charge” in Moosavi et al. (2020).

Initialize the PartialChargeStats featurizer.

Parameters:
  • aggregations (Tuple[str]) – Aggregations to compute. For available methods, see mofdscribe.featurizers.utils.aggregators.ARRAY_AGGREGATORS. Defaults to (“max”, “min”, “std”).

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') PartialChargeStats#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Partial charge histogram featurizer.

class PartialChargeHistogram(min_charge=-4, max_charge=4, bin_size=0.5, primitive=False)[source]#

Compute partial charges using the EqEq charge equilibration method [Ongari2019].

Then derive a fixed-length feature vector from the partial charges by binning charges in a histogram.

Construct a new PartialChargeHistogram featurizer.

Parameters:
  • min_charge (float) – Minimum limit of bin grid. Defaults to -4.

  • max_charge (float) – Maximum limit of bin grid. Defaults to 4.

  • bin_size (float) – Bin size. Defaults to 0.5.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to False.
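How min_charge, max_charge and bin_size determine the feature length can be seen in a short sketch. It assumes that the charges are already available as a list of floats (the EqEq step is not shown), that charges outside the grid are dropped, and that raw counts are returned; the function name `charge_histogram` is hypothetical.

```python
def charge_histogram(charges, min_charge=-4.0, max_charge=4.0, bin_size=0.5):
    """Count partial charges on a regular grid of bins (illustrative sketch)."""
    n_bins = int(round((max_charge - min_charge) / bin_size))
    counts = [0] * n_bins
    for q in charges:
        if not (min_charge <= q <= max_charge):
            continue  # assumption: charges outside the grid are dropped
        # The upper limit is counted in the last bin.
        idx = min(int((q - min_charge) / bin_size), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up partial charges for a four-atom toy structure.
charges = [-0.6, -0.6, 0.3, 1.2]
counts = charge_histogram(charges)
```

With the default limits and bin size, the grid has 16 bins, so every structure maps to a vector of the same length.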

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') PartialChargeHistogram#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Featurizer that runs RASPA to calculate energy grids.

class EnergyGridHistogram(raspa_dir=None, grid_spacing=1.0, bin_size_vdw=1, min_energy_vdw=-40, max_energy_vdw=0, cutoff=12, mof_ff='UFF', mol_ff='TraPPE', mol_name='CO2', sites=('C_co2',), tail_corrections=True, mixing_rule='Lorentz-Berthelot', shifted=False, separate_interactions=True, run_eqeq=True, primitive=False)[source]#

Computes the energy grid histograms as originally proposed by Bucior et al. [Bucior2019].

Conventionally, energy grids can be used to speed up molecular simulations. The idea is that the interactions between the guest and host are pre-computed on a fine grid and then only need to be looked up (instead of re-computed all the time). Bucior et al. proposed (effectively) a dimensionality reduction of the energy grid by making a histogram of the energies. For H2 they used bins of a width of 1 kJ mol−1 ranging from −10 kJ mol−1 (attractive) to 0 kJ mol−1. For methane, they used bins in 2 kJ mol−1 increments between −26 and 0 kJ mol−1, again with a repulsion bin.

This approach has also been used, for example, by Li et al. (2021) [Bucior2021].

References

[Bucior2021]

Li, Z.; Bucior, B. J.; Chen, H.; Haranczyk, M.; Siepmann, J. I.; Snurr, R. Q. Machine Learning Using Host/Guest Energy Histograms to Predict Adsorption in Metal–Organic Frameworks: Application to Short Alkanes and Xe/Kr Mixtures. J. Chem. Phys. 2021, 155 (1), 014701. https://doi.org/10.1063/5.0050823.

Construct the EnergyGridHistogram class.

Parameters:
  • raspa_dir (Union[str, Path, None]) – Path to the RASPA directory (with lib, bin, and share subdirectories). If None, we will look for the RASPA_DIR environment variable. Defaults to None.

  • grid_spacing (float) – Spacing for the energy grids. Bucior et al. (2018) used 1.0 Å. Defaults to 1.0.

  • bin_size_vdw (float) – Size of bins for the energy histogram. Defaults to 1.

  • min_energy_vdw (float) – Minimum energy for the energy histogram (defining start of first bin). Defaults to -40.

  • max_energy_vdw (float) – Maximum energy for energy histogram (defining last bin). Defaults to 0.

  • cutoff (float) – Cutoff for the Van-der-Waals interaction. Defaults to 12.

  • mof_ff (str) – Name of the forcefield used for the framework. Defaults to “UFF”.

  • mol_ff (str) – Name of the forcefield used for the guest molecule. Defaults to “TraPPE”.

  • mol_name (str) – Name of the guest molecule. Defaults to “CO2”.

  • sites (Tuple[str]) – Name of the Van-der-Waals sites for which the energy diagrams are computed. Defaults to (“C_co2”,).

  • tail_corrections (bool) – If true, use analytical tail-correction for the contribution of the interaction potential after the cutoff. Defaults to True.

  • mixing_rule (str) – Mixing rule for framework and guest molecule force field. Available options are Jorgensen and Lorentz-Berthelot. Defaults to “Lorentz-Berthelot”.

  • shifted (bool) – If True, shifts the potential so that it equals zero at the cutoff. Defaults to False.

  • separate_interactions (bool) – If True, use the framework’s force field for framework-molecule interactions. Defaults to True.

  • run_eqeq (bool) – If true, runs EqEq to compute charges. Defaults to True.

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to False.
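To make the mixing_rule, cutoff, and shifted options more concrete, here is a minimal sketch of the textbook definitions for a Lennard-Jones pair interaction. It does not call RASPA and is not taken from the mofdscribe source; the function names `mix_parameters` and `lennard_jones` and the numeric parameters are hypothetical, and tail corrections are not covered.

```python
import math
from typing import Tuple


def mix_parameters(
    sigma_i: float,
    epsilon_i: float,
    sigma_j: float,
    epsilon_j: float,
    rule: str = "Lorentz-Berthelot",
) -> Tuple[float, float]:
    """Combine per-site Lennard-Jones parameters into pair parameters."""
    epsilon = math.sqrt(epsilon_i * epsilon_j)  # geometric mean in both rules
    if rule == "Lorentz-Berthelot":
        sigma = 0.5 * (sigma_i + sigma_j)  # arithmetic mean
    elif rule == "Jorgensen":
        sigma = math.sqrt(sigma_i * sigma_j)  # geometric mean
    else:
        raise ValueError(f"Unknown mixing rule: {rule}")
    return sigma, epsilon


def lennard_jones(
    r: float, sigma: float, epsilon: float, cutoff: float = 12.0, shifted: bool = False
) -> float:
    """Truncated Lennard-Jones energy, optionally shifted to zero at the cutoff."""
    if r >= cutoff:
        return 0.0

    def potential(distance: float) -> float:
        sr6 = (sigma / distance) ** 6
        return 4.0 * epsilon * (sr6 * sr6 - sr6)

    if shifted:
        return potential(r) - potential(cutoff)
    return potential(r)


# Illustrative, made-up site parameters.
sigma, epsilon = mix_parameters(3.0, 100.0, 4.0, 25.0)
print(sigma, epsilon)
print(lennard_jones(4.0, sigma, epsilon, shifted=True))
```

With shifted=False the truncated potential has a small discontinuity at the cutoff; shifting removes the discontinuity at the cost of a constant offset inside the cutoff.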

Raises:

ValueError – If the raspa_dir is not a valid directory.

fit_transform(structures)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

fit(structure)[source]#

Update the parameters of this featurizer based on available data

Parameters:

X (list of tuples) – Training data.

Returns:

self

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structure: bool | None | str = '$UNCHANGED$') EnergyGridHistogram#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structure (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structure parameter in fit.

Returns:

self – The updated object.

Return type:

object

Generalized average minimum distance (AMD) featurizer.

class AMD(k=100, atom_types=('C-H-N-O', 'F-Cl-Br-I', 'Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu'), compute_for_all_elements=True, aggregations=('mean',), primitive=True)[source]#

Implements the generalized average minimum distance (AMD) isometry invariant [Widdowson2022].

Note that it currently does not implement averages according to multiplicity of sites (as the original code supports). The generalization is to other aggregations of the PDD.

The AMD is the average of the point-wise distance distribution (PDD) of a crystal. The PDD lists distances to neighbouring atoms in order, closest first. Hence, the kth AMD value corresponds to the average distance to the kth nearest neighbour.

The descriptors can be computed over the full structure or substructures of certain atom types.
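The relationship between the PDD and the AMD can be sketched for a small, finite point set. This simplified example ignores periodicity and site multiplicities, so it is not equivalent to the featurizer; `pdd_rows` and `generalized_amd` are hypothetical names chosen for illustration.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def pdd_rows(points: Sequence[Point], k: int) -> List[List[float]]:
    """For each point, list the distances to its k nearest neighbours, closest first."""
    if k > len(points) - 1:
        raise ValueError("k must not exceed the number of other points")
    rows = []
    for i, point in enumerate(points):
        distances = sorted(
            math.dist(point, other) for j, other in enumerate(points) if j != i
        )
        rows.append(distances[:k])
    return rows


def generalized_amd(
    points: Sequence[Point],
    k: int,
    aggregate: Callable[[Sequence[float]], float] = statistics.fmean,
) -> List[float]:
    """Aggregate the PDD column by column; the mean gives the plain AMD."""
    rows = pdd_rows(points, k)
    return [aggregate([row[j] for row in rows]) for j in range(k)]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(generalized_amd(points, k=2))  # mean over points, i.e. the AMD
print(generalized_amd(points, k=2, aggregate=max))  # another aggregation
```

Swapping the `aggregate` callable is the sense in which the descriptor is "generalized": the mean recovers the original AMD, while other statistics summarize the same PDD columns differently.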

Initialize the AMD descriptor.

Parameters:
  • k (int) – controls the number of nearest neighbour atoms considered for each atom in the unit cell. Defaults to 100.

  • atom_types (tuple) – Atoms that are used to create substructures for which the AMD descriptor is computed. Defaults to (‘C-H-N-O’, ‘F-Cl-Br-I’, ‘Cu-Mn-Ni-Mo-Fe-Pt-Zn-Ca-Er-Au-Cd-Co-Gd-Na-Sm-Eu-Tb-V-Ag-Nd-U-Ba-Ce-K-Ga-Cr-Al-Li-Sc-Ru-In-Mg-Zr-Dy-W-Yb-Y-Ho-Re-Be-Rb-La-Sn-Cs-Pb-Pr-Bi-Tm-Sr-Ti-Hf-Ir-Nb-Pd-Hg-Th-Np-Lu-Rh-Pu’).

  • compute_for_all_elements (bool) – If True, compute the AMD descriptor for the original structure with all elements. Defaults to True.

  • aggregations (tuple) – Aggregations of the AMD descriptor. The ‘mean’ is equivalent to the original AMD. Defaults to (‘mean’,).

  • primitive (bool) – If True, the structure is reduced to its primitive form before the descriptor is computed. Defaults to True.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

citations()[source]#

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, structures: bool | None | str = '$UNCHANGED$') AMD#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

BU-centred featurization#

Measure the RMSD between a building block and topological prototypes.

class BUMatch(allow_rescale=True, mismatch_fill_value=1000, return_only_best=True, aggregations=('max', 'min', 'mean', 'std'), topos=('ukc', 'urv-a', 'pte', 'ukt', 'ptr', 'raf', 'tfi-a', 'bda', 'ecc', 'ect', 'bpd', 'yfi', 'whz', 'wka', 'wje', 'cds-a', 'wkv', 'vvi-a', 'whm', 'bps', 'etj', 'brl', 'wjr', 'wii', 'urv', 'wgb', 'mot-e', 'wff', 'rna-a', 'wgu', 'urm-a', 'pcl', 'wfq', 'xbv', 'tta', 'xam', 'pyr-a', 'hyx-a', 'ssc', 'vma', 'ttv', 'xaz', 'xba', 'myd', 'sst', 'hey', 'vmv', 'vby', 'vab', 'zxc-a', 'vcj', 'jsa', 'cdl-e', 'vbn', 'siz', 'zbt', 'bcu-k', 'ske', 'atn', 'mab', 'zbc', 'fsl', 'sod-h', 'hst', 'xww', 'jjt', 'iac-e', 'tch', 'iac-d', 'fsi-a', 'tci', 'tnn', 'ocu-a', 'fsm', 'fqr', 'zbb', 'ato', 'xat-a', 'bcu-j', 'skd', 'pcu-g-e', 'dia', 'zbu', 'vbo', 'vck', 'txt', 'idp-a', 'vbx', 'abb', 'hex', 'spn', 'vmw', 'ttw', 'gpu-x', 'fit', 'ssb', 'ana', 'xal', 'wfp', 'wgt', 'gah', 'wfg', 'nku', 'hmc-a', 'wgc', 'kwh', 'kts', 'etk', 'urw', 'wih', 'wjs', 'bpr', 'whl', 'wkw', 'wjd', 'yfh', 'bpe', 'ecu', 'icc-a', 'xby-a', 'ecb', 'pyc', 'rag', 'crs-d', 'pts', 'uku', 'gwe', 'ptd', 'ukb', 'gwg', 'kag', 'ukw', 'kms', 'rae', 'wjf', 'icm', 'bpg', 'why', 'yfj', 'usf', 'wkb', 'ngc', 'wjq', 'wij', 'uru', 'eti', 'kxe', 'kea-e', 'wku', 'whn', 'bpp', 'wfe', 'wga', 'wei', 'wfr', 'ssp-a', 'sit-a', 'wgv', 'vmb', 'ith-d', 'srd', 'xbu', 'xan', 'reo-d', 'ctt-a', 'ant', 'fau-e', 'vmu', 'svi-x', 'spl', 'toc-e', 'fiv', 'mmm-a', 'stx-a', 'srs', 'ttu', 'xay', 'xbb', 'feb', 'vbm', 'lcv-a', 'jsb', 'acs', 'fmj-a', 'xmz', 'cbo-e', 'vci', 'skf', 'fit-a', 'fsx', 'bcu-h', 'zbw', 'yav-d', 'coe', 'qrf', 'siy', 'fso', 'zcd', 'cor', 'sin', 'can', 'tbo', 'sst-e', 'tck', 'tcj', 'llw-z', 'fon-a', 'cbt', 'cao', 'mmt', 'ccg', 'dea', 'scu-a', 'mok', 'ott', 'zba', 'clh', 'aww', 'fsn', 'cml', 'cnw', 'vvi', 'zce', 'maw', 'tpt-a', 'zbv', 'six', 'fsy', 'skg', 'bcu-i', 'fet', 'nbo-x', 'jsc', 'vch', 'dnb-a', 'vbl', 'jst', 'mur', 'bto-e', 'ctn', 'oab', 'xbc', 'xax', 'ttt', 'vmt', 'reo-e', 'amn', 'nas-d', 'xao', 'ttc', 
'xbt', 'atn-e', 'vmc', 'ssa', 'ith-e', 'ilc', 'wgw', 'cdj-a', 'wfs', 'pcn', 'tfk-a', 'lqm', 'wfd', 'who', 'wkt', 'urt', 'wik', 'wjp', 'cds-t', 'eth', 'tsb-a', 'wkc', 'whx', 'yfk', 'bpf', 'wjg', 'poz', 'urt-a', 'uro-a', 'ecv', 'rad', 'eca', 'nsp', 'ukv', 'jea-a', 'tfp-a', 'apo-a', 'wse', 'uka', 'ptt', 'ukr', 'pts-x', 'uke', 'bet', 'ecr', 'dnf-a', 'kle', 'pzh', 'pyo-a', 'sxc-d', 'npo', 'ece', 'shb-a', 'pyd', 'itv', 'snv-x', 'geo-a', 'etl', 'wjt', 'urp', 'wio', 'wkp', 'whk', 'yfx', 'nfb', 'bsn', 'bpu', 'urg', 'wix', 'wjc', 'ttu-a', 'bpb', 'yfo', 'wkg', 'wfw', 'wgs', 'gao', 'phx-a', 'asf-a', 'edc-a', 'wgd', 'kwo', 'vmp', 'ttp', 'xbg', 'fjh', 'ith-a', 'sse', 'vmg', 'xbp', 'xak', 'ttg', 'sra', 'qnf-a', 'gfy-a', 'vbh', 'abr', 'urk-a', 'sqp-a', 'vcl', 'fdc', 'zul', 'zca', 'zbe', 'cnd', 'skc', 'fry', 'urp-a', 'toz', 'yav-a', 'zbr', 'awd', 'seh', 'tcn', 'moo', 'zmj', 'znp', 'mon', 'ccu', 'xux', 'tco', 'sst-a', 'frx', 'yav-d-a', 'zbs', 'tmd', 'bcu-l', 'fit-e', 'skb', 'reo-d-a', 'zbd', 'zyl-a', 'sku', 'fsk', 'vcz', 'fdb', 'vcm', 'ffj', 'lcv-e', 'fdu', 'xlz', 'vbi', 'ttf', 'xaj', 'xbq', 'ssd', 'vmf', 'myc', 'xbf', 'toc-a', 'sss', 'xcb', 'vmq', 'wge', 'wfa', 'wgr', 'gan', 'etb-e', 'xai-a', 'wfv', 'wkf', 'yfn', 'bpc', 'cds-f', 'ydq', 'wjb', 'wiy', 'urf', 'bpt', 'nfc', 'whj', 'yfy', 'wkq', 'cab-a', 'etm', 'rtw', 'win', 'urq', 'wju', 'brk', 'raa', 'beb', 'ecd', 'pyr', 'ecs', 'thp-a', 'fff-a', 'ukd', 'uks', 'gui', 'ukf', 'nia-a-d', 'ecq', 'ste-a', 'lil', 'rac', 'ecf', 'lhh', 'wks', 'whh', 'nfa', 'qzd-a', 'bpv', 'wjw', 'urs', 'wil', 'eto', 'iac', 'ffd-a', 'bpa', 'yfl', 'wkd', 'hey-a', 'ild', 'wgp', 'wft', 'pbz', 'nju', 'qom-a', 'wfc', 'alb-x-d', 'hwx-a', 'xbd', 'alm', 'mzz', 'vms', 'srb', 'xbs', 'zyl', 'xah', 'ttd', 'vmd', 'cuz', 'ssf', 'jsd', 'mvy', 'vco', 'vbk', 'fgl', 'jph', 'nbo-h', 'brl-a', 'vcx', 'abf', 'cot', 'zbf', 'fsi', 'ocu-e', 'xbf-a', 'zcb', 'zbq', 'frz', 'epz-a', 'deh-d', 'mco', 'tcm', 'sgt', 'ahr-a', 'hqy', 'dgn', 'cbs', 'ntx-a', 'scu-f', 'cbd', 'ddc', 'scu-g', 'dgo', 
'hqx', 'uri-a', 'cai', 'moz', 'tcl', 'iac-a', 'mco-a', 'ska', 'mcn', 'zbp', 'qnd-a', 'fsh', 'bcu-x', 'zcc', 'zbg', 'tfv-a', 'frl', 'fda', 'fee', 'vcy', 'vbj', 'xly', 'alb-x', 'cdl-a', 'lcv-f', 'qdl', 'jse', 'ffi', 'vcn', 'act', 'reo-t', 'vme', 'pyr-e', 'src', 'xai', 'xbr', 'vmr', 'xca', 'afi-e', 'fiq', 'xbe', 'wfb', 'mot-a', 'wgf', 'wfu', 'dnd-a', 'ile', 'wgq', 'nou-a', 'wja', 'wiz', 'ure', 'odf-d', 'rug', 'wke', 'yfm', 'wim', 'wjv', 'iab', 'etn', 'bpw', 'yfz', 'whi', 'wkr', 'ecg', 'lim', 'pyf', 'rab', 'ecp', 'xly-a', 'ukg', 'crs-a', 'ukp', 'rkf', 'kgp', 'boe', 'unl', 'umw', 'rin', 'bor', 'uoh', 'bay', 'ntu', 'bbb', 'isw', 'bcf', 'ctn-a', 'ban', 'ntb', 'bcq', 'ffc-a', 'wny', 'wmb', 'nck', 'nbo', 'nat', 'yan', 'utb', 'wlf', 'epv', 'wmu', 'wnn', 'wlq', 'bto', 'bwt', 'mml-a', 'eri', 'sty-a', 'bzd', 'ltw', 'noh', 'gfy', 'qtz-f', 'foa', 'std', 'ofp', 'tsf', 'xfj', 'fne', 'swh', 'ddy-a', 'tru', 'tsq', 'aht', 'csk', 'dum', 'fnr', 'svl', 'sxg', 'xbz-a', 'fcb', 'mss', 'jwj', 'fcu', 'sxp', 'msd', 'fvc', 'smf', 'pbp-l', 'zfh', 'cha', 'hyc', 'ftk', 'son', 'xba-a', 'smq', 'hxg', 'fvt', 'dnp', 'fuo', 'thl', 'tfg', 'osf', 'zhc', 'tfp', 'mgc-x', 'lcs-e', 'sas', 'urn-a', 'cdt', 'tfq', 'tej', 'bnn-a', 'tsx-a', 'tff', 'fun', 'dnq', 'fvu', 'jcy', 'dmj', 'smp', 'snk', 'soo', 'ftj', 'tfq-a', 'hyb', 'smg', 'fvb', 'fuy', 'apd', 'zfi', 'sxq', 'ant-a', 'fct', 'afi', 'tbo-a', 'tfj-a', 'sxf', 'csj', 'off', 'zjz-a', 'svm', 'fns', 'qom', 'swi', 'twf-a', 'xfk', 'tsg', 'fnd', 'svz', 'ste', 'uru-a', 'dnc-a', 'qtz-g', 'lta', 'wbl', 'noi', 'gao-a', 'nlr', 'nab', 'wlp', 'epw', 'phi', 'wno', 'wmt', 'wlg', 'utc', 'ncj', 'ato-e', 'wmc', 'wnx', 'dia-j', 'isa', 'bcp', 'bao', 'eea', 'asc-a', 'bcg', 'bbc', 'bax', 'rgd', 'gat-a', 'uoi', 'umv', 'unm', 'key', 'gpu', 'kgq', 'bod', 'law', 'uma', 'ken', 'ulg', 'nku-a', 'kgs', 'umc', 'unx', 'uno', 'umt', 'bmn', 'bce', 'she-a', 'baz', 'ntv', 'bba', 'dna-a', 'isc', 'lng-a', 'edp', 'bbv', 'bam', 'nta', 'wys', 'coe-e', 'uta', 'wle', 'wnz', 'wma', 'nch', 'wlr', 'wmv', 
'wnm', 'epu', 'gea', 'nok', 'cdm-e', 'qtz-e', 'svx', 'fnf', 'tse', 'xfi', 'twf-t', 'odl', 'fly', 'fob', 'fnq', 'fmj', 'svo', 'xda', 'swk', 'stp', 'rht-x', 'naz-x', 'msp', 'mou-e', 'sxd', 'fca', 'url-a', 'mrc', 'sur-a', 'wxs-a', 'msg', 'fcv', 'zfk', 'chb', 'dnd', 'sme', 'urw-a', 'hyw', 'mon-e', 'kag-d', 'sni', 'smr', 'dmh', 'fvw', 'dns', 'ful', 'tsa-a', 'fth', 'som', 'jjt-a', 'gar-e', 'cda', 'baz-a', 'htp', 'tfd', 'lcs-f', 'bnn-t', 'ose', 'tfs', 'tei', 'osd', 'mjd', 'sss-a', 'ced', 'tfe', 'zrc-a', 'sol', 'fti', 'mgc', 'fum', 'dnr', 'fvv', 'sms', 'snh', 'zfj', 'smd', 'fva', 'fuz', 'mgz-x', 'dne', 'msf', 'msq', 'hms', 'sxe', 'lon-a', 'twf-d-a', 'swj', 'svn', 'fnp', 'csi', 'tph', 'odm', 'hbk', 'fng', 'ths-e', 'svy', 'xfh', 'tsd', 'pem', 'noj', 'pbz-e', 'wnl', 'wmw', 'wls', 'nci', 'cua-d', 'wld', 'edq', 'htp-a', 'bal', 'bbw', 'bcs', 'reo', 'ntw', 'loh', 'bcd', 'isu', 'umu', 'unn', 'uoj', 'mmn-a', 'kem', 'uny', 'umb', 'ulf', 'kgr', 'kga', 'hbk-a', 'uon', 'unj', 'umq', 'lcx', 'gsi', 'uoy', 'umf', 'isf', 'bcw', 'bbs', 'bah', 'nty-a', 'khn', 'isq', 'she-d', 'wzz', 'bbd', 'wms', 'wnh', 'wmd', 'ncm', 'bva', 'ggl', 'noy', 'ahr', 'fnt', 'svj', 'stu', 'hbo-e', 'oge', 'lon-e', 'xfl', 'hck', 'fnc', 'hbo', 'fog', 'stb', 'odi', 'mep-e', 'fcs', 'key-a', 'sxa', 'fcd', 'afy', 'snl', 'smw', 'fvr', 'fui', 'dnv', 'zeb', 'thj', 'zfy', 'mgg', 'ftm', 'soh', 'fve', 'zfn', 'sod-a-a', 'apc', 'mft', 'ftz', 'sat', 'zjz', 'cds', 'tfv', 'xaj-a', 'acs-h', 'std-a', 'tfa', 'fel-e', 'jmt', 'acs-i', 'sab', 'uni-d', 'osa', 'qzj', 'hwx', 'ttv-a', 'snz', 'sma', 'fvd', 'zfo', 'gar-a', 'soi', 'ftl', 'hyd', 'fuh', 'fvs', 'smv', 'snm', 'zfx', 'zec', 'fce', 'zsn', 'afw-a', 'afx', 'mst', 'dne-a', 'fcr', 'bcu-x-a', 'tph-a', 'stc', 'fof', 'odh', 'gis-e', 'tsa', 'xfm', 'fnb', 'hcj', 'baa-e', 'lon-e-a', 'xfz', 'svk', 'fnu', 'noo', 'gee', 'qtz-a', 'nox', 'phx', 'dnadnj-a', 'skl-a', 'ncl', 'wme', 'wla', 'urs-a', 'nas', 'wni', 'wmr', 'dmh-a', 'bbe', 'lom', 'fgl-a', 'edc', 'isp', 'bca', 'sra-e', 'lni', 'urh-a', 'bbr', 
'pcl-e', 'aea-a', 'iab-a', 'isg', 'bcv', 'sqs-a', 'wxs', 'umg', 'pse', 'qne-a', 'pcu-g', 'nzn', 'uox', 'lcy', 'wut', 'uoo', 'lwh-a', 'rho', 'uni', 'umr', 'tsg-a', 'bow', 'uom', 'yyz', 'ume', 'ula', 'uoz', 'urq-a', 'bbp', 'bak', 'ucn', 'bct', 'ise', 'bbg', 'ctn-d', 'eee', 'isr', 'wmp', 'wnk', 'ghw', 'txt-a', 'gis', 'wlt', 'utp', 'tfu-a', 'wmg', 'rra', 'utg', 'wlc', 'noz', 'urj-a', 'lta-e', 'nom', 'qtz-t', 'twf-e', 'fnw', 'svi', 'sur', 'xfx', 'tst', 'ahq', 'ttt-a', 'odj', 'sta', 'hch', 'xfo', 'tsc', 'fcp', 'hmc', 'xlz-a', 'aea', 'sxb', 'fcg', 'zra-d', 'ftn', 'sok', 'tim', 'mgd', 'zea', 'zfz', 'sno', 'smt', 'fvq', 'fuj', 'fty', 'mdh', 'zfm', 'fvf', 'smc', 'snx', 'dng-a', 'osc', 'ten', 'tfu', 'cdp', 'qzh', 'tfb', 'cec', 'mjt', 'tfc', 'cdq', 'mjb', 'lcs-a', 'mer-e', 'kfi-e', 'fsn-a', 'teo', 'osb', 'che', 'ffg-a', 'zfl', 'sny', 'tru-a', 'smb', 'fvg', 'hyp', 'ftx', 'thh', 'pbg-e', 'dnt', 'fvp', 'smu', 'soj', 'fto', 'msw', 'fcf', 'sxc', 'hea-a', 'sxt', 'fcq', 'pek-a', 'fna', 'hci', 'tsb', 'xfn', 'oft', 'odk', 'svh', 'fnv', 'ubt-a', 'xfy', 'twf-d', 'stw', 'swl', 'nol', 'lwh', 'pek', 'byl', 'ahq-a', 'gie', 'epy-a', 'wlb', 'wmf', 'utq', 'wlu', 'wnj', 'wmq', 'phl', 'ctn-e', 'bcb', 'lnj', 'iss', 'iph', 'eed', 'dia-x', 'dia-a-a', 'bbf', 'lon', 'bcu', 'isd', 'icd-a', 'wyt', 'baj', 'bbq', 'kgt', 'umd', 'bne', 'uol', 'cag-e', 'lcz', 'ums', 'unh', 'uoa', 'kgn', 'une', 'uov', 'pcu-i', 'rht', 'ntw-a', 'unr', 'bcx', 'ucb', 'isi', 'ast-d', 'bag', 'lng', 'bco', 'epu-a', 'ssb-a', 'bap', 'bbk', 'wng', 'pha', 'utk', 'ncb', 'wnp', 'wmk', 'nov', 'hms-a', 'alc-a', 'qtz-x', 'sve', 'tsx', 'xft', 'hal', 'swa', 'hcd', 'fnl', 'kew-a', 'svr', 'tpt', 'xfc', 'odf', 'ssf-e', 'sxn', 'the', 'zfv', 'xad-a', 'dmb', 'dny', 'fuf', 'sog', 'jan', 'ftb', 'mds', 'zfa', 'pbp-e', 'smo', 'dnn', 'fvj', 'ftu', 'rho-e', 'sop', 'stj-a', 'qzd', 'tfy', 'oso', 'cfc', 'cdk', 'acs-g', 'ana-f', 'mhg', 'fsy-a', 'tfn', 'dnp-a', 'tfo', 'cdj', 'mjy', 'ttx-a', 'acs-f', 'mhq', 'hww', 'soq', 'flu-e', 'ftt', 'ths', 'asv', 'fvk', 
'fup', 'dno', 'smn', 'ftc', 'fog-a', 'sof', 'zfw', 'snb', 'fug', 'dmc', 'afw', 'sxo', 'fcj', 'fau', 'hof', 'fab', 'cfc-e-a', 'odg', 'svs', 'xfb', 'tsn', 'svn-a', 'fnz', 'svd', 'xfu', 'tsy', 'ofo', 'pto-d', 'nnd', 'wmj', 'wnq', 'ncc', 'phw', 'wln', 'utj', 'tfb-a', 'wnf', 'ifi', 'bbj', 'baq', 'nvb', 'bcn', 'myd-a', 'irl', 'tfy-a', 'baf', 'bcy', 'ast-e', 'ish', 'uns', 'umh', 'uow', 'ull', 'pcu-h', 'und', 'lcv', 'urf-a', 'pry', 'pcb-e', 'kgo', 'wuy', 'unf', 'kfi', 'gra', 'uob', 'umj', 'mot-e-a', 'kgz', 'uou', 'tim-a', 'lll', 'nth', 'isj', 'dia-a', 'dmd-a', 'bas', 'bbh', 'bcl', 'wnd', 'ofp-a', 'ybh', 'cor-a', 'rtl-a', 'nca', 'wns', 'wmh', 'wll', 'ltj', 'nou', 'nob', 'pbz-m', 'dtc', 'phw-a', 'swb', 'sty', 'xfw', 'svf', 'fnx', 'hcp', 'crr', 'tsl', 'fno', 'svq', 'zst', 'fch', 'dnr-a', 'sxm', 'sod', 'fwz', 'fta', 'dma', 'dni-a', 'fue', 'zfu', 'thf', 'asc', 'ftv', 'sos', 'sml', 'snw', 'dnm', 'fur', 'fvi', 'apo', 'ast', 'zfb', 'tfz', 'scg', 'hxg-d', 'zim', 'tfm', 'ana-e', 'cdh', 'cgs', 'het-a', 'ssd-e', 'mjz', 'lvt-a', 'tfl', 'mcf-d', 'ffi-a', 'tdd', 'fvh', 'fus', 'snv', 'smm', 'thp', 'zfc', 'mdf', 'sor', 'ftw', 'doh', 'fud', 'zft', 'ptr-a', 'jbw', 'soe', 'qtz', 'sxl', 'fci', 'pti-a', 'msx', 'ken-a', 'npo-e', 'xfa', 'svp', 'fnn', 'sto', 'viv', 'crs', 'uta-e', 'fny', 'svg', 'stx', 'swc', 'llk-a', 'pbz-l', 'hms-t', 'epw-a', 'gdm', 'noc', 'lwg', 'qdl-e', 'lvt', 'xbk-a', 'rhr-a', 'not', 'nbd', 'wlm', 'ntu-a', 'wmi', 'wnr', 'lcy-e', 'wne', 'nva', 'bcm', 'ala-a', 'bbi', 'bar', 'isk', 'bcz', 'nti', 'uot', 'bon', 'xbp-a', 'unp', 'umk', 'kgl', 'uoc', 'bor-e', 'ung', 'ydq-a', 'rhr', 'umo', 'unt', 'neb-e', 'bnn', 'uop', 'kdd', 'kew', 'xab-a', 'unc', 'umx', 'bor-a', 'uog', 'kgh', 'bav', 'bbm', 'eft', 'nbo-a-e', 'isx', 'bbz', 'baa', 'iso', 'ncd', 'bvh', 'wnv', 'wmm', 'utm', 'wli', 'wmz', 'wna', 'epy', 'lcy-a', 'utz', 'rht-x-a', 'yav', 'ylf', 'nog', 'pds', 'stw-a', 'gez', 'flu', 'fon', 'fnj', 'svt', 'tpr', 'xfe', 'tsi', 'css', 'swg', 'svc', 'jxj', 'xfr', 'fcm', 'sxh', 'hlz', 'eea-a', 
'npo-a', 'fcz', 'zyg-a', 'xij', 'vdb', 'fts', 'mef', 'zfg', 'smi', 'snr', 'ftw-a', 'fuw', 'fvl', 'ftd', 'zfp', 'mfj', 'asf', 'dni-d', 'dmd', 'sne', 'scu', 'tfh', 'gdm-a', 'tes', 'cdm', 'acs-a', 'sbq', 'zhl', 'cfe', 'cdz', 'dag', 'cfd', 'tee', 'cdl', 'ana-a', 'mgc-a', 'sct', 'tfi', 'mep', 'zfq', 'fua', 'dme', 'fte', 'zff', 'asp', 'fvm', 'dni', 'fuv', 'smh', 'sow', 'ftr', 'xik', 'zrc', 'sxi', 'lcy-a-x', 'fcl', 'qnf', 'svb', 'tpd', 'xfs', 'swf', 'gwe-a', 'svu', 'fnk', 'tsh', 'xfd', 'mgz-x-d', 'stj', 'noq', 'qtz-h', 'bel-e', 'nof', 'lcw-x', 'cdz-e', 'idp', 'cor-e', 'dnm-a', 'btv', 'naz', 'nba', 'wlh', 'wml', 'wnw', 'nce', 'est', 'ctn-x', 'isn', 'los', 'isy', 'bbl', 'baw', 'lbt', 'uof', 'gme-e', 'kgi', 'tcf-x', 'umy', 'unb', 'uoq', 'ulj', 'kea', 'unu', 'umn', 'ulh', 'uos', 'uml', 'unw', 'kgk', 'uod', 'yzh', 'dno-a', 'nbo-x-d', 'ket', 'ucp', 'yug', 'sdt-e', 'bcj', 'bau', 'bbn', 'ubt', 'isl', 'edq-a', 'dia-g', 'ast-a', 'bby', 'llj', 'ntn', 'bab', 'srs-a-e', 'utn', 'wlj', 'ncg', 'wnu', 'wmn', 'uty', 'epz', 'wmy', 'wnb', 'dnt-a', 'pdp', 'nod', 'shp-a', 'nos', 'ltl', 'xff', 'tsj', 'csp', 'fni', 'svw', 'sth', 'tst-a', 'crt', 'xfq', 'fla', 'hbr', 'swd', 'cot-a', 'agw', 'zra', 'fcn', 'sxk', 'vfi', 'xii', 'vda', 'fcy', 'msh', 'smj', 'snq', 'fut', 'fvo', 'dmp', 'zfd', 'mgz', 'ftp', 'flu-a', 'sln', 'sou', 'fvx', 'dmg', 'snf', 'zfs', 'mer', 'ftg', 'gwg-a', 'sbr', 'ury-a', 'cdn', 'tfk', 'soc-a', 'qza', 'tdc', 'sca', 'cgc', 'tfj', 'tie', 'jaj', 'ftf', 'soc', 'sng', 'fub', 'dmf', 'csq-a', 'ftee-a', 'pbg-l', 'zfr', 'zga', 'sot', 'ftq', 'xbv-a', 'fvn', 'dnj', 'fuu', 'snp', 'smk', 'zfe', 'med', 'fcx', 'msi', 'rht-a', 'sxj', 'swe', 'dtd', 'crb', 'xfp', 'qne', 'sva', 'csq', 'xfg', 'nts-a', 'svn-d', 'svv', 'sum', 'fnh', 'hms-e', 'nor', 'noe', 'wal', 'pto-a', 'geo', 'ksx', 'wnc', 'wmx', 'utx', 'wmo', 'wnt', 'ncf', 'btu', 'wlk', 'uto', 'bac', 'bbx', 'llk', 'stu-a', 'wyt-a', 'dia-f', 'ism', 'bbo', 'bat', 'edi', 'pcu-h-e', 'bck', 'umz', 'una', 'lcs', 'ptt-a', 'kgj', 'uoe', 'bnl', 'unv', 'umm', 
'rhp', 'keb', 'pcu-m', 'uli', 'ele', 'bix', 'thp-x', 'gea-a', 'ukj', 'iac-d-a', 'eab', 'ecj', 'bel', 'etc', 'tcc-x', 'whd', 'yfw', 'wiw', 'urh', 'wjl', 'whs', 'wkh', 'bpm', 'bsv', 'wfx', 'srs-a', 'lrj', 'gez-a', 'wfo', 'srd-l', 'dns-a', 'wgk', 'vnd', 'cua', 'ala', 'xas', 'xbh', 'vmh', 'ncb-a', 'xad', 'vbg', 'mtn-e-a', 'ffd', 'vcc', 'vbp', 'oob', 'czz-a', 'vct', 'fse', 'mct', 'zbj', 'edi-e', 'skl', 'tfa-a', 'ddy', 'tsh-a', 'tca', 'zni', 'lwg-a', 'mog-e', 'sdt', 'cas', 'tbr', 'zme', 'sod-a', 'zmd', 'sgn', 'eed-a', 'moa', 'fjh-a', 'test-', 'tbd', 'tot', 'xbq-a', 'toc', 'zbk', 'sie', 'fsd', 'bcu-t', 'hha', 'vcu', 'ssa-a', 'vbq', 'epv-a', 'muo', 'vcb', 'vbf', 'fga', 'sod-a-a-a', 'alw', 'xae', 'zzz', 'cla-d', 'vmi', 'nju-a', 'anh', 'hef', 'ntt-a', 'sqc', 'xbi', 'xar', 'vne', 'wgj', 'pth-a', 'wfn', 'eye', 'lrk', 'wfy', 'icf', 'bpl', 'wki', 'yfa', 'whr', 'wjm', 'uri', 'wiv', 'ibb', 'spl-a', 'xag-a', 'the-a', 'whe', 'yfv', 'upa', 'wia', 'wjz', 'etb', 'sse-e', 'eck', 'bem', 'wys-a', 'ukk', 'pts-a', 'wsx', 'rna', 'eld', 'rnc', 'wpa', 'icm-e', 'elf', 'mgz-x-d-e', 'uki', 'spn-a', 'pto', 'sab-a', 'kem-a', 'eci', 'kna', 'upc', 'yft', 'whg', 'nia-a', 'wjx', 'whp', 'yfc', 'wkk', 'bpn', 'neb', 'icd', 'wit', 'urk', 'wjo', 'stp-a', 'kto', 'nia', 'wgh', 'gat', 'wfl', 'ffj-a', 'lsz', 'alb', 'xap', 'xbk', 'kms-a', 'ocu', 'xag', 'hed', 'tot-a', 'vmk', 'ffg', 'vbd', 'vcw', 'vbs', 'alb-a', 'ooa', 'sig', 'zbi', 'cmd', 'ood-a', 'fsf', 'ats', 'ntv-a', 'tcb', 'cag', 'mot', 'dei', 'sod-b', 'cco', 'act-a', 'moc', 'ssc-a', 'cap', 'mob', 'scu-h', 'deh', 'mgd-a', 'mou', 'sda', 'sod-t', 'tcc', 'crb-e', 'tsq-a', 'fsg', 'sky', 'shb', 'cla', 'urg-a', 'zbh', 'vbr', 'vcv', 'hhb', 'vbe', 'tsj-a', 'vca', 'dmg-a', 'fff', 'hee', 'vmj', 'cut', 'xaf', 'vnf', 'xbj', 'twf', 'xaq', 'alc', 'ucp-a', 'srs-t', 'dag-a', 'wfm', 'wgi', 'wfz', 'rtl', 'wjn', 'urj', 'wiu', 'bpo', 'wkj', 'whq', 'yfb', 'eta', 'fdu-a', 'wib', 'wjy', 'yfu', 'whf', 'upb', 'dnj-a', 'ben', 'ech', 'dnq-a', 'lib', 'ukh', 'fof-a', 'rnb', 'lfm', 
'lev', 'ukl', 'flt-a', 'tfg-a', 'pts-f', 'rob', 'elc', 'urx-a', 'sqc-a', 'ecl', 'lhb', 'ith', 'tsn-a', 'yff', 'whu', 'wkn', 'bpk', 'urn', 'wiq', 'pdp-a', 'nia-d', 'icv', 'gme', 'wky', 'whb', 'yfq', 'ury', 'wif', 'ete', 'pbp', 'wgm', 'wfi', 'gaf', 'pbg', 'wgz', 'lrl', 'srs-g', 'sqs', 'xby', 'xab', 'tfc-e', 'ctt', 'vmn', 'hea', 'xau', 'tty', 'xbn', 'vnb', 'vmy', 'vcr', 'vbv', 'alb-d', 'qdl-e-a', 'ffb', 'ibd-a', 'vce', 'muh', 'vba', 'maz', 'coi', 'atv', 'crb-a', 'twt-e', 'fst', 'jec', 'zbl', 'shf', 'fsc', 'dnn-a', 'zmc', 'sod-g', 'sev', 'cbn', 'mof', 'mmn', 'tcg', 'dft-e', 'sea', 'cab', 'tcf', 'mmo', 'ddi', 'cat', 'mog', 'cbo', 'zmb', 'sod-f', 'fsb', 'tna', 'xxv', 'zbm', 'shp', 'jeb', 'mcd', 'zbz', 'sit', 'eft-a', 'ffc', 'vcd', 'ooe', 'tzs', 'vbw', 'vcs', 'vmx', 'vnc', 'xbo', 'ttx', 'xat', 'zxc', 'vmo', 'oge-a', 'kna-a', 'ssm', 'xac', 'zyg', 'twt', 'srs-f', 'pcb', 'nir', 'wfh', 'pcu', 'wgl', 'wig', 'urx', 'etd', 'nia-e', 'whc', 'wkx', 'wjk', 'wip', 'ibd', 'ets', 'bpj', 'wko', 'yfg', 'wht', 'can-e', 'xmz-a', 'ecm', 'lig', 'elb', 'etn-a', 'pts-g', 'ukm', 'pti', 'bcs-a', 'uko', 'roa', 'pts-e', 'ukx', 'gdl-a', 'eco', 'agw-a', 'etq', 'urm', 'wir', 'wji', 'whv', 'yfe', 'wkm', 'bph', 'xbu-a', 'etf', 'urz', 'wzz-a', 'wie', 'yfr', 'wha', 'wfj', 'wgn', 'sbr-a', 'gar', 'wgy', 'nip', 'cus', 'vmm', 'xbz', 'xaa', 'sqp', 'vna', 'vmz', 'xci', 'xav', 'ttz', 'xbm', 'vbu', 'vcq', 'nbo-a', 'hjm', 'tfz-d', 'vbb', 'vcf', 'ffa', 'jsm', 'mcf', 'bcu-g', 'qsm', 'zbx', 'she', 'mtn-e', 'vtx', 'viv-a', 'dgp', 'dft', 'tcs', 'seb', 'zra-d-e', 'mmm', 'tcd', 'tce', 'mml', 'sod-e', 'zma', 'znz', 'mod', 'dgq', 'tof', 'zbn', 'cmc', 'fsa', 'tcg-x', 'ott-a', 'zby', 'cys-a', 'att', 'mbc', 'bcu-f', 'qsl', 'skh', 'jea', 'vcg', 'vbc', 'aab', 'mtn', 'vcp', 'omy', 'fel', 'oof', 'vbt', 'xbl', 'xaw', 'het', 'gee-a', 'hec', 'vml', 'pzh-a', 'niq', 'wgx', 'ins', 'wgo', 'wfk', 'yfs', 'upd', 'ict', 'etg', 'wid', 'bpi', 'wkl', 'whw', 'yfd', 'icc', 'etp', 'tfe-a', 'wjh', 'wis', 'url', 'jaj-a', 'pyo', 'myc-a', 'eye-a', 
'ecn', 'pbz-l-l', 'uky', 'ela', 'ukn', 'bkt', 'bik', 'pth', 'elv'), match='auto', skip_non_fitting_if_possible=True)[source]#

MOFs are assembled from building blocks on a net.

The “ideal” vertex “structures” of the net can fit better or worse with the “shape” of the actual building blocks. This featurizer attempts to quantify this mismatch.

Note

The edge match values will all be quite close to zero and hence not that meaningful: two points always form a line, so there is little room for mismatch apart from the length of the line, which is ignored by default with allow_rescale=True. In practice, you should consider treating them separately from the vertex match values.

Examples

>>> from mofdscribe.bu import BUMatch
>>> from pymatgen.core import Structure
>>> s = Structure.from_file("tests/test_files/bu_test_1.cif")
>>> bu_match = BUMatch(topos=["tbo", "pcu"], aggregations=["mean", "min"])
>>> bu_match.featurize(s)

Create a new BUMatch featurizer.

Parameters:
  • allow_rescale (bool) – If True, allow multiplying the coordinates of the structure by a scalar to better match the reference structure. Defaults to True.

  • mismatch_fill_value (float) – Value used to fill entries for which the RMSD computation cannot be performed due to a mismatch in coordination numbers. Defaults to 1_000.

  • return_only_best (bool) – If True, do not compute statistics but only return the minimum RMSD. Defaults to True.

  • aggregations (Tuple[str]) – Functions to use to aggregate RMSD of the different possible positions. Defaults to (“max”, “min”, “mean”, “std”).

  • topos (Tuple[str]) – RCSR codes to consider for matching. Defaults to ALL_AVAILABLE_TOPOS.

  • match (str) – BB to consider for matching. Must be one of “auto”, “edge” or “node”. If the mode is “auto” it assumes that the number of sites in the input building block is equal to the number of connection vertices. Hence, if there are more than two sites, it will match the node. If there are only two sites, it will match the edge. Defaults to “auto”.

  • skip_non_fitting_if_possible (bool) – If True, do not compute RMSD for non-compatible BBs. Defaults to True.
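The “auto” behaviour of the match parameter described above amounts to a small decision rule. The sketch below restates that rule in plain Python; `resolve_match_mode` is a hypothetical helper, not part of the BUMatch API, and raising an error for fewer than two sites is an assumption, since the documentation does not cover that case.

```python
def resolve_match_mode(num_sites: int, match: str = "auto") -> str:
    """Decide whether a building block is matched as an edge or as a node."""
    if match not in ("auto", "edge", "node"):
        raise ValueError('match must be one of "auto", "edge" or "node"')
    if match != "auto":
        return match  # an explicit choice is used as given
    if num_sites < 2:
        # Assumption: a single connection point cannot be matched.
        raise ValueError("need at least two sites to match a building block")
    # Two connection points span an edge; more than two span a node.
    return "node" if num_sites > 2 else "edge"


print(resolve_match_mode(2))
print(resolve_match_mode(4))
print(resolve_match_mode(4, match="edge"))
```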

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

featurize(s)[source]#

Structure is here spanned by the connecting points of a BU.

Return type:

ndarray

citations()[source]#

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

Use RDKit featurizers on pymatgen molecules.

class RDKitAdaptor(featurizer, feature_labels, local_env_strategy='vesta', force_sanitize=True)[source]#

Apply any featurizer that operates on RDKit molecules to pymatgen molecules.

For this, we convert the pymatgen molecule to an RDKit molecule, using the coordinates of the pymatgen molecule as the coordinates of a conformer.

Construct a new RDKitAdaptor.

Parameters:
  • featurizer (Callable) – Function that takes an RDKit molecule and returns some features (int, float, or list or array of them).

  • feature_labels (Collection[str]) – Names of features. Must be the same length as the number of features returned by the featurizer.

  • local_env_strategy (str) – If the featurize method is called with a Molecule object, this determines the local environment strategy to use to convert the molecule to a MoleculeGraph. Defaults to “vesta”.

  • force_sanitize (bool) – If True, the RDKit molecule will be sanitized. Defaults to True.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

featurize(molecule)[source]#

Call the RDKit featurizer on the molecule.

If the input molecule is a Molecule, we convert it to a MoleculeGraph using the local environment strategy specified in the constructor.

Parameters:

molecule (Union[Molecule, MoleculeGraph]) – A pymatgen Molecule or MoleculeGraph object.

Return type:

ndarray

Returns:

A numpy array of features.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

Compute features on the BUs and then aggregate them.

class MOFBBs(**data)[source]#

Container for MOF building blocks.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

class BUFeaturizer(featurizer, aggregations=('mean', 'std', 'min', 'max'))[source]#

Compute features on the BUs and then aggregate them.

This can be useful if you want to compute some features on the BUs and then aggregate them to obtain one fixed-length feature vector for the MOF.
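The aggregation step can be pictured as follows: each building unit yields a feature vector, and every feature is then summarized across the building units. The sketch below uses only the standard library; `aggregate_bu_features`, the `AGGREGATORS` table, the use of the population standard deviation, and the feature-major output order are assumptions for illustration and may differ from what BUFeaturizer does internally.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# Hypothetical lookup table mapping aggregation names to functions.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,  # assumption: population standard deviation
    "min": min,
    "max": max,
}


def aggregate_bu_features(
    per_bu_features: Sequence[Sequence[float]],
    aggregations: Sequence[str] = ("mean", "std", "min", "max"),
) -> List[float]:
    """Collapse a variable number of per-BU vectors into one fixed-length vector."""
    columns = list(zip(*per_bu_features))  # one tuple per feature
    return [AGGREGATORS[name](column) for column in columns for name in aggregations]


# Two building units, each described by two features.
features = aggregate_bu_features([[1.0, 10.0], [3.0, 30.0]])
print(features)
```

The output length is the number of features times the number of aggregations, independent of how many building units the MOF contains.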

Warning

Note that, currently, not all featurizers can operate on both Structures and Molecules. If you want to include featurizers that can only operate on one type (e.g. RDKitAdaptor and AMD) then you need to create two separate MOFBBs and BUFeaturizer objects.

Examples

>>> from mofdscribe.bu import BUFeaturizer, MOFBBs
>>> from mofdscribe.bu.rdkitadaptor import RDKitAdaptor
>>> from rdkit.Chem.Descriptors3D import Asphericity
>>> from pymatgen.core import Molecule
>>> from pymatgen.io.babel import BabelMolAdaptor
>>> base_featurizer = RDKitAdaptor(featurizer=Asphericity, feature_labels=["asphericity"])
>>> bu_featurizer = BUFeaturizer(featurizer=base_featurizer, aggregations=("mean", "std", "min", "max"))
>>> bu_featurizer.featurize(
...     mofbbs=MOFBBs(nodes=[BabelMolAdaptor.from_string(
...         "[CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]", "smi").pymatgen_mol],
...     linkers=[BabelMolAdaptor.from_string("CCCC", "smi").pymatgen_mol]))

Construct a new BUFeaturizer.

Parameters:
  • featurizer (BaseFeaturizer) – The featurizer to use. Currently, we do not support `MultipleFeaturizer`s. Please use multiple BUFeaturizers instead. If you use a featurizer that is not implemented in mofdscribe (e.g. a matminer featurizer), you need to wrap it using a method that describes which data objects the featurizer can operate on. If you do not do this, we default to assuming that it operates on structures.

  • aggregations (Tuple[str]) – The aggregations to use. Must be one of ARRAY_AGGREGATORS.
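As a rough illustration of the aggregation step, the following sketch reduces per-BU feature vectors to one fixed-length vector. It is not the mofdscribe implementation: the helper `aggregate_bu_features` and the `AGGREGATORS` mapping are hypothetical stand-ins for ARRAY_AGGREGATORS, and both the use of the population standard deviation and the feature-major output order are assumptions.

```python
from statistics import fmean, pstdev

# Hypothetical stand-in for mofdscribe's ARRAY_AGGREGATORS.
AGGREGATORS = {
    "mean": fmean,
    "std": pstdev,  # assumption: population standard deviation
    "min": min,
    "max": max,
}


def aggregate_bu_features(per_bu_features, aggregations=("mean", "std", "min", "max")):
    """Aggregate equal-length per-BU feature vectors, feature by feature."""
    columns = list(zip(*per_bu_features))  # one tuple of values per feature
    aggregated = []
    for column in columns:
        for name in aggregations:
            aggregated.append(AGGREGATORS[name](column))
    return aggregated


# Three building units, each described by two (made-up) features.
per_bu = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
vector = aggregate_bu_features(per_bu)
```

The length of the result is the number of features times the number of aggregations, independent of how many BUs the MOF contains.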

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

fit(structures=None, mofbbs=None)[source]#

Fit the featurizer to the given structures.

Parameters:
  • structures (Collection[Structure], optional) – The structures to featurize.

  • mofbbs (Collection[MOFBBs], optional) – The MOF fragments (nodes and linkers).

Return type:

None

featurize(structure=None, mofbbs=None)[source]#

Compute features on the BUs and then aggregate them.

If you provide a structure, we will fragment the MOF into BUs. If you already have precomputed fragments or only want to consider a subset of the BUs, you can provide them manually via the mofbbs argument.

If you manually provide the mofbbs, we will convert molecules to structures where possible.

Parameters:
  • structure (Union[Structure, IStructure], optional) – The structure to featurize.

  • mofbbs (MOFBBs, optional) – The MOF fragments (nodes and linkers).

Return type:

ndarray

Returns:

A numpy array of features.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, mofbbs: bool | None | str = '$UNCHANGED$', structures: bool | None | str = '$UNCHANGED$') BUFeaturizer#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • mofbbs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for mofbbs parameter in fit.

  • structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Compute local structure order parameters for a fragment.

class LSOP(types=('cn', 'tet', 'oct', 'bcc', 'sq_pyr', 'sq_pyr_legacy', 'tri_bipyr', 'sq_bipyr', 'oct_legacy', 'tri_plan', 'sq_plan', 'pent_plan', 'tri_pyr', 'pent_pyr', 'hex_pyr', 'pent_bipyr', 'hex_bipyr', 'T', 'cuboct', 'oct_max', 'tet_max', 'tri_plan_max', 'sq_plan_max', 'pent_plan_max', 'cuboct_max', 'bent', 'see_saw_rect', 'hex_plan_max', 'sq_face_cap_trig_pris'), parameters=None)[source]#

Compute shape parameters for a fragment.

The fragment can be a complete molecule or only the important part of a molecule (e.g. the binding sites). The shape parameters are then supposed to quantify the shape of the fragment. For instance, a triangular molecule will have a tri_plan parameter close to 1.

While there is a site-based LSOP featurizer in matminer, there is none that uses LSOP to quantify the shape of a fragment. This featurizer does just that.

It does so by placing a dummy site at the center of mass of the fragment and then computing the LSOP, considering all other sites as neighbors.
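The placement of the dummy site can be sketched as follows. This is only a minimal illustration under stated assumptions: the helper `center_of_mass` and the example coordinates are hypothetical, and the order parameters themselves (computed by pymatgen's LocalStructOrderParams in the featurizer) are not reproduced here.

```python
def center_of_mass(masses, coords):
    """Return the mass-weighted mean position of a set of 3D points."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    )


# A planar fragment of three identical atoms (masses in amu, made-up geometry).
masses = [12.011, 12.011, 12.011]
coords = [(1.0, 0.0, 0.0), (-0.5, 0.866, 0.0), (-0.5, -0.866, 0.0)]

# The dummy site sits at the center of mass of the fragment.
dummy_site = center_of_mass(masses, coords)

# All real atoms are treated as neighbors of the dummy site.
neighbor_vectors = [
    tuple(c[axis] - dummy_site[axis] for axis in range(3)) for c in coords
]
```

For this threefold-symmetric arrangement the dummy site ends up at the origin, which is the kind of environment for which a tri_plan parameter close to 1 would be expected.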

Initialize the featurizer.

Parameters:
  • types (Tuple[str]) – The types of LSOP to compute. For the full list of types see: pymatgen.analysis.local_env.LocalStructOrderParams.__supported_types. Defaults to: [“cn”, “tet”, “oct”, “bcc”, “sq_pyr”, “sq_pyr_legacy”, “tri_bipyr”, “sq_bipyr”, “oct_legacy”, “tri_plan”, “sq_plan”, “pent_plan”, “tri_pyr”, “pent_pyr”, “hex_pyr”, “pent_bipyr”, “hex_bipyr”, “T”, “cuboct”, “oct_max”, “tet_max”, “tri_plan_max”, “sq_plan_max”, “pent_plan_max”, “cuboct_max”, “bent”, “see_saw_rect”, “hex_plan_max”, “sq_face_cap_trig_pris”]

  • parameters (List[dict], optional) – The parameters to pass to the LocalStructOrderParams object.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

featurize(s)[source]#

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Parameters:

s – input data to featurize (type depends on featurizer).

Return type:

ndarray

Returns:

(list) one or more features.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

Code taken from the SI for 10.1021/acs.jcim.6b00565

class NConf20[source]#

Compute the nConf20 descriptor for a molecule.

This descriptor attempts to capture the flexibility of molecules by sampling the conformational space. The descriptor is a count of the “accessible” conformers (based on relative conformer energies up to 20 kcal/mol; the lowest-energy conformer is not counted). Conformers are generated using the RDKit conformer generator.

Warning

Part of the featurization is a geometry optimization using the MMFF force field. This will naturally fail for some molecules, and for all metal clusters. In these cases, the descriptor will be set to NaN.
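The counting step described above can be sketched as follows. This is not the code from the cited SI: the function name `count_accessible_conformers` and the example energies are hypothetical, and treating the 20 kcal/mol threshold as inclusive is an assumption. In the featurizer, the energies would come from RDKit conformer generation followed by MMFF optimization.

```python
import math


def count_accessible_conformers(energies, threshold=20.0):
    """Count conformers within `threshold` kcal/mol of the lowest-energy one.

    The lowest-energy conformer itself is not counted. Returns NaN if no
    energies are available (e.g. because the optimization failed).
    """
    if not energies:
        return math.nan
    lowest = min(energies)
    relative = sorted(e - lowest for e in energies)
    # Skip the first entry: it is the lowest-energy conformer (relative energy 0).
    return sum(1 for e in relative[1:] if e <= threshold)


# Made-up conformer energies in kcal/mol.
energies = [12.5, 3.1, 40.0, 3.1 + 19.9, 3.1 + 20.5]
n_conf20 = count_accessible_conformers(energies)
```

Here two conformers (at relative energies of about 9.4 and 19.9 kcal/mol) fall within the window, so the count is 2.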

Construct a new nConf20 featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

Describe the chemical composition of structures.

class CompositionStats(encodings=('mod_pettifor', 'X'), aggregations=('mean', 'std', 'max', 'min'))[source]#

Describe the composition of molecules by computing statistics of their compositions.

The featurizer will encode the element on all sites in the structure using user-defined encodings. Then it aggregates those encodings using user-defined aggregations (e.g. mean, max, min).

Initialize a CompositionStats featurizer.

Parameters:
  • encodings (Tuple[str]) – Encoding used for the elements. Can be one of element_coder.data.coding_data._PROPERTY_KEYS. Defaults to (“mod_pettifor”, “X”).

  • aggregations (Tuple[str]) – Statistic to compute over the element encodings. Can be one of mofdscribe.featurizers.utils.aggregators.ARRAY_AGGREGATORS. Defaults to (“mean”, “std”, “max”, “min”).
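The encode-then-aggregate idea can be illustrated with a minimal sketch. The small `PAULING_X` table and the helper `composition_stats` are hypothetical and only stand in for the element_coder data and the ARRAY_AGGREGATORS used by the featurizer; the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev

# Pauling electronegativities for a few elements. This tiny table is for
# illustration only; it is not the element_coder data the featurizer uses.
PAULING_X = {"H": 2.20, "C": 2.55, "O": 3.44, "Zn": 1.65}


def composition_stats(symbols, encoding=PAULING_X):
    """Encode every site by its element and aggregate the encoded values."""
    values = [encoding[symbol] for symbol in symbols]
    return {
        "mean": fmean(values),
        "std": pstdev(values),  # assumption: population standard deviation
        "max": max(values),
        "min": min(values),
    }


# Element symbols of all sites in a made-up fragment.
stats = composition_stats(["Zn", "O", "O", "C", "C", "H", "H"])
```

With several encodings, the same aggregation would simply be repeated per encoding, giving one block of statistics for each.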

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

featurize(molecule)[source]#

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Parameters:

molecule – input data to featurize (type depends on featurizer).

Return type:

ndarray

Returns:

(list) one or more features.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

Describe molecules by computing statistics of pairwise distances between their atoms.

class PairwiseDistanceStats(aggregations=('mean', 'std', 'max', 'min'))[source]#

Describe the shape of molecules by computing statistics of pairwise distances.

To do so, we compute all pairwise distances and then compute some statistics on them. One might also think of this as a pretty rough approximation of something like the AMD fingerprint.

Create a new PairwiseDistanceStats featurizer.

Parameters:

aggregations (Tuple[str], optional) – Aggregations to compute over the pairwise distances. Must be one of ARRAY_AGGREGATORS. Defaults to (“mean”, “std”, “max”, “min”).
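A minimal sketch of this idea, using only the standard library. The helper `pairwise_distance_stats` is hypothetical, the coordinates are made up, and the population standard deviation is an assumption about how “std” is defined.

```python
from itertools import combinations
from math import dist
from statistics import fmean, pstdev


def pairwise_distance_stats(coords):
    """Compute statistics over all unique pairwise distances between points."""
    distances = [dist(a, b) for a, b in combinations(coords, 2)]
    return {
        "mean": fmean(distances),
        "std": pstdev(distances),  # assumption: population standard deviation
        "max": max(distances),
        "min": min(distances),
    }


# Three atoms forming a 3-4-5 right triangle (coordinates in Angstrom).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
distance_stats = pairwise_distance_stats(coords)
```

Because only distances enter, the result does not change when the molecule is rotated or translated.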

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

featurize(structure)[source]#

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Parameters:

structure – input data to featurize (type depends on featurizer).

Return type:

ndarray

Returns:

(list) one or more features.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

Describe molecules by computing a histogram of pairwise distances between their atoms.

class PairwiseDistanceHist(lower_bound=0.0, upper_bound=15.0, bin_size=0.5, density=True)[source]#

Describe the shape of molecules by computing a histogram of pairwise distances.

To do so, we compute all pairwise distances and then compute their histogram. One might also think of this as a pretty rough approximation of something like the AMD fingerprint.

It also has some similarities to the “Grouped representation of interatomic distances” reported in [Zhang2022].

Create a new PairwiseDistanceHist featurizer.

Parameters:
  • lower_bound (float) – Lower bound of the histogram. Defaults to 0.0.

  • upper_bound (float) – Upper bound of the histogram. Defaults to 15.0.

  • bin_size (float) – Size of the bins. Defaults to 0.5.

  • density (bool) – Whether to return the density or the counts. Defaults to True.
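How the four parameters interact can be sketched as follows. The helper `pairwise_distance_hist` is hypothetical; dropping distances outside the bounds and normalizing the density so that it integrates to 1 over the bins (the convention NumPy uses) are assumptions, not confirmed behavior of the featurizer.

```python
from itertools import combinations
from math import dist


def pairwise_distance_hist(coords, lower_bound=0.0, upper_bound=15.0,
                           bin_size=0.5, density=True):
    """Histogram of all unique pairwise distances between points."""
    n_bins = round((upper_bound - lower_bound) / bin_size)
    counts = [0] * n_bins
    for a, b in combinations(coords, 2):
        d = dist(a, b)
        if lower_bound <= d < upper_bound:
            # Clamp the index to guard against floating-point rounding.
            index = min(int((d - lower_bound) / bin_size), n_bins - 1)
            counts[index] += 1
    if not density:
        return counts
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    # Assumption: the density integrates to 1 over all bins.
    return [c / (total * bin_size) for c in counts]


# Three atoms forming a 3-4-5 right triangle (coordinates in Angstrom).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
hist = pairwise_distance_hist(coords)
counts = pairwise_distance_hist(coords, density=False)
```

With the default bounds and bin size, the feature vector has 30 entries regardless of the size of the molecule.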

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

featurize(structure)[source]#

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Parameters:

structure – input data to featurize (type depends on featurizer).

Return type:

ndarray

Returns:

(list) one or more features.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

Implement some shape featurizers from RDKit using the RDKitAdaptor.

class Asphericity[source]#

Featurizer for the RDKit Asphericity descriptor.

Construct a new Asphericity featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class Eccentricity[source]#

Featurizer for the RDKit Eccentricity descriptor.

Construct a new Eccentricity featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class InertialShapeFactor[source]#

Featurizer for the RDKit InertialShapeFactor descriptor.

Construct a new InertialShapeFactor featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class NPR1[source]#

Featurizer for the RDKit NPR1 descriptor.

Construct a new NPR1 featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class NPR2[source]#

Featurizer for the RDKit NPR2 descriptor.

Construct a new NPR2 featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class PMI1[source]#

Featurizer for the RDKit PMI1 descriptor.

Construct a new PMI1 featurizer.

class PMI2[source]#

Featurizer for the RDKit PMI2 descriptor.

Construct a new PMI2 featurizer.

class PMI3[source]#

Featurizer for the RDKit PMI3 descriptor.

Construct a new PMI3 featurizer.

class RadiusOfGyration[source]#

Featurizer for the RDKit RadiusOfGyration descriptor.

Construct a new RadiusOfGyration featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class SpherocityIndex[source]#

Featurizer for the RDKit Spherocity Index descriptor.

Construct a new SpherocityIndex featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class RodLikeness[source]#

Featurizer for the RDKit Rod Likeness descriptor.

This descriptor is computed as NPR2 - NPR1.

Construct a new RodLikeness featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class DiskLikeness[source]#

Featurizer for the RDKit Disk Likeness descriptor.

This descriptor is computed as 2 - 2 * NPR2.

Construct a new DiskLikeness featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

class SphereLikeness[source]#

Featurizer for the RDKit Sphere Likeness descriptor.

This descriptor is computed as NPR1 + NPR2 - 1.
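The likeness descriptors of this module (NPR2 - NPR1, 2 - 2 * NPR2 and NPR1 + NPR2 - 1) can be checked by hand with a small sketch. The helper `shape_likeness` is hypothetical and takes the three principal moments of inertia directly, whereas the featurizers obtain NPR1 and NPR2 from RDKit; NPR1 and NPR2 are the smallest and the middle principal moment divided by the largest one.

```python
def shape_likeness(pmi1, pmi2, pmi3):
    """Compute rod, disk and sphere likeness from principal moments of inertia."""
    i1, i2, i3 = sorted((pmi1, pmi2, pmi3))
    npr1 = i1 / i3  # normalized principal moment ratio 1
    npr2 = i2 / i3  # normalized principal moment ratio 2
    return {
        "rod_likeness": npr2 - npr1,
        "disk_likeness": 2 - 2 * npr2,
        "sphere_likeness": npr1 + npr2 - 1,
    }


rod = shape_likeness(0.0, 1.0, 1.0)     # ideal rod: NPR1 = 0, NPR2 = 1
disk = shape_likeness(0.5, 0.5, 1.0)    # ideal disk: NPR1 = NPR2 = 0.5
sphere = shape_likeness(1.0, 1.0, 1.0)  # ideal sphere: NPR1 = NPR2 = 1
```

Each descriptor equals 1 for its own ideal shape and 0 for the other two, which corresponds to the three corners of the usual NPR1/NPR2 triangle plot.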

Construct a new SphereLikeness featurizer.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

Featurize a molecule using SMARTS matches.

number_smart_matches(mol, smarts)[source]#

Count the number of SMARTS matches in a molecule.

This can be useful if we have some prior knowledge about which substructures might be interesting/relevant.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – RDKit molecule.

  • smarts (Collection[str]) – SMARTS patterns to match.

Returns:

Number of SMARTS matches.

Return type:

int

class SmartsMatchCounter(smarts, feature_labels)[source]#

Count the number of SMARTS matches in a molecule.

This can be useful if we have some prior knowledge about which substructures might be interesting/relevant. For instance, you might want to count the number of carboxylic acid groups in a molecule.

Construct a new SmartsMatchCounter.

Parameters:
  • smarts (Collection[str]) – SMARTS patterns to match.

  • feature_labels (str, optional) – Feature labels. If None, the SMARTS patterns are concatenated to form a label.

class AcidGroupCounter[source]#

Count the number of acidic groups in a molecule.

SMARTS patterns are taken from the Mordred package.

Construct a new AcidGroupCounter.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class BaseGroupCounter[source]#

Count the number of basic groups in a molecule.

SMARTS patterns are taken from the Mordred package.

Construct a new BaseGroupCounter.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

Host Guest featurization#

Compute features on the host and the guests and then aggregate them.

class HostGuestFeaturizer(featurizer, aggregations=('mean', 'std', 'min', 'max'), local_env_method='vesta', remove_guests=True)[source]#

Compute features on the host and the guests and then aggregate them.

This can be useful if you use structures with adsorbed guest molecules as input and want to compute adsorption properties such as adsorption energies.

Warning

Note that we assume that the guest is not covalently bonded to the host.

Note

If there is only one guest, then there is no need to aggregate the features. In this case, only use the “mean” aggregation.

Construct a new HostGuestFeaturizer.

Parameters:
  • featurizer (BaseFeaturizer) – The featurizer to use. Currently, we do not support `MultipleFeaturizer`s. Please use multiple HostGuestFeaturizers instead. If you use a featurizer that is not implemented in mofdscribe (e.g. a matminer featurizer), you need to wrap it using a method that describes which data objects the featurizer can operate on. If you do not do this, we default to assuming that it operates on structures.

  • aggregations (Tuple[str]) – The aggregations to use. Must be one of ARRAY_AGGREGATORS.

  • local_env_method (str) – The method to use for the local environment determination (to compute the structure graph). Defaults to “vesta”.

  • remove_guests (bool) – Whether to remove the guests from the structure. This is useful if you want to compute features on the host only, independent of the guests. Defaults to True.

feature_labels()[source]#

Generate attribute names.

Return type:

List[str]

Returns:

([str]) attribute labels.

fit(structures, host_guests=None)[source]#

Fit the featurizer to the given structures.

Parameters:
  • structures (Collection[Union[Structure, IStructure]]) – The structures to fit to.

  • host_guests (Optional[Collection[HostGuest]]) – The host_guests to fit to. If you provide this, you must not provide structures.

Return type:

None

featurize(structure, host_guest=None)[source]#

Compute the features of the host and the guests and aggregate them.

Parameters:
  • structure (Optional[Union[Structure, IStructure]]) – The structure to featurize.

  • host_guest (Optional[HostGuest]) – The host_guest to featurize. If you provide this, you must not provide structure.

Returns:

The features of the host and the guests.

Return type:

np.ndarray

Raises:

ValueError – If we cannot detect a host.

citations()[source]#

Citation(s) and reference(s) for this feature.

Return type:

List[str]

Returns:

(list) each element should be a string citation,

ideally in BibTeX format.

implementors()[source]#

List of implementors of the feature.

Return type:

List[str]

Returns:

(list) each element should either be a string with author name (e.g.,

”Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

set_fit_request(*, host_guests: bool | None | str = '$UNCHANGED$', structures: bool | None | str = '$UNCHANGED$') HostGuestFeaturizer#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • host_guests (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for host_guests parameter in fit.

  • structures (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structures parameter in fit.

Returns:

self – The updated object.

Return type:

object

Text description#