Default XGBoost with default features#

Model card#

Feature set description#

What features are used?#

We use all features that are available in the dataset, this includes:

  • statistics of persistence diagrams

  • histograms of persistence diagrams

  • atomic-property-labled RDF

  • average minimum distance

  • geometric properties (surface area, density)

This feature set is high-dimensional (2387) features.

Why are the features informative?#

One would expect that the features are redundant, but informative. From chemical insight, key properties for the Henry coefficient are the porosity and the chemistry. Both scopes are captured with the feature set.

However, the default set also includes the unit cell volume, which does not have a relation with the Henry coefficient.

Why do the features not leak information?#

We do not expect direct leakage (if the energy grid histogram or even the Henry coefficient were used as features this would be a clear case of leakage).

Will the features be accessible in real-world test applications?#

All features can be computed from the crystal structure.

Data split#

Describe preprocessing steps and how data leakage is avoided#

We did not preprocess the data.

Describe the feature selection steps and how data leakage is avoided#

We did not perform feature selection.

Describe the model selection steps and how data leakage is avoided#

We did not perform model selection.

Metric card#

Regression Metrics: default-xgboost-default-feat RdedemHZA20220809154048 ../../../_images/arrow-right-circle.svg
start_time: 2022-08-09 15:40:48.800836+00:00
end_time: 2022-08-09 15:42:08.461865+00:00
version: v0.0.1
features: default featurset of dataset
name: default-xgboost-default-feat
task: BenchTaskEnum.logKH_CO2_id
model_type: None
reference: None
implementation: mofdscribe
mofdscribe_version: 0.0.1-dev
mean_squared_error: 0.85
mean_absolute_error: 0.7
r2_score: 0.22
max_error: 3.46
mean_absolute_percentage_error: 0.39
top_5_in_top_5: 0
top_10_in_top_10: 0
top_50_in_top_50: 0
top_100_in_top_100: 0
top_500_in_top_500: 1
session_info: {'packages': {'alabaster': '0.7.12', 'alembic': '1.8.1', 'ansicolors': '1.1.8', 'anyio': '3.5.0', 'appdirs': '1.4.4', 'appnope': '0.1.3', 'argon2-cffi': '21.3.0', 'argon2-cffi-bindings': '21.2.0', 'ase': '3.22.1', 'asttokens': '2.0.5', 'attrs': '22.1.0', 'autopage': '0.5.1', 'average-minimum-distance': '1.3.0', 'Babel': '2.9.1', 'backcall': '0.2.0', 'backports.cached-property': '1.0.2', 'bandit': '1.7.4', 'beautifulsoup4': '4.11.1', 'black': '22.6.0', 'bleach': '5.0.1', 'bokeh': '2.4.3', 'brotlipy': '0.7.0', 'bump2version': '1.0.1', 'bumpversion': '0.6.0', 'catboost': '1.0.6', 'certifi': '2022.6.15', 'cffi': '1.15.1', 'cfgv': '3.3.1', 'charset-normalizer': '2.1.0', 'click': '8.1.3', 'cliff': '3.10.1', 'cloudpickle': '2.1.0', 'cmaes': '0.8.2', 'cmd2': '2.4.2', 'colorcet': '3.0.0', 'colorlog': '6.6.0', 'cryptography': '37.0.1', 'cycler': '0.11.0', 'darglint': '1.8.1', 'debugpy': '1.6.2', 'decorator': '5.1.1', 'deepchem': '2.6.1.dev20220119163852', 'defusedxml': '0.7.1', 'dgl': '0.9.0', 'dill': '0.3.5.1', 'diode': '1.0.1', 'dionysus': '2.0.8', 'distlib': '0.3.5', 'docutils': '0.19', 'dscribe': '1.2.2', 'element-coder': '0.0.5', 'entrypoints': '0.4', 'esbonio': '0.14.0', 'et-xmlfile': '1.1.0', 'executing': '0.9.1', 'fastjsonschema': '2.16.1', 'filelock': '3.7.1', 'flake8': '4.0.1', 'flake8-bandit': '3.0.0', 'flake8-black': '0.3.3', 'flake8-bugbear': '22.7.1', 'flake8-colors': '0.1.9', 'flake8-docstrings': '1.6.0', 'flake8-isort': '4.2.0', 'flake8-polyfill': '1.0.2', 'flake8-print': '5.0.0', 'fonttools': '4.34.4', 'furo': '2022.6.21', 'future': '0.18.2', 'gitdb': '4.0.9', 'GitPython': '3.1.27', 'graphviz': '0.20.1', 'greenlet': '1.1.2', 'h5py': '3.7.0', 'holoviews': '1.15.0', 'hpsklearn': '1.0.3', 'hyperopt': '0.2.7', 'identify': '2.5.2', 'idna': '3.3', 'imagesize': '1.4.1', 'importlib-metadata': '4.12.0', 'importlib-resources': '5.9.0', 'iniconfig': '1.1.1', 'ipykernel': '6.15.1', 'ipython': '8.4.0', 'ipython-genutils': '0.2.0', 'ipywidgets': '7.7.1', 'isort': '4.3.21', 'jedi': '0.18.1', 'jellyfish': '0.9.0', 'Jinja2': '3.1.2', 'joblib': '1.1.0', 'json5': '0.9.6', 'jsonpickle': '2.2.0', 'jsonpointer': '2.3', 'jsonschema': '3.2.0', 'jupyter-client': '7.3.4', 'jupyter-core': '4.11.1', 'jupyter-server': '1.18.1', 'jupyterlab': '3.4.4', 'jupyterlab-pygments': '0.2.2', 'jupyterlab-server': '2.12.0', 'jupyterlab-widgets': '1.1.1', 'kiwisolver': '1.4.4', 'latexcodec': '2.0.1', 'lightgbm': '3.3.2', 'llvmlite': '0.39.0', 'loguru': '0.6.0', 'LovelyPlots': '0.0.26', 'Mako': '1.2.1', 'Markdown': '3.4.1', 'MarkupSafe': '2.1.1', 'matminer': '0.7.3', 'matplotlib': '3.5.2', 'matplotlib-inline': '0.1.3', 'mccabe': '0.6.1', 'mistune': '0.8.4', 'mof-pricer': '0.1.0', 'mofchecker': '0.9.3', 'mofdscribe': '0.0.1.dev0', 'moffragmentor': '0.0.1.dev0', 'molecule-tda': '0.1.0', 'moleculetda': '0.1.0', 'moltda': '0.1.0', 'monty': '2022.4.26', 'more-itertools': '8.13.0', 'mpmath': '1.2.1', 'multiprocess': '0.70.13', 'munkres': '1.1.4', 'mypy-extensions': '0.4.3', 'nb-conda': '2.2.1', 'nb-conda-kernels': '2.3.1', 'nbclassic': '0.3.5', 'nbclient': '0.6.6', 'nbconvert': '6.5.0', 'nbformat': '5.4.0', 'nest-asyncio': '1.5.5', 'networkx': '2.8.5', 'nglview': '3.0.3', 'nodeenv': '1.7.0', 'notebook': '6.4.12', 'numba': '0.56.0', 'numpy': '1.22.0', 'openpyxl': '3.0.10', 'optuna': '2.10.1', 'packaging': '21.3', 'palettable': '3.3.0', 'pandas': '1.4.3', 'pandocfilters': '1.5.0', 'panel': '0.13.1', 'param': '1.12.2', 'parso': '0.8.3', 'pathspec': '0.9.0', 'pbr': '5.9.0', 'pep8-naming': '0.13.1', 'pervect': '0.0.2', 'pexpect': '4.8.0', 'pickleshare': '0.7.5', 'Pillow': '9.2.0', 'Pint': '0.19.2', 'pip': '22.1.2', 'pkgutil-resolve-name': '1.3.10', 'platformdirs': '2.5.2', 'plotly': '5.9.0', 'pluggy': '1.0.0', 'POT': '0.8.2', 'pre-commit': '2.20.0', 'prettytable': '3.3.0', 'progressbar2': '4.0.0', 'prometheus-client': '0.14.1', 'prompt-toolkit': '3.0.30', 'psutil': '5.9.1', 'ptyprocess': '0.7.0', 'PubChemPy': '1.0.4', 'pure-eval': '0.2.2', 'py': '1.11.0', 'py4j': '0.10.9.5', 'pybind11': '2.10.0', 'pybtex': '0.24.0', 'pycairo': '1.21.0', 'pyclustering': '0.10.1.2', 'pycodestyle': '2.8.0', 'pycparser': '2.21', 'pyct': '0.4.8', 'pydantic': '1.9.1', 'pydata-sphinx-theme': '0.8.1', 'pydocstyle': '6.1.1', 'pyeqeq': '0.0.9', 'pyflakes': '2.4.0', 'pygls': '0.12.1', 'Pygments': '2.12.0', 'pymatgen': '2022.7.25', 'pymongo': '4.2.0', 'pynndescent': '0.5.7', 'pyOpenSSL': '22.0.0', 'pyparsing': '3.0.9', 'pyperclip': '1.8.2', 'pyrsistent': '0.18.1', 'PySocks': '1.7.1', 'pyspellchecker': '0.6.3', 'pystow': '0.4.6', 'pytest': '7.1.2', 'python-dateutil': '2.8.2', 'python-utils': '3.3.3', 'pytz': '2022.1', 'pyviz-comms': '2.2.0', 'PyYAML': '6.0', 'pyzmq': '23.2.0', 'rdkit': '2022.3.4', 'reportlab': '3.5.68', 'requests': '2.28.1', 'requests-file': '1.5.1', 'ruamel.yaml': '0.17.21', 'ruamel.yaml.clib': '0.2.6', 'SciencePlots': '1.0.9', 'scikit-learn': '1.1.1', 'scikit-spatial': '6.4.1', 'scipy': '1.9.0', 'sciris': '1.3.3', 'seaborn': '0.11.2', 'Send2Trash': '1.8.0', 'session-info': '1.0.0', 'setuptools': '61.2.0', 'six': '1.16.0', 'smmap': '5.0.0', 'sniffio': '1.2.0', 'snowballstemmer': '2.2.0', 'soupsieve': '2.3.2.post1', 'sparse': '0.13.0', 'spglib': '1.16.5', 'Sphinx': '5.1.1', 'sphinx-autodoc-typehints': '1.19.1', 'sphinx-automodapi': '0.14.1', 'sphinx-basic-ng': '0.0.1a12', 'sphinx-book-theme': '0.3.3', 'sphinx-click': '4.3.0', 'sphinx-copybutton': '0.5.0', 'sphinx-data-viewer': '0.1.2', 'sphinx-immaterial': '0.8.1', 'sphinx-jsonschema': '1.15', 'sphinx-needs': '1.0.1', 'sphinx-pydantic': '0.1.1', 'sphinxcontrib-applehelp': '1.0.2', 'sphinxcontrib-devhelp': '1.0.2', 'sphinxcontrib-htmlhelp': '2.0.0', 'sphinxcontrib-jsmath': '1.0.1', 'sphinxcontrib-katex': '0.8.6', 'sphinxcontrib-needs': '0.7.9', 'sphinxcontrib-plantuml': '0.24', 'sphinxcontrib-qthelp': '1.0.3', 'sphinxcontrib-serializinghtml': '1.1.5', 'SQLAlchemy': '1.4.39', 'stack-data': '0.3.0', 'stdlib-list': '0.8.0', 'stevedore': '4.0.0', 'structuregraph-helpers': '0.0.8', 'superpose3d': '1.4.1', 'sympy': '1.10.1', 'tabulate': '0.8.10', 'tenacity': '8.0.1', 'terminado': '0.15.0', 'testpath': '0.6.0', 'threadpoolctl': '3.1.0', 'timeout-decorator': '0.5.0', 'tinycss2': '1.1.1', 'toml': '0.10.2', 'tomli': '2.0.1', 'torch': '1.11.0', 'tornado': '6.2', 'tox': '3.25.1', 'tqdm': '4.64.0', 'traitlets': '5.3.0', 'typeguard': '2.13.3', 'typing-extensions': '4.1.1', 'umap-learn': '0.5.3', 'uncertainties': '3.1.7', 'unicodedata2': '14.0.0', 'urllib3': '1.26.11', 'virtualenv': '20.16.2', 'watermark': '2.3.1', 'wcwidth': '0.2.5', 'webencodings': '0.5.1', 'websocket-client': '0.58.0', 'wheel': '0.37.1', 'widgetsnbextension': '3.6.1', 'xgboost': '1.6.1', 'XlsxWriter': '3.0.3', 'zipp': '3.8.1'}, 'system': {'OS Version': 'Darwin 21.4.0', 'Executable': '/Users/kevinmaikjablonka/miniconda3/envs/mofdscribe/bin/python', 'Build Date': 'Mar 25 2022 06:05:16', 'Compiler': 'Clang 12.0.1 ', 'Python API': 1013}}