
Supported datasets

The following table shows a selection of supported datasets. It is not exhaustive, but should give an idea of the range of datasets scida can handle. If you want to use a dataset that is not listed here, read on below and consider opening an issue or contacting us directly.

| Name | Support | Description |
|------|---------|-------------|
| AURIGA | | Cosmological zoom-in galaxy formation simulations |
| EAGLE | | Cosmological galaxy formation simulations |
| FIRE2 | | Cosmological zoom-in galaxy formation simulations |
| FLAMINGO | | Cosmological galaxy formation simulations |
| Gaia [download] | | Observations of a billion nearby stars |
| Illustris | | Cosmological galaxy formation simulations |
| LGalaxies [1] | | Semi-analytical model for Millennium simulations |
| SDSS DR16 | | Observations for millions of galaxies |
| TNG [2] | | Cosmological galaxy formation simulations |
| SIMBA | | Cosmological galaxy formation simulations |
| TNG-Cluster | | Cosmological zoom-in galaxy formation simulations |

The markers in the Support column distinguish three levels: out-of-the-box support, work-in-progress support (or the need to create a suitable configuration file), and support for converted HDF5 versions of the original data.

Dataset Details


Access to individual datasets is supported, e.g.:

>>> from scida import load
>>> load("LGal_Ayromlou2021_snap58.hdf5")

while access to the series as a whole (i.e. loading all data for all snapshots in a folder at once) is not supported.

The TNG Simulation Suite


The IllustrisTNG project is a series of large-scale cosmological magnetohydrodynamical simulations of galaxy formation. The data is publicly available on the IllustrisTNG website (www.tng-project.org).

Demo data

Many of the examples in this documentation use the TNG50-4 simulation. In particular, we make a snapshot and group catalog available to run these examples. You can download and extract the snapshot and its group catalog from the TNG50-4 test data using the following commands:

wget "" -O snapshot.tar.gz
tar -xvf snapshot.tar.gz
wget "" -O catalog.tar.gz
tar -xvf catalog.tar.gz

These files are exactly the same files that can be downloaded from the official IllustrisTNG data release.

The snapshot and group catalog should be placed in the same folder. Then you can load the snapshot with ds = load("./snapdir_030"). If you are executing the code from a different folder, adjust the path accordingly. The group catalog is automatically detected when available in the same parent folder as the snapshot; otherwise, you can pass the path to the catalog via the catalog keyword to load().
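The automatic detection described above can be thought of as a sibling-folder lookup. The following standalone sketch mimics that idea using the snapdir_XXX / groups_XXX naming convention of the demo data; it is illustrative only, and scida's actual detection logic may be more general:

```python
from pathlib import Path
import tempfile

def find_sibling_catalog(snapshot_path):
    """Look for a group-catalog folder next to a snapshot folder.

    Illustrative only: assumes the 'snapdir_XXX' / 'groups_XXX' naming
    convention; scida's real detection may differ.
    """
    snap = Path(snapshot_path)
    if snap.name.startswith("snapdir_"):
        num = snap.name.split("_", 1)[1]
        candidate = snap.parent / f"groups_{num}"
        if candidate.is_dir():
            return candidate
    return None

# Demo with a temporary directory layout
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "snapdir_030").mkdir()
    (Path(root) / "groups_030").mkdir()
    found = find_sibling_catalog(Path(root) / "snapdir_030")
    print(found.name)  # groups_030
```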


The TNGLab is a web-based analysis platform: a JupyterLab instance with dedicated computational resources and access to all TNG data sets, providing a convenient way to run analysis code on the TNG data. As TNGLab supports scida, it is a great way to get started and to run the examples.

In order to run the examples that use the demo data, replace

ds = load("./snapdir_030")

with

ds = load("/home/tnguser/sims.TNG/TNG50-4/output/snapdir_030")

in these examples.

Alternatively, you can use

sim = load("TNG50-4")
ds = sim.get_dataset(30)

where "TNG50-4" is a pre-defined shortcut to the TNG50-4 simulation path on TNGLab. After loading the simulation, we request snapshot "30", as used in the demo data. Custom shortcuts can be defined in the simulation configuration.
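Conceptually, such a shortcut is just a name-to-path mapping that is resolved before loading. A minimal sketch of the idea (the hard-coded dict is hypothetical; scida reads shortcuts from its simulation configuration files):

```python
# Hypothetical shortcut table; in scida, these entries come from the
# simulation configuration rather than a hard-coded dict.
SHORTCUTS = {
    "TNG50-4": "/home/tnguser/sims.TNG/TNG50-4/output",
}

def resolve(name_or_path):
    """Return the configured path for a known shortcut, else the input unchanged."""
    return SHORTCUTS.get(name_or_path, name_or_path)

print(resolve("TNG50-4"))        # /home/tnguser/sims.TNG/TNG50-4/output
print(resolve("./snapdir_030"))  # not a shortcut: returned unchanged
```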

Supported file formats and their structure

Here, we discuss the requirements for easy extension/support of new datasets. Currently, input files need to have one of the following formats:

  • hdf5
  • multi-file hdf5: We assume a directory containing hdf5 files following the pattern prefix.XXX.hdf5, where prefix is determined automatically and XXX is a contiguous range of integers indicating the order in which the hdf5 files are merged. The hdf5 files are expected to share the same structure, and all fields, i.e. hdf5 datasets, are concatenated along their first axis.
  • zarr
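For the multi-file case, the expected layout can be illustrated with a small standalone sketch that discovers the prefix and orders the chunk files. This is pure Python and independent of scida's actual implementation:

```python
import re
import tempfile
from pathlib import Path

def ordered_chunks(directory):
    """Group files matching 'prefix.XXX.hdf5' by prefix, sorted by XXX."""
    pattern = re.compile(r"^(?P<prefix>.+)\.(?P<num>\d+)\.hdf5$")
    chunks = {}
    for f in Path(directory).iterdir():
        m = pattern.match(f.name)
        if m:
            chunks.setdefault(m.group("prefix"), []).append(
                (int(m.group("num")), f.name)
            )
    # Sort each prefix's files by their integer chunk number
    return {
        prefix: [name for _, name in sorted(files)]
        for prefix, files in chunks.items()
    }

with tempfile.TemporaryDirectory() as d:
    for i in (2, 0, 1):
        (Path(d) / f"snap_030.{i}.hdf5").touch()
    result = ordered_chunks(d)
print(result)  # {'snap_030': ['snap_030.0.hdf5', 'snap_030.1.hdf5', 'snap_030.2.hdf5']}
```

The chunk files found this way would then be concatenated along the first axis of each contained dataset.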

Support for FITS is work in progress; a proof-of-concept is available.

Scida and the above file formats use a hierarchical structure to store data, with three fundamental objects:

  • Groups are containers for other groups or datasets.
  • Datasets are multidimensional arrays of a homogeneous type, usually bundled into some Group.
  • Attributes provide various metadata.
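As a mental model, this hierarchy can be mirrored with plain Python containers. The structure below is purely illustrative (field and attribute names are made up); real HDF5 or zarr access goes through the respective libraries:

```python
# Minimal in-memory mirror of the group/dataset/attribute hierarchy.
# Nested lists stand in for multidimensional array datasets.
file_model = {
    "attrs": {"Format": "toy-example"},       # attributes: metadata
    "PartType0": {                            # group: container for datasets
        "attrs": {"NumPart": 4},
        "Coordinates": [[0.0, 0.0, 0.0],      # dataset: homogeneous array
                        [1.0, 0.5, 2.0],
                        [2.5, 1.0, 0.0],
                        [3.0, 3.0, 3.0]],
    },
}

coords = file_model["PartType0"]["Coordinates"]
print(len(coords))  # 4 rows; the first axis is what multi-file merging concatenates
```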

At this point, we only support unstructured datasets, i.e. datasets that do not depend on the memory layout for their interpretation. For example, this implies that simulation codes utilizing uniform or adaptive grids are not supported.

We explicitly support simulations run with the following codes: