Configuration
Main configuration file
The main configuration file is located at ~/.scida/config.yaml. If this file does not exist, it is created on the first use of scida. The file uses the YAML format.
The following options are available:
copied_default - If this option is set to True, a warning is printed because the copied default config has not yet been adjusted by the user. Once you have adjusted it, remove this line.
cache_path - Sets the folder to use as a cache for scida. It is recommended to move the cache out of the home directory and onto a fast disk.
datafolders - A list of folders to scan for data specifiers when using scida.load("specifier").
nthreads - scida itself might use multiple threads for some operations. This option sets the number of threads to use. This is independent of any dask threading. Default: 8
missing_units - How to handle missing units. Can be "warn", "raise", or "ignore". "warn" will print a warning, "raise" will raise an exception, and "ignore" will silently continue without the right units. Default: "warn"
testdata_path - The base path to the test data sets defined in "tests/testdata.yaml".
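As a rough illustration of the options above, the following sketch checks an already parsed config.yaml mapping against the documented value ranges. The function check_main_config and the example paths are hypothetical and not part of scida; the defaults used (8 threads, "warn") are the ones stated in the list.

```python
# Hypothetical helper, not part of scida: sanity-check a parsed config.yaml mapping.
ALLOWED_MISSING_UNITS = ("warn", "raise", "ignore")


def check_main_config(config):
    """Return a list of human-readable problems found in the config mapping."""
    problems = []

    # The copied default config is expected to be adjusted and the flag removed.
    if config.get("copied_default", False):
        problems.append("copied_default is still set; adjust the config and remove this line")

    # Documented default: 8 threads.
    nthreads = config.get("nthreads", 8)
    if isinstance(nthreads, bool) or not isinstance(nthreads, int) or nthreads < 1:
        problems.append(f"nthreads must be a positive integer, got {nthreads!r}")

    # Documented default: "warn".
    missing_units = config.get("missing_units", "warn")
    if missing_units not in ALLOWED_MISSING_UNITS:
        problems.append(f"missing_units must be one of {ALLOWED_MISSING_UNITS}, got {missing_units!r}")

    # datafolders is documented as a list of folders.
    if not isinstance(config.get("datafolders", []), list):
        problems.append("datafolders must be a list of folders")

    return problems


# Example mapping with assumed paths, as it might look after parsing the YAML file.
example_config = {
    "cache_path": "/fast/disk/scida_cache",
    "datafolders": ["/data/simulations"],
    "nthreads": 8,
    "missing_units": "warn",
}
print(check_main_config(example_config))
```

A check like this only covers the value ranges named in this section; how scida itself reacts to invalid values is not specified here.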
Simulation configuration
By default, scida will load supported simulation configurations from the package.
User configurations for simulations are loaded from ~/.config/scida/simulations.yaml. This file is also in YAML format.
The configuration has to have the following structure:
data:
  SIMNAME1:
  SIMNAME2:
Each simulation could look something like this:
data:
  SIMNAME1:
    aliases:
      - SIMNAME
      - SMN1
    identifiers:
      Parameters:
        SimName: SIMNAME1
      Config:
        SavePath:
          content: /path/to/simname
          match: substring
    unitfile: units/simnameunits.yaml
    dataset_type:
      series: ArepoSimulation
      dataset: ArepoSnapshot
aliases - A list of aliases for the simulation. These can be used to load the simulation with scida.load("alias").
identifiers - A dictionary of identifiers from the metadata of a given dataset that identify it as belonging to this simulation. In the example above, "/Parameters" is the path to an attribute "SimName" in the HDF5/zarr metadata, which must have exactly the given content. Multiple identifiers can be given, in which case all of them have to match. A partial match of a value is possible by passing a dictionary {"content": "valuesubstr", "match": "substring"} rather than a string.
unitfile - The path to the unit file, relative to the user or repository simulation configuration. User configurations take precedence over the package configuration.
dataset_type - Can be used to explicitly fix the dataset/series type for a simulation.
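The matching rules for identifiers can be sketched as follows. This is a minimal illustration under assumptions, not scida's actual implementation: the metadata is modelled as a nested dictionary of group name to attributes, and the function names identifier_matches and dataset_matches are hypothetical.

```python
# Hypothetical sketch of identifier matching; scida's real logic may differ.

def identifier_matches(expected, actual):
    """Compare one expected identifier value against the actual metadata value."""
    if isinstance(expected, dict) and "content" in expected:
        content = expected["content"]
        if expected.get("match") == "substring":
            # Partial match: the configured content must occur inside the value.
            return isinstance(actual, str) and content in actual
        return actual == content
    # A plain string (or other scalar) requires an exact match.
    return actual == expected


def dataset_matches(identifiers, metadata):
    """Return True only if every configured identifier matches the metadata."""
    for group, attrs in identifiers.items():
        group_meta = metadata.get(group, {})
        for key, expected in attrs.items():
            if key not in group_meta:
                return False
            if not identifier_matches(expected, group_meta[key]):
                return False
    return True


# Identifiers as in the example configuration above.
identifiers = {
    "Parameters": {"SimName": "SIMNAME1"},
    "Config": {"SavePath": {"content": "/path/to/simname", "match": "substring"}},
}

# Assumed metadata of a dataset, for illustration only.
metadata = {
    "Parameters": {"SimName": "SIMNAME1"},
    "Config": {"SavePath": "/scratch/path/to/simname/output"},
}
print(dataset_matches(identifiers, metadata))
```

Requiring all identifiers to match keeps a single shared attribute, such as a common save path prefix, from misidentifying a dataset.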
Unit files
Unit files are used to determine the units of datasets, particularly for datasets that do not have metadata that can be used to infer units. Unit files are specified either explicitly via the unitfile option in scida.load or implicitly via the simulation configuration, see above. Relative paths, such as units/simnameunits.yaml, are relative to the user/package simulation config folder. The former (~/.config/scida/) takes precedence.
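The precedence rule for relative unit file paths can be illustrated with a short sketch. The function resolve_unitfile and its package_dir argument are hypothetical; the location of the package configuration folder is not given in this section, so it has to be passed in.

```python
from pathlib import Path


def resolve_unitfile(unitfile, user_dir, package_dir):
    """Hypothetical helper: find a unit file, preferring the user config folder.

    Returns the first existing candidate path, or None if nothing is found.
    """
    path = Path(unitfile).expanduser()
    if path.is_absolute():
        return path if path.is_file() else None

    # Relative paths: the user folder is tried first, then the package folder.
    for base in (Path(user_dir).expanduser(), Path(package_dir)):
        candidate = base / path
        if candidate.is_file():
            return candidate
    return None
```

A call such as resolve_unitfile("units/simnameunits.yaml", "~/.config/scida", package_dir) would then return the user copy whenever both copies exist.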
A unit file could look like this:
metadata_unitsystem: cgs
units:
  unit_length: 100.0 * km
  unit_mass: g
fields:
  _all:
    CounterID: none
    Coordinates: unit_length
  InternalArrays: none
  PartType0:
    SubPartType0:
      FurthestSubgroupDistance: unit_length
      NearestNeighborDistance: unit_length
    Energy: 10.0 * erg
metadata_unitsystem - The unit system assumed when deducing units from metadata dimensions, where available. Only cgs is currently supported.
units - Unit definitions that are used in the following fields section. The units are defined as pint expressions.
fields - A dictionary of fields and their units. The fields are specified as a path to the field in the dataset. The special field _all can be used to set the default unit for all fields with a given name, irrespective of the path of the field. Other than that, entries represent the fields or containers of fields. The special value none can be used to set the unit to None, i.e. no unit. This is handled differently from " "/"dimensionless": the field is treated as a plain array rather than as a dimensionless pint array.
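To make the lookup rules concrete, here is a minimal sketch of how a unit could be resolved for a field path from a parsed unit file. It is not scida's implementation. It rests on three assumptions: a path-specific entry overrides the _all default, a container set to none applies to every field below it, and a unit name defined in the units section is replaced by its expression. The names lookup_unit and NO_ENTRY are hypothetical.

```python
# Hypothetical sketch of a unit lookup; scida's real resolution may differ.
NO_ENTRY = object()  # sentinel: the unit file says nothing about this field


def lookup_unit(unitfile, path):
    """Return the unit expression for a field path such as "PartType0/Energy".

    Returns None for fields marked "none" and NO_ENTRY if no entry applies.
    """
    fields = unitfile.get("fields", {})
    units = unitfile.get("units", {})
    parts = path.strip("/").split("/")

    # Walk down the containers along the path.
    node = fields
    for part in parts:
        if isinstance(node, dict) and part in node:
            node = node[part]
        else:
            node = NO_ENTRY
            break
        if node == "none":
            # Assumption: "none" on a container covers all fields below it.
            return None

    # No path-specific entry: fall back to the _all default for this field name.
    if node is NO_ENTRY or isinstance(node, dict):
        node = fields.get("_all", {}).get(parts[-1], NO_ENTRY)

    if node is NO_ENTRY:
        return NO_ENTRY
    if node == "none":
        return None
    # Replace names from the units section by their pint expression.
    return units.get(node, node)


# The example unit file from above, as it would look after YAML parsing.
unitfile = {
    "metadata_unitsystem": "cgs",
    "units": {"unit_length": "100.0 * km", "unit_mass": "g"},
    "fields": {
        "_all": {"CounterID": "none", "Coordinates": "unit_length"},
        "InternalArrays": "none",
        "PartType0": {
            "SubPartType0": {
                "FurthestSubgroupDistance": "unit_length",
                "NearestNeighborDistance": "unit_length",
            },
            "Energy": "10.0 * erg",
        },
    },
}
print(lookup_unit(unitfile, "PartType0/Energy"))
print(lookup_unit(unitfile, "PartType1/Coordinates"))
```

Returning a sentinel for missing entries keeps "no unit configured" distinct from an explicit none, which matters for the missing_units handling described in the main configuration.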