Derived fields

Info

If you want to run the code below, consider downloading the demo data or use the TNGLab online.

Commonly during analysis, newly derived quantities/fields are to be synthesized from one or more snapshot fields into a new field. For example, while the temperature, pressure, or entropy of gas is not stored directly in the snapshots, they can be computed from fields which are present on disk.

There are two ways to create new derived fields. For quick analysis, we can simply leverage dask arrays themselves.

Defining new quantities with dask arrays

from scida import load
ds = load("./snapdir_030") # (1)!
gas = ds.data['gas']
kineticenergy = 0.5*gas['Masses']*(gas['Velocities']**2).sum(axis=1)

In this example, we assume a dataset, such as the demo data set, that has its fields (Masses, Velocities) nested by particle type (gas)

In the example above, we define a new dask array called "kineticenergy". Note that just like all other dask arrays and dataset fields, these fields are "virtual", i.e. only the graph of their construction is held in memory, which can be instantiated by applying the .compute() method.

We can also add this field from above example to the existing ones in the dataset.

gas['kineticenergy'] = kineticenergy

Defining new quantities with field recipes

Working with complex datasets over a longer period, it is often useful to have a large range of fields available. The above approach with dask arrays suffers from some shortcomings. For example, in some cases the memory footprint and instantiation time for each field can add up to substantial loading times. Also, when defining fields with dask arrays, these fields need to be defined in order of their respective dependencies.

For this purpose, field recipes are available. An example of such recipe is given below.

import numpy as np

from scida import load
ds = load("./snapdir_030")

@ds.register_field("stars")  # (1)!
def VelMag(arrs, **kwargs):
    import dask.array as da
    vel = arrs['Velocities']
    v, u = vel.magnitude, vel.units
    return da.sqrt(v[:,0]**2 + v[:,1]**2 + v[:,2]**2) * u

Here, stars is the name of the field container the field should be added to. The field will now be available as ds['stars']['VelMag']

The field recipe is translated into a regular field, i.e. dask array, the first time it is queried for. Above example can be queried as:

ds['stars']['VelMag']

Practically working with these fields, there is no difference between derived and on-disk fields.

Adding multiple fields

It can be useful to write (a) dedicated field definition file(s). First, initialize a FieldContainer

from scida.fields import FieldContainer
groupnames = ["PartType0", "Subhalo"]  # (1)!
fielddefs = FieldContainer(containers=groupnames)

@fielddefs.register_field("PartType0") # (2)!
def Volume(arrs, **kwargs):
    return arrs["Masses"]/arrs["Density"]

@fielddefs.register_field("all") # (3)!
def GroupDistance3D(arrs, snap=None):
    """Returns distance to hosting group center. Returns rubbish if not actually associated with a group."""
    import dask.array as da
    boxsize = snap.header["BoxSize"]
    pos_part = arrs["Coordinates"]
    groupid = arrs["GroupID"]
    if hasattr(groupid, "magnitude"):
        groupid = groupid.magnitude
        boxsize *= snap.ureg("code_length")
    pos_cat = snap.data["Group"]["GroupPos"][groupid]
    dist3 = (pos_part-pos_cat)
    dist3, u = dist3.magnitude, dist3.units
    dist3 = da.where(dist3>boxsize/2.0, boxsize-dist3, dist3)
    dist3 = da.where(dist3<=-boxsize/2.0, boxsize+dist3, dist3) # PBC
    return dist3 * u

@fielddefs.register_field("all")
def GroupDistance(arrs, snap=None):
    import dask.array as da
    dist3 = arrs["GroupDistance3D"]
    dist3, u = dist3.magnitude, dist3.units
    dist = da.sqrt((dist3**2).sum(axis=1))
    dist = da.where(arrs["GroupID"]==-1, np.nan, dist) # set unbound gas to nan
    return dist * u

We define a list of field containers that we want to add particles to.
Specify the container we want to have the field added to.
Using the "all" identifier, we can also attempt to add this field to all containers we have specified.

Finally, we just need to import the fielddefs object (if we have defined it in another file) and merge them with a dataset that we loaded:

ds = load("./snapdir_030")
ds.data.merge(fielddefs)

In above example, we now have the following fields available:

gas = ds.data["PartType0"]
print(gas["Volume"])
print(gas["GroupDistance"])