Particle masking

The selection of subsets of particles belonging to a SWIFTGalaxy is intended to be as intuitive as possible while also ensuring consistency between the various particle arrays and accounting for the practicalities of implementation. Since some masking operations involve (possibly memory-intensive) copy operations, details to guide the resource-conscious user are provided below.

Masking a SWIFTGalaxy

There are two ways to manipulate the SWIFTGalaxy object itself to select a subset of particles. The first is with the mask_particles() method. This method expects to receive an instance of swiftgalaxy.masks.MaskCollection.

Note

The velociraptor.swift.swift module makes some use of a namedtuple called MaskCollection. These objects are not valid where swiftgalaxy functions expect a MaskCollection because namedtuple objects are immutable.

The MaskCollection is initialised with keyword arguments specifying masks for each particle type to be masked. Particle types that are not to be masked can either be omitted, or explicitly given a “null” mask, such as a literal Ellipsis (...). Any python object that could be given in square brackets as a mask for a cosmo_array (or ndarray) will be accepted.

Warning

There is one exception to arguments being interpreted as they would be in the square brackets of an ndarray: None. In numpy, the name numpy.newaxis is an alias for None, and extends the dimension of an array by one at the position where it is inserted. In a MaskCollection, however, None (and therefore also numpy.newaxis) is instead interpreted as “no mask”.

Some illustrative examples:

import unyt as u
from swiftgalaxy import SWIFTGalaxy, MaskCollection
sg = SWIFTGalaxy(...)
mask = MaskCollection(
    gas=sg.gas.temperatures > 1e6 * u.K,  # boolean mask
    dark_matter=np.s_[:10],  # first 10 particles
    stars=...,  # Ellipsis, equivalent to keeping previous mask
    # black_holes omitted, equivalent to keeping previous mask
)
sg.mask_particles(mask)

Notice the use of the numpy.s_ “index tricks” tool.

Note

Using the mask_particles() method is the only way to select a subset of particles in-place, i.e. without copying any data. After calling this function, any particle arrays in memory will be permanently masked, and any new particle arrays loaded will be trimmed according to the new mask.

The second way to select a subset of particles is using the __getattr__ method (square brackets). Although the result is similar to that obtained with the first method, the implementation differs: this method returns a masked copy of the entire SWIFTGalaxy object. Using the same mask as above:

masked_sg = sg[mask]  # copy operation!

Therefore, if attempting to minimize memory usage, keep in mind:

sg.mask_particles(mask)  # memory-efficient
sg = sg[mask]  # equivalent result, but memory-inefficient

Masking individual SWIFTParticleDatasets

Passing masks to the SWIFTGalaxy directly (possibly via the mask_particles() method) is usually the best way to select subsets of particles. However, it is also possible to apply masks to the particle datasets (e.g. sg.gas) – this is intended mostly for interactive use-cases. To do this, simply provide the mask to the dataset’s __getattr__() method (i.e. in square brackets). For example:

gasmask = sg.gas.temperatures > 1e6 * u.K
sg.gas[gasmask]

Warning

To ensure internal consistency, this operation creates a copy of the entire SWIFTGalaxy, and may therefore be memory-intensive if many particle arrays have been loaded (including other particle types). Furthermore, while sg.gas = sg.gas[gasmask] will initially work as expected, because the masked dataset “belongs” to a different SWIFTGalaxy (the copy) than it is being assigned to, this can lead to subtle and difficult-to-debug issues down the line and is therefore not recommended. Assigning to another name (e.g. masked_gas = sg.gas[gasmask]) does not pose any obvious problems, other than being somewhat memory-inefficient.

Masking and SWIFTNamedColumnDatasets

Named column datasets can be masked just like particle datasets, using the named column dataset’s __getattr__() method. For example:

sg.gas.element_mass_fractions[gasmask]  # example for a gas property from the Colibre model

Warning

As for particle datasets, the operation sg.gas.element_mass_fractions = sg.gas.element_mass_fractions[gasmask] is not recommended – see warning above for rationale. In addition, this operation would result in a named column dataset whose constituent particle arrays have shapes different from the rest of the particle arrays in the particle dataset hosting the named column dataset – this is probably undesirable!

Masking particle arrays

As may be intuitively expected, individual particle arrays can be masked on-the-fly as usual:

sg.gas.masses[gasmask]

This returns a masked copy of the individual particle array, so does not imply the same level of potentially expensive copy operations discussed above. While it is possible to assign the result back to the particle array (e.g. sg.gas.masses = sg.gas.masses[gasmask]), this is inadvisable since it will break the consistency between the shapes of the particle arrays for that particle type. After doing this, some operations, such as attempting to mask the SWIFTGalaxy again, are then likely to raise an exception.

Cookbook: all particles in a spherical aperture

One of swiftgalaxy’s features is that it can conveniently provide a set of particles that a halo finder has identified as belonging to a galaxy (or other object). However, in some cases this might not be the selection of particles that you want. When the desired particles include some that are not identified as members by the halo finder, a modified approach is needed. For example, let’s suppose that you want to select all simulation particles within a 1 Mpc aperture of a galaxy’s centre, regardless of their membership status according to the halo finder. For illustration we’ll take a galaxy picked from a Velociraptor catalogue. The first step is to override the default extra_mask="bound_only" behaviour with extra_mask=None. We also need to override the default spatial selection from the simulation, because the 1 Mpc spherical region of interest might extend beyond the region occupied by member particles as defined by the halo finder, which is all that the default spatial selection is guaranteed to enclose:

sg = SWIFTGalaxy(
    "my_snapshot.hdf5",
    Velociraptor(
        "halos",  # name of output files excluding extension (e.g. 'halos.properties', etc.)
        halo_index=3,  # pick the 4th galaxy (i.e. indexed from 0) in the catalogue array
        extra_mask=None,  # select all particles in the spatially selected region (for now)
        custom_spatial_offsets=cosmo_array([[-1, 1], [-1, 1], [-1, 1]], u.Mpc), # relative to centre
    ),
)

You can next define the masks selecting particles in your desired spherical aperture, using SWIFTGalaxy’s convenient spherical coordinates feature, and store them in a MaskCollection:

mask_collection = MaskCollection(
    gas=sg.gas.spherical_coordinates.r < 1 * u.Mpc,
    dark_matter=sg.dark_matter.spherical_coordinates.r < 1 * u.Mpc,
    stars=sg.stars.spherical_coordinates.r < 1 * u.Mpc,
    black_holes=sg.black_holes.spherical_coordinates.r < 1 * u.Mpc,
)

Finally, apply the mask to the sg object:

sg.mask_particles(mask_collection)

You’re now ready to proceed with analysis of the particles in the 1 Mpc spherical aperture using this sg object.

Note

mask_particles() applies the masks in-place. The mask could also be applied with the __getattr__() method (i.e. in square brackets), but this returns a copy of the SWIFTGalaxy object. If memory efficiency is a concern, prefer the mask_particles() approach.