Particle masking
The selection of subsets of particles belonging to a
SWIFTGalaxy is intended to be as intuitive as possible while
also ensuring consistency between the various particle arrays and accounting for the
practicalities of implementation. Since some masking operations involve (possibly
memory-intensive) copy operations, details to guide the resource-conscious user are
provided below.
Masking a SWIFTGalaxy
There are two ways to manipulate the SWIFTGalaxy object
itself to select a subset of particles. The first is with the
mask_particles() method. This method expects to
receive an instance of swiftgalaxy.masks.MaskCollection.
Note
The velociraptor.swift.swift module makes some use of a namedtuple called
MaskCollection. These objects are not valid where swiftgalaxy functions
expect a MaskCollection because namedtuple objects are immutable.
The MaskCollection is initialised with keyword arguments
specifying masks for each particle type to be masked. Particle types that are not to be
masked can either be omitted, or explicitly given a “null” mask, such as a literal
Ellipsis (...). Any python object that could be given in square brackets as a mask
for a cosmo_array (or ndarray) will be
accepted.
Warning
There is one exception to arguments being interpreted as they would be in the square
brackets of an ndarray: None. In numpy, the name
numpy.newaxis is an alias for None, and extends the dimension of an array by
one at the position where it is inserted. In a
MaskCollection, however, None (and therefore also
numpy.newaxis) is instead interpreted as “no mask”.
Some illustrative examples:
import unyt as u
from swiftgalaxy import SWIFTGalaxy, MaskCollection
sg = SWIFTGalaxy(...)
mask = MaskCollection(
gas=sg.gas.temperatures > 1e6 * u.K, # boolean mask
dark_matter=np.s_[:10], # first 10 particles
stars=..., # Ellipsis, equivalent to keeping previous mask
# black_holes omitted, equivalent to keeping previous mask
)
sg.mask_particles(mask)
Notice the use of the numpy.s_ “index tricks” tool.
Note
Using the mask_particles() method is the only
way to select a subset of particles in-place, i.e. without copying any data. After
calling this function, any particle arrays in memory will be permanently masked, and
any new particle arrays loaded will be trimmed according to the new mask.
The second way to select a subset of particles is using the
__getattr__ method (square brackets). Although
the result is similar to that obtained with the first method, the implementation differs:
this method returns a masked copy of the entire SWIFTGalaxy
object. Using the same mask as above:
masked_sg = sg[mask] # copy operation!
Therefore, if attempting to minimize memory usage, keep in mind:
sg.mask_particles(mask) # memory-efficient
sg = sg[mask] # equivalent result, but memory-inefficient
Masking individual SWIFTParticleDatasets
Passing masks to the SWIFTGalaxy directly (possibly via the
mask_particles() method) is usually the best way to
select subsets of particles. However, it is also possible to apply masks to the particle
datasets (e.g. sg.gas) – this is intended mostly for interactive use-cases. To do
this, simply provide the mask to the dataset’s
__getattr__() method (i.e. in square
brackets). For example:
gasmask = sg.gas.temperatures > 1e6 * u.K
sg.gas[gasmask]
Warning
To ensure internal consistency, this operation creates a copy of the entire
SWIFTGalaxy, and may therefore be memory-intensive if many
particle arrays have been loaded (including other particle types). Furthermore, while
sg.gas = sg.gas[gasmask] will initially work as expected, because the masked
dataset “belongs” to a different SWIFTGalaxy (the copy)
than it is being assigned to, this can lead to subtle and difficult-to-debug issues
down the line and is therefore not recommended. Assigning to another name (e.g.
masked_gas = sg.gas[gasmask]) does not pose any obvious problems, other than being
somewhat memory-inefficient.
Masking and SWIFTNamedColumnDatasets
Named column datasets can be masked just like particle datasets, using the named column
dataset’s __getattr__() method.
For example:
# example for a gas property from the Colibre model:
sg.gas.element_mass_fractions[gasmask]
Warning
As for particle datasets, the operation
sg.gas.element_mass_fractions = sg.gas.element_mass_fractions[gasmask] is not
recommended – see warning above for rationale. In addition, this operation would
result in a named column dataset whose constituent particle arrays have shapes
different from the rest of the particle arrays in the particle dataset hosting the
named column dataset – this is probably undesirable!
Masking particle arrays
As may be intuitively expected, individual particle arrays can be masked on-the-fly as usual:
sg.gas.masses[gasmask]
This returns a masked copy of the individual particle array, so does not imply the same
level of potentially expensive copy operations discussed above. While it is possible to
assign the result back to the particle array (e.g.
sg.gas.masses = sg.gas.masses[gasmask]), this is inadvisable since it will break the
consistency between the shapes of the particle arrays for that particle type. After doing
this, some operations, such as attempting to mask the
SWIFTGalaxy again, are then likely to raise an exception.
Cookbook: all particles in a spherical aperture
One of swiftgalaxy’s features is that it can conveniently provide a set of
particles that a halo finder has identified as belonging to a galaxy (or other object).
However, in some cases this might not be the selection of particles that you want. When
the desired particles include some that are not identified as members by the halo finder,
a modified approach is needed. For example, let’s suppose that you want to select all
simulation particles within a 1 Mpc aperture of a galaxy’s centre, regardless of their
membership status according to the halo catalogue. For illustration we’ll take a galaxy
picked from a SOAP catalogue. The first step is to
override the default extra_mask="bound_only" behaviour with extra_mask=None. We
also need to override the default spatial selection from the simulation, because the 1 Mpc
spherical region of interest might extend beyond the region occupied by member particles
as defined by the halo finder, which is all that the default spatial selection is
guaranteed to enclose:
sg = SWIFTGalaxy(
"my_snapshot.hdf5",
SOAP(
"my_soap.hdf5", # name of catalogue file
soap_index=3, # pick the 4th galaxy (indexed from 0) in the catalogue array
extra_mask=None, # select all particles in the spatial region (for now)
custom_spatial_offsets=cosmo_array(
[[-1, 1], [-1, 1], [-1, 1]], u.Mpc # relative to centre
),
),
)
You can next define the masks selecting particles in your desired spherical aperture,
using SWIFTGalaxy’s convenient spherical coordinates feature,
and store them in a MaskCollection:
mask_collection = MaskCollection(
gas=sg.gas.spherical_coordinates.r < 1 * u.Mpc,
dark_matter=sg.dark_matter.spherical_coordinates.r < 1 * u.Mpc,
stars=sg.stars.spherical_coordinates.r < 1 * u.Mpc,
black_holes=sg.black_holes.spherical_coordinates.r < 1 * u.Mpc,
)
Finally, apply the mask to the sg object:
sg.mask_particles(mask_collection)
You’re now ready to proceed with analysis of the particles in the 1 Mpc spherical aperture
using this sg object.
Note
mask_particles() applies the masks in-place. The
mask could also be applied with the __getattr__()
method (i.e. in square brackets), but this returns a copy of the
SWIFTGalaxy object. If memory efficiency is a concern,
prefer the mask_particles() approach.