Halo catalogues

swiftgalaxy uses a helper class to create a uniform interface to outputs from different halo finders. Provided a python library to read the halo catalogues already exists, this helper class is usually lightweight and easy to create. Currently, the SOAP, Caesar and Velociraptor catalogue formats have built-in support. Other halo finders may be supported on request – pull requests to the repository are also welcome.

The second argument to create a SWIFTGalaxy is an instance of a class derived from the base helper class _HaloCatalogue, such as SOAP. This object has multiple roles. It will be aware of:

  • the location of the halo catalogue files;

  • how to extract the properties of a galaxy of interest from those files;

  • how to create a mask specifying the particles belonging to a galaxy of interest.

SOAP

The SOAP catalogue format is the preferred catalogue format of the SWIFT community and is supported with i/o tools in swiftsimio. The SOAP helper class uses swiftsimio to read the information that it needs from the SOAP catalogues.

Setting up an instance of the helper class is straightforward. We’ll assume a SOAP catalogue called halo_properties_0123.hdf5, and suppose that we’re interested in the first object in the catalogue (row 0):

soap = SOAP(
    soap_file="halo_properties_0123.hdf5",
    soap_index=0,
)

The first argument could also include a path if needed, e.g. /path/to/halo_properties_0123.hdf5.

The properties of the object of interest are made conveniently available with the __getattr__() (dot) syntax, which exposes the interface provided by the SWIFTDataset object handling the SOAP catalogue. For example, the virial mass (M200crit) of our object of interest can be retrieved in the same way as if we were using swiftsimio to read the SOAP catalogue, except that in this case we only get the row corresponding to our chosen object:

soap.spherical_overdensity_200_crit.total_mass.to(u.Msun)

SOAP calculates well over a hundred integrated quantities for each halo in the catalogue. All are available for the object of interest using similar syntax. Refer to the SOAP and swiftsimio documentation for further details of what quantities are available.

Usually the SOAP object is used to create a SWIFTGalaxy object. Assuming that we have a simulation snapshot file snapshot_0123.hdf5 that goes with the catalogue file halo_properties_0123.hdf5 the object is created as:

sg = SWIFTGalaxy(
    "snapshot_0123.hdf5",
    SOAP(
        "halo_properties_0123.hdf5",
        soap_index=0,
    )
)

Note

SOAP records which particles belong to each individual halo in a set of “membership” files, usually found alongside the halo catalogue (e.g. halo_properties_0123.hdf5) in a subdirectory, e.g. membership_0123/membership_0123.X.hdf5 (where X is replaced by integers). swiftgalaxy expects to find the information contained in these files directly in the (single, monolithic) simulation snapshot file. Some simulations (including Colibre) provide a snapshot that includes the membership information already. If such a file is not available, the SOAP code distribution comes with a script make_virtual_snapshot.py that can create the necessary snapshot file containing the particle membership information. The file is “virtual” in the sense that it doesn’t directly store (i.e. copy) the data in the snapshot and membership files but instead contains hyperlinks to the existing data files, providing a single file interface to all of the relevant information. The script help information is available with python make_virtual_snapshot.py --help. In our example we could create the “virtual” snapshot file as:

python make_virtual_snapshot.py \
--absolute-paths \
# input virtual snapshot without membership information:
'snapshot_{snap_nr:04}.hdf5' \
# input membership files:
'membership_{snap_nr:04}/membership_{snap_nr:04}.{file_nr}.hdf5' \
# output virtual snapshot with membership information:
'snapshot_{snap_nr:04}.hdf5' \
# snapshot number:
123

Notice that this script wants a virtual snapshot file as input. This file is copied, so while the script will (probably) work on a non-virtual input snapshot, this will result in data duplication on disk. The {snap_nr:04) is the pattern replaced with the snapshot number provided as the last argument. The {file_nr} is replaced with the number of each file. Attempting to use swiftgalaxy with a snapshot file that does not contain the particle membership information will result in an error similar to AttributeError: 'GasDataset' object has no attribute 'group_nr_bound'.

When working with a SWIFTGalaxy object the interface to the integrated properties is exposed through the halo_catalogue attribute, for example:

sg.halo_catalogue.spherical_overdensity_200_crit.total_mass.to(u.Msun)

By default, the SOAP class will identify the particles that the halo finder deems bound to the object as belonging to the galaxy. This is controlled by the argument:

SOAP(
    ...,
    extra_mask="bound_only"
)

This behaviour can be adjusted. If None is passed instead, then only the spatial masking (provided internally by _get_spatial_mask()) is used. This means that all particles in the set of (probably cubic) subvolumes of the simulation that overlap with the region of interest will be read in. Alternatively, a MaskCollection can be provided for finer control of the particle selection. This will be used to select particles from those already selected spatially.

If a different subset of particles is desired, often the most practical option is to first set up the SWIFTGalaxy with either extra_mask="bound_only" or extra_mask=None and then use the loaded particles to compute a new mask that can then be applied, perhaps permanently. Since all particles within the spatially masked region will always be read in any case, this does not imply any loss of efficiency.

SOAP catalogues lists many centres for halos. swiftgalaxy uses the “input halo centre” (for the HBT+ halo finder this is the centre of potential), and the mass-weighted average velocity of bound particles in the catalogue, as the default coordinate origin (unless the argument auto_recentre=False is passed to SWIFTGalaxy). Any centre and/or reference velocity from a SOAP catalogue can be used, referring to them (in a string) using the same syntax as would be used to access them in swiftsimio, for example:

SOAP(
    ...,
    # centre of mass of particles in R500crit:
    centre_type="spherical_overdensity_500_crit.centre_of_mass",
    # mass-weighted mean velocity of particles in central 1kpc:
    velocity_centre_type="exclusive_sphere_1kpc.centre_of_mass_velocity",
)

The centre and reference velocity can also be shifted (and rotated) to an arbitrary coordinate frame after the SWIFTGalaxy has been created.

To select all particles (not only bound particles) in an aperture around the halo of interest, see the example below.

Caesar

The Caesar catalogue format is popular in the Simba simulation community and lives within the yt ecosystem. The Caesar helper class relies on the Group interface to the halo catalogues. Setting up an instance of the helper class is straightforward. We’ll assume a Caesar output called caesar_catalogue.hdf5. There are two types of groups compatible with SWIFTGalaxy that are defined in these catalogues: halos and galaxies (refer to the Caesar documentation for details). The type of object is specified in a group_type argument (valid values are "halo: or "galaxy"). The position of the object of interest in the corresponding Caesar list is specified in the group_index argument. For example, to choose the 4th entry in the halo list (halo_index=3, since the list is indexed from 0):

cat = Caesar(
    caesar_file="caesar_catalogue.hdf5",
    group_type="halo",
    group_index=3,
)

The first argument could also include a path if needed, e.g. "/path/to/caesar_catalogue.hdf5".

The properties of the object of interest are made conveniently available with the __getattr__() (dot) syntax, which exposes the interface provided by the Group class. For example, the info() function familiar to Caesar users (e.g. using the caesar tools load("caesar_catalogue.hdf5").galaxies[3].info()) is available as:

cat.info()

This lists available integrated properties of the object of interest, for example the virial mass (if available) would be accessed as:

cat.virial_quantities["m200c"]

Caesar is compatible with yt and returns values with units specified with yt that unyt understands by default.

Warning

Caesar defines its own unit registry that specifies how some customised units convert to units provided by yt. For example, a Mpccm (co-moving Mpc) unit is defined. Because swiftsimio provides its own custom implementation of co-moving units that is not explicitly aware of the CAESAR implementation, but both are compatible with yt, some issues can arise. The cosmo_array provided by swiftsimio does produce a warning when potentially ambiguous calculations are attempted (e.g. where its doesn’t know that both argument are co-moving, or that both are physical), and this will trigger on calculations mixing incompatible CAESAR-style and cosmo_array units. However, occasionally swiftsimio uses bare unyt quantities or arrays, and if a CAESAR-style quantity collides with one of these in a calculation silent and incorrect conversion from comoving to physical units (or any other redshift-dependent units) can occur. It is therefore recommended that users convert CAESAR-style quantities to use cosmo_array before they are passed to swiftsimio or swiftgalaxy functions. For example:

import unyt as u
from swiftsimio.objects import cosmo_array, cosmo_factor
scale_factor = ...  # retrieve scale factor from snapshot or catalogue file
cosmo_array(
    cat.virial_quantities["r200c"].to(u.kpc),  # ensures physical units
    comoving=False,
    cosmo_factor=cosmo_factor(a**1, scale_factor=scale_factor)
).to_comoving()  # or leave in physical units if desired

Usually the Caesar object is used to create a SWIFTGalaxy object. In this case the interface is exposed through the halo_catalogue attribute, for example:

sg = SWIFTGalaxy(
    ...,
    Caesar(...),
)
sg.halo_catalogue.info()
sg.halo_catalogue.virial_quantities["m200c"]

By default, the Caesar class will identify the particles that the halo finder deems bound to the object as belonging to the galaxy. This is controlled by the argument:

Caesar(
    ...,
    extra_mask="bound_only"
)

This behaviour can be adjusted. If None is passed instead, then only the spatial masking provided by _get_spatial_mask() is used. This means that all particles in the set of (probably cubic) subvolumes of the simulation that overlap with the region of interest will be read in. Alternatively, a MaskCollection can be provided. This will be used to select particles from those already selected spatially.

Warning

Older CAESAR outputs (prior to updates to the package in October 2023) do not contain enough information to define a spatial sub-region to take advantage of swiftsimio’s spatial masking. swiftgalaxy is still compatible with these older output files but properties of all particles in the box will be read and then masked down to the object of interest, which is very inefficient. When swiftgalaxy doesn’t find the information needed for spatial masking in a CAESAR output file, it will produce a warning at runtime before proceeding (very inefficiently).

If a different subset of particles is desired, often the most practical option is to first set up the SWIFTGalaxy with either extra_mask="bound_only" or extra_mask=None and then use the loaded particles to compute a new mask that can then be applied, perhaps permanently. Since all particles within the spatially masked region will always be read in any case, this does not imply any loss of efficiency.

The Caesar catalogue lists two centres for halos and galaxies. By default, the location of the gravitational potential minimum is assumed as the centre of the objet (and will be used to set the coordinate system, unless the argument auto_recentre=False is passed to SWIFTGalaxy). Usually the available centring options are:

  • "minpot" – potential minimum

  • "" – centre of mass

These can be used as, for example:

Caesar(
    ...,
    centre_type="",  # centre of mass (no suffix in Caesar catalogue)
)

To select all particles (not only bound particles) in an aperture around the halo of interest, see the example below.

Velociraptor

Velociraptor is a widely-used halo finder. Some SWIFT-based simulations projects have used it, but are largely moving to a model where particles are assigned to halos with Velociraptor (or another finder) and a catalogue is produced with the SOAP tool. The Velociraptor catalogue format is therefore falling somewhat out of fashion in the SWIFT community. It is supported for use with SWIFTGalaxy, but is unlikely to be further developed or maintained. The Velociraptor helper class relies on the velociraptor interface to the halo catalogues. Setting up an instance of the helper class is straightforward. If the halo catalogues are named, for example, halos.properties, halos.catalog_groups, etc., and the galaxy of interest occupies the 4th row in the catalogue (halo_index=3, since rows are indexed from 0), then:

cat = Velociraptor(
    "halos",
    halo_index=3
)

The first argument could also include a path if needed, e.g. "/path/to/halos".

Warning

Currently the velociraptor module does not support selecting galaxies by a unique identifier, e.g. cat.ids.id. Users are advised to check this identifier for their selected galaxy to ensure that they obtain the object that they expected.

The properties of the galaxy of interest as calculated by Velociraptor are made conveniently available with the __getattr__() (dot) syntax, which exposes the interface provided by the velociraptor module. For example, the virial mass can be accessed as cat.masses.mvir. Lists of available properties can be printed interactively using print(cat) (or simply cat at the python prompt), or print(cat.masses), etc. When a Velociraptor instance is used to initialize a SWIFTGalaxy, it is made available through the halo_catalogue attribute. For example, to access the virial mass:

sg = SWIFTGalaxy(
    ...,
    Velociraptor(
        ...
    )
)
sg.halo_catalogue.masses.mvir

By default, the Velociraptor class will identify the particles that the halo finder deems bound to the object as belonging to the galaxy. This is controlled by the argument:

Velociraptor(
    ...,
    extra_mask="bound_only"
)

This behaviour can be adjusted. If None is passed instead, then only the spatial masking provided by velociraptor.swift.swift.generate_spatial_mask() is used. This means that all particles in the set of (probably cubic) subvolumes of the simulation that overlap with the region of interest will be read in. Alternatively, a MaskCollection can be provided. This will be used to select particles from those already selected using generate_spatial_mask().

If a different subset of particles is desired, often the most practical option is to first set up the SWIFTGalaxy with either extra_mask="bound_only" or extra_mask=None and then use the loaded particles to compute a new mask that can then be applied, perhaps permanently. Since all particles in the spatial region defined by generate_spatial_mask() will always be read in any case, this does not imply any loss of efficiency.

The Velociraptor halo finder computes several centres for halos. By default, the location of the gravitational potential minimum is assumed as the centre of the galaxy (and will be used to set the coordinate system, unless the argument auto_recentre=False is passed to SWIFTGalaxy). Usually the available centring options are:

  • "minpot" – potential minimum

  • "" – centre of mass (?)

  • "_gas" – gas centre of mass (?)

  • "_star" – stellar centre of mass (?)

  • "mbp" – most bound particle

These can be used as, for example:

Velociraptor(
    ...,
    centre_type="mbp"
)

To select all particles (not only bound particles) in an aperture around the halo of interest, see the example below.

Other halo catalogues

Support for other halo catalogue formats will be considered on request.

Entrepreneurial users may also create their own helper class inheriting from swiftgalaxy.halo_catalogues._HaloCatalogue. In this case, the following methods should be implemented:

  • _load(): called during __init__(), implement any initialisation tasks here.

  • _generate_spatial_mask(): return a SWIFTMask defining the spatial region to be loaded for the galaxy of interest.

  • _generate_bound_only_mask(): return a MaskCollection defining the subset of particles from the loaded spatial region that belong to the galaxy of interest.

  • centre(): return the coordinates (as a cosmo_array) to be used as the centre of the galaxy of interest (implemented with the @property decorator).

  • velocity_centre(): return the coordinates (as a cosmo_array) to be used as the bulk velocity of the galaxy of interest (implemented with the @property decorator).

  • _region_centre(): return the coordinates (as a cosmo_array) to be used as the centre of a bounding box guaranteed to contain all particles belonging to the galaxy of interest (implemented with the @property decorator).

  • _region_aperture(): return the half-lengths (as a cosmo_array) to be used to construct a bounding box guaranteed to contain all particles belonging to the galaxy of interest (implemented with the @property decorator).

In addition, it is recommended to expose the properties computed by the halo finder, masked to the values corresponding to the object of interest. To make this intuitive for users, the syntax to access attributes of the galaxy of interest should preferably match the syntax used for the library conventionally used to read outputs of that halo finder, if it exists. For instance, for Velociraptor this is implemented via __getattr__ (dot syntax), which simply exposes the usual interface (with a mask to pick out the galaxy of interest).

Using swiftgalaxy without a halo catalogue

A helper class called swiftgalaxy.halo_catalogues.Standalone is provided so that the features of swiftgalaxy that aren’t directly tied to a halo catalogue (e.g. spherical and cylindrical coordinates, consistent coordinate frame, etc.) can be used when no supported halo catalogue is available.

Often the most pragmatic way to create a selection of particles using Standalone is to first select a spatial region guaranteed to contain the particles of interest and then create the final mask programatically using SWIFTGalaxy’s masking features. For example, suppose that you know that there is a galaxy with its centre at (2, 2, 2) Mpc and that you eventually want to select all particles in a spherical aperture 1 Mpc in radius around this point. Start with a cubic spatial mask enclosing this region:

from swiftgalaxy import SWIFTGalaxy, Standalone, MaskCollection
from swiftsimio import cosmo_array
import unyt as u

sg = SWIFTGalaxy(
    "my_snapshot.hdf5",
    Standalone(
        centre=cosmo_array([2.0, 2.0, 2.0], u.Mpc),
        velocity_centre=cosmo_array([0.0, 0.0, 0.0], u.km / u.s),
        spatial_offsets=cosmo_array([[-1.0, 1.0], [-1.0, 1.0], [-1.0, 1.0]], u.Mpc),
        extra_mask=None,  # we'll define the exact set of particles later
    )
)

You can next define the masks selecting particles in your desired spherical aperture, using SWIFTGalaxy’s convenient spherical coordinates feature, and store them in a MaskCollection:

mask_collection = MaskCollection(
    gas=sg.gas.spherical_coordinates.r < 1 * u.Mpc,
    dark_matter=sg.dark_matter.spherical_coordinates.r < 1 * u.Mpc,
    stars=sg.stars.spherical_coordinates.r < 1 * u.Mpc,
    black_holes=sg.black_holes.spherical_coordinates.r < 1 * u.Mpc,
)

Finally, apply the mask to the sg object:

sg.mask_particles(mask_collection)

You’re now ready to proceed with analysis of the particles in the 1 Mpc spherical aperture using this sg object.

Note

mask_particles() applies the masks in-place. The mask could also be applied with the __getattr__() method (i.e. in square brackets), but this returns a copy of the SWIFTGalaxy object. If memory efficiency is a concern, prefer the mask_particles() approach.

Selecting particles within an aperture

The workflow to select all particles within a given aperture (e.g. 1 Mpc) also works when starting from a halo catalogue object. For instance, using SOAP you could do the following:

sg = SWIFTGalaxy(
    "my_snapshot.hdf5",
    SOAP(
        "my_soap_file.hdf5",
        soap_index=0,
        # disable selecting only particles flagged as bound by the halo finder:
        extra_mask=None,
        custom_spatial_offsets=cosmo_array(
            [[-1.0, 1.0], [-1.0, 1.0], [-1.0, 1.0]], u.Mpc
        ),
    )
)
mask_collection = MaskCollection(
    gas=sg.gas.spherical_coordinates.r < 1 * u.Mpc,
    dark_matter=sg.dark_matter.spherical_coordinates.r < 1 * u.Mpc,
    stars=sg.stars.spherical_coordinates.r < 1 * u.Mpc,
    black_holes=sg.black_holes.spherical_coordinates.r < 1 * u.Mpc,
)
sg.mask_particles(mask_collection)

The sg object is now ready for further analysis. The same approach works with any halo catalogue interface by setting the extra_mask and custom_spatial_offsets arguments appropriately.