swiftgalaxy.iterator module

Iterate over SWIFTGalaxy objects efficiently.

Provides the SWIFTGalaxies class that enables efficient iteration over SWIFTGalaxy objects for multiple objects of interest within a single simulation snapshot.

Parallelization is not yet implemented but is a priority for a future release.

class swiftgalaxy.iterator.SWIFTGalaxies(snapshot_filename: str, halo_catalogue: _HaloCatalogue, auto_recentre: bool = True, preload: Set[str] = {}, transforms_like_coordinates: Set[str] = {}, transforms_like_velocities: Set[str] = {}, id_particle_dataset_name: str = 'particle_ids', coordinates_dataset_name: str = 'coordinates', velocities_dataset_name: str = 'velocities', coordinate_frame_from: SWIFTGalaxy | None = None, optimize_iteration: str = 'auto')[source]

Bases: object

Facilitates efficiently iterating over many objects of interest from a simulation.

SWIFT simulation snapshots contain particles grouped by “top-level cells” that cover the simulation volume. The minimum number of particles that it makes sense to read is therefore those contained in one such top-level cell. If one wants to create many SWIFTGalaxy objects from one simulation snapshot, there is a risk that the same data are read many times, such as when multiple target objects lie within the same top-level cell. This class provides a convenient way to iterate over multiple target objects while minimizing the I/O overhead by managing the order of iteration to group together target objects that occupy common top-level cells and only reading the data once.
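
The grouping idea described above can be illustrated with a minimal, self-contained sketch. It is not the algorithm used by SWIFTGalaxies: the helper `group_targets_by_cell`, the positions and the cell size are all hypothetical, and each target is assumed to sit entirely within one cell, whereas real objects may overlap several top-level cells.

```python
from collections import defaultdict


def group_targets_by_cell(positions, cell_size):
    """Group target indices by the grid cell that contains each target.

    Hypothetical helper for illustration only; not part of swiftgalaxy.
    """
    groups = defaultdict(list)
    for target_index, (x, y, z) in enumerate(positions):
        cell = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        groups[cell].append(target_index)
    return dict(groups)


# Four made-up target positions in a volume divided into cells of side 10.
positions = [(1.0, 2.0, 3.0), (25.0, 5.0, 5.0), (4.0, 4.0, 4.0), (27.0, 1.0, 9.0)]
groups = group_targets_by_cell(positions, cell_size=10.0)

# Visit the targets cell by cell, so that the data for each cell
# would only need to be read once.
order = [index for cell in sorted(groups) for index in groups[cell]]
```

In this toy case targets 0 and 2 share one cell and targets 1 and 3 share another, so the resulting order differs from the input order. This is the reason the iteration order is not under the user's control.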

An important consequence is that the iteration order is not controlled by the user, because it must be chosen to group together objects in the same top-level cell(s). The iteration order is available as the iteration_order attribute of a SWIFTGalaxies object. Alternatively, the map() method applies a function to every target object and returns the outputs in the same order as the input list.

There is an obvious opportunity to parallelize the iteration, for example by passing each region (each potentially containing multiple target objects) to worker processes as they become available. This initial version of the SWIFTGalaxies class does not yet support parallel iteration; releasing a working serial implementation was prioritized. Support for parallelization is a high priority for a later release.

Parameters:

snapshot_filename (str) – Name of file containing snapshot.
halo_catalogue (_HaloCatalogue) – A halo catalogue instance from swiftgalaxy.halo_catalogues, e.g. a swiftgalaxy.halo_catalogues.SOAP instance. It should specify more than one target object, e.g. by setting its soap_index=[0, 123, 456, ...].
auto_recentre (bool (optional), default: True) – If True, the coordinate system will be automatically recentred on the position and velocity centres defined by the halo_catalogue.
preload (set (optional), default: set()) – Deprecated and ignored.
transforms_like_coordinates (set (optional), default: set()) – Names of fields that behave as spatial coordinates. It is assumed that these exist for all present particle types. When the coordinate system is rotated or translated, the associated arrays will be transformed accordingly. The coordinates dataset (or its alternative name given in the coordinates_dataset_name parameter) is implicitly assumed to behave as spatial coordinates.
transforms_like_velocities (set (optional), default: set()) – Names of fields that behave as velocities. It is assumed that these exist for all present particle types. When the coordinate system is rotated or boosted, the associated arrays will be transformed accordingly. The velocities dataset (or its alternative name given in the velocities_dataset_name parameter) is implicitly assumed to behave as velocities.
id_particle_dataset_name (str (optional), default: "particle_ids") – Name of the dataset containing the particle IDs, assumed to be the same for all present particle types.
coordinates_dataset_name (str (optional), default: "coordinates") – Name of the dataset containing the particle spatial coordinates, assumed to be the same for all present particle types.
velocities_dataset_name (str (optional), default: "velocities") – Name of the dataset containing the particle velocities, assumed to be the same for all present particle types.
coordinate_frame_from (SWIFTGalaxy (optional), default: None) – Another SWIFTGalaxy to copy the coordinate frame (centre and rotation) and velocity coordinate frame (boost and rotation) from.
optimize_iteration (str (optional), default: "auto") – Can be "auto", "dense" or "sparse". See docstrings of methods _eval_sparse_optimized_solution() and _eval_dense_optimized_solution() for explanations of optimization schemes. In most cases leave set to default "auto" to automatically determine optimal solution.

Examples

Using SWIFTGalaxies is almost the same as using the main SWIFTGalaxy class, except that (i) the halo catalogue is initialized with multiple target objects and (ii) the SWIFTGalaxies class provides an iteration method (__iter__), and determines its own iteration order. For example:

from swiftgalaxy import SWIFTGalaxies, SOAP
sgs = SWIFTGalaxies(
    "snapshot.hdf5",
    SOAP(
        "soap.hdf5",
        soap_index=[0, 123, 456],  # multiple target indices
    ),
)
iteration_order = sgs.iteration_order  # be aware of the order of iteration
for sg in sgs:
    # some analysis involving the pre-loaded data fields goes here:
    sg.gas.element_abundances.carbon
    sg.dark_matter.coordinates
    sg.stars.velocities

Alternatively, the map() method can be used to apply a function to all of the SWIFTGalaxy objects created by this class. For example:

from swiftgalaxy import SWIFTGalaxies, SOAP
sgs = SWIFTGalaxies(
    "snapshot.hdf5",
    SOAP(
        "soap.hdf5",
        soap_index=[0, 123, 456],  # multiple target indices
    ),
)

def analysis(sg):
    # this function can also have additional args & kwargs, if needed
    # it should only access the pre-loaded data fields
    sg.gas.element_abundances.carbon
    sg.dark_matter.coordinates
    sg.stars.velocities
    return sg.gas.element_abundances.carbon.mean()

# map accepts arguments `args` and `kwargs`, passed through to function, if needed
result = sgs.map(analysis)

property iteration_order: ndarray

Property holding the order that the target objects will be iterated in.

The iteration order is likely to differ from the order in which the targets are provided, because the input order is usually not optimal for minimizing I/O. This property provides the optimized iteration order evaluated by SWIFTGalaxies.

Returns:

Array of indices specifying the iteration order.

Return type:

numpy.ndarray
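
When iterating directly rather than using map(), results arrive in iteration order and may need to be put back into input order. The following is a minimal sketch of that bookkeeping using plain lists as stand-ins. It assumes that the k-th entry of iteration_order is the position in the input list of the k-th object visited; this interpretation and all the values shown are assumptions for illustration and should be checked against a real SWIFTGalaxies instance.

```python
# Stand-in for sgs.iteration_order (assumption: the k-th entry is the
# input-list position of the k-th object visited).
iteration_order = [2, 0, 1]

# Targets in the order supplied by the user.
soap_index = [0, 123, 456]

# Pretend these were collected inside `for sg in sgs:`, in iteration order.
results_in_iteration_order = ["result_456", "result_0", "result_123"]

# Scatter each result back to the position of its target in the input list.
results_in_input_order = [None] * len(soap_index)
for input_position, result in zip(iteration_order, results_in_iteration_order):
    results_in_input_order[input_position] = result
```

After the loop, results_in_input_order lines up element by element with soap_index.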

map(func: Callable, args: List[Tuple] | None = None, kwargs: List[Dict] | None = None) → List[Any][source]

Apply a function to each object of interest and return a list of results.

The iteration order of SWIFTGalaxies is not necessarily the order in which the objects of interest are provided by the user, because the class determines an efficient iteration order to minimize I/O operations. This method applies a provided function to each object of interest in that efficient order, then returns the results in a list ordered in the same way as the objects of interest were input.

The function to be evaluated should expect a SWIFTGalaxy (from those to be iterated over) as its first argument. It may accept additional arguments and/or keyword arguments. These are passed to map as a list of tuples of arguments and a list of dicts of keyword arguments, with each list element corresponding to one entry in the list of target objects.

Currently this function only executes serially but adding a parallel execution option, and further support for parallelization in analysis, is a high priority.

Parameters:

func (callable) – The function to be evaluated.
args (list (optional), default: None) – List of additional arguments to the function to be evaluated (the first argument is always the current SWIFTGalaxy in the iteration). Each item in the list should be a tuple of arguments, with one tuple for each galaxy being iterated over. See examples section for further details.
kwargs (list (optional), default: None) – List of additional keyword arguments to pass to the function to be evaluated. Each item in the list should be a dict of keyword arguments, with one dict for each galaxy being iterated over. Dictionary keys are the names of the keyword arguments and the corresponding dictionary values are the values of the keyword arguments. See examples section for further details.

Returns:

A list containing the return value(s) of the function applied to each object of interest, in the same order as the objects of interest were passed to the halo catalogue.

Return type:

list
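
The calling convention described above can be illustrated without any simulation data. The sketch below uses a hypothetical stand-in, `serial_map`, that pairs each target with its own tuple of arguments and dict of keyword arguments; it is not the implementation of SWIFTGalaxies.map, and plain integers stand in for SWIFTGalaxy objects.

```python
def serial_map(targets, func, args=None, kwargs=None):
    """Hypothetical stand-in mimicking the documented calling convention."""
    if args is None:
        args = [()] * len(targets)  # no extra positional arguments
    if kwargs is None:
        kwargs = [{}] * len(targets)  # no extra keyword arguments
    # The i-th target receives the i-th tuple and the i-th dict.
    return [
        func(target, *extra_args, **extra_kwargs)
        for target, extra_args, extra_kwargs in zip(targets, args, kwargs)
    ]


def describe(target, scale, label="galaxy"):
    return f"{label} {target} scaled by {scale}"


results = serial_map(
    [11, 22, 33],
    describe,
    args=[(1.0,), (2.0,), (3.0,)],
    kwargs=[dict(label="a"), dict(label="b"), dict(label="c")],
)
```

Each entry of args and kwargs belongs to exactly one target, so both lists should have the same length as the list of targets.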

Examples

A simple example that applies a function dm_median_position to each galaxy in a list of targets [11, 22, 33]:

import numpy as np
from swiftgalaxy import SWIFTGalaxies, SOAP

# define the function that we will apply to each SWIFTGalaxy object:
def dm_median_position(sg):
    return np.median(sg.dark_matter.coordinates, axis=0)

sgs = SWIFTGalaxies(
    "my_snapshot.hdf5",
    SOAP(
        "my_soap.hdf5",
        soap_index=[11, 22, 33],
    ),
)
my_result = sgs.map(dm_median_position)

The list stored in my_result contains the return values of the function for the galaxies at indices 11, 22 and 33, in the same order as they are given in the soap_index list.

This second example shows how to pass extra arguments and/or keyword arguments to the function given to map:

import numpy as np
from swiftgalaxy import SWIFTGalaxies, SOAP

# define the function that we will apply to each SWIFTGalaxy object:
def dm_median_position(
    sg,  # the first argument is always a SWIFTGalaxy from the iteration
    extra_argument_1,
    extra_argument_2,
    extra_kwarg_1=None,
    extra_kwarg_2=None,
):
    # presumably make use of the extra arguments and/or kwargs here...
    return np.median(sg.dark_matter.coordinates, axis=0)

sgs = SWIFTGalaxies(
    "my_snapshot.hdf5",
    SOAP(
        "my_soap.hdf5",
        soap_index=[11, 22, 33],
    ),
)
my_result = sgs.map(
    dm_median_position,
    args=[
        (my_extra_arg_1_for_galaxy_11, my_extra_arg_2_for_galaxy_11),
        (my_extra_arg_1_for_galaxy_22, my_extra_arg_2_for_galaxy_22),
        (my_extra_arg_1_for_galaxy_33, my_extra_arg_2_for_galaxy_33),
    ],
    kwargs=[
        dict(
            extra_kwarg_1=my_extra_kwarg_1_for_galaxy_11,
            extra_kwarg_2=my_extra_kwarg_2_for_galaxy_11,
        ),
        dict(
            extra_kwarg_1=my_extra_kwarg_1_for_galaxy_22,
            extra_kwarg_2=my_extra_kwarg_2_for_galaxy_22,
        ),
        dict(
            extra_kwarg_1=my_extra_kwarg_1_for_galaxy_33,
            extra_kwarg_2=my_extra_kwarg_2_for_galaxy_33,
        ),
    ]
)

Note that if you have only a single extra argument it must still be packaged as a tuple, for instance:

args=[
    (my_extra_arg_for_galaxy_11, ),
    (my_extra_arg_for_galaxy_22, ),
    (my_extra_arg_for_galaxy_33, ),
]

The commas inside the parentheses are not optional!
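
The reason is a general Python rule rather than anything specific to SWIFTGalaxies: parentheses alone only group an expression, and it is the trailing comma that creates a one-element tuple. The short sketch below demonstrates this with a hypothetical function `func` and assumes, for illustration, that the extra arguments are unpacked with `*` when the function is called.

```python
extra_arg = 5.0

not_a_tuple = (extra_arg)  # parentheses only group: this is still a float
one_tuple = (extra_arg,)  # the trailing comma makes a 1-element tuple


def func(sg, extra):
    # Hypothetical analysis function taking one extra argument.
    return extra


# Unpacking the 1-element tuple supplies the extra argument as intended.
value = func("placeholder_galaxy", *one_tuple)

# Unpacking a bare float fails because a float is not iterable.
try:
    func("placeholder_galaxy", *not_a_tuple)
    unpack_failed = False
except TypeError:
    unpack_failed = True
```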