The ROme OpTimistic Simulator  2.0.0
A General-Purpose Multithreaded Parallel/Distributed Simulation Platform
mpi.c File Reference

MPI Support Module. More...

#include <stdbool.h>
#include <communication/mpi.h>
#include <communication/wnd.h>
#include <communication/gvt.h>
#include <communication/communication.h>
#include <queues/queues.h>
#include <core/core.h>
#include <arch/atomic.h>
#include <statistics/statistics.h>

Macros

#define MPI_TYPE_STAT_LEN   (sizeof(struct stat_t)/sizeof(double))
 The number of double members in struct stat_t, used as the element count of the statistics custom MPI Datatype. It assumes that stat_t contains only double floating point members.
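 As an illustration, a minimal sketch (the stat_t members shown here are hypothetical; the real definition lives in statistics/statistics.h):

 struct stat_t {
     double tot_events;       /* hypothetical member */
     double committed_events; /* hypothetical member */
     double rollbacks;        /* hypothetical member */
 };

 /* With the layout above, MPI_TYPE_STAT_LEN evaluates to 3: the number
  * of double members, used as the element count of the custom Datatype. */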
 

Functions

bool pending_msgs (int tag)
 Check if there are pending messages.

bool is_request_completed (MPI_Request *req)
 Check if an MPI request has been completed.

void send_remote_msg (msg_t *msg)
 Send a message to a remote LP.

void receive_remote_msgs (void)
 Receive remote messages.

bool all_kernels_terminated (void)
 Check if all kernels have reached the termination condition.

void collect_termination (void)
 Check if other kernels have reached the termination condition.

void broadcast_termination (void)
 Notify all the kernels about local termination.

static void reduce_stat_vector (struct stat_t *in, struct stat_t *inout, int *len, MPI_Datatype *dptr)
 Reduce operation for statistics.

static void stats_reduction_init (void)
 Initialize MPI Datatype and Operation for statistics reduction.

void mpi_reduce_statistics (struct stat_t *global, struct stat_t *local)
 Invoke statistics reduction.

void dist_termination_init (void)
 Set up the distributed termination subsystem.

void dist_termination_finalize (void)
 Cleanup routine of the distributed termination subsystem.

void syncronize_all (void)
 Synchronize all the kernels.

void mpi_init (int *argc, char ***argv)
 Initialize MPI subsystem.

void inter_kernel_comm_init (void)
 Initialize inter-kernel communication.

void inter_kernel_comm_finalize (void)
 Finalize inter-kernel communication.

void mpi_finalize (void)
 Finalize MPI.
 

Variables

bool mpi_support_multithread
 Flag telling whether the MPI runtime supports multithreading.
 
spinlock_t mpi_lock
 
static spinlock_t msgs_lock
 A guard to ensure isolation in the message receiving routine.
 
static unsigned int terminated = 0
 
static MPI_Request * termination_reqs
 MPI Requests to handle termination detection collection asynchronously.
 
static spinlock_t msgs_fini
 A guard to ensure isolation in collect_termination()
 
static MPI_Op reduce_stats_op
 MPI Operation to reduce statistics.
 
static MPI_Datatype stats_mpi_t
 MPI Datatype to describe the content of a struct stat_t.
 
static MPI_Comm msg_comm
 MPI Communicator for event/control messages.
 

Detailed Description

MPI Support Module.

This module implements all basic MPI facilities to let the distributed execution of a simulation model take place consistently.

Several facilities are thread-safe, others are not. Check carefully which of these can be used by worker threads without coordination when relying on this module.

This file is part of ROOT-Sim (ROme OpTimistic Simulator).

ROOT-Sim is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; only version 3 of the License applies.

ROOT-Sim is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with ROOT-Sim; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Author
Tommaso Tocci

Definition in file mpi.c.

Function Documentation

bool all_kernels_terminated ( void  )

Check if all kernels have reached the termination condition.

This function checks whether all threads have been informed that the simulation should be halted, and have taken the proper actions to terminate. Once this condition is confirmed, the process can safely exit.

Warning
This function can be called only after a call to broadcast_termination()
Returns
true if all the kernels have reached the termination condition

Definition at line 270 of file mpi.c.
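
A plausible sketch of this check, assuming n_ker is the global count of simulation kernel instances (a name assumed here, not confirmed by this page):

void all_kernels_terminated_sketch(void);

bool all_kernels_terminated(void)
{
    /* terminated is incremented by collect_termination() */
    return terminated == n_ker;
}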

void broadcast_termination ( void  )

Notify all the kernels about local termination.

This function is used to inform all other simulation kernel instances that this kernel is ready to terminate the simulation.

Warning
This function is not thread-safe and should be used by only one thread at a time
Note
This function can be used concurrently with other MPI functions (its lack of thread safety concerns only concurrent calls to itself)
This function can be called multiple times, but the actual broadcast operation will be executed only on the first call.

Definition at line 327 of file mpi.c.
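
A hedged sketch of the notification loop, where n_ker, kid (this kernel's rank) and the MSG_FINI tag are assumed names, not confirmed by this page:

void broadcast_termination(void)
{
    static bool notified = false;
    unsigned int i;

    if (notified)
        return; /* the actual broadcast runs only on the first call */
    notified = true;

    lock_mpi();
    for (i = 0; i < n_ker; i++) {
        if (i == kid)
            continue; /* do not notify ourselves */
        MPI_Isend(NULL, 0, MPI_INT, (int)i, MSG_FINI, MPI_COMM_WORLD,
                  &termination_reqs[i]);
    }
    unlock_mpi();
}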

void collect_termination ( void  )

Check if other kernels have reached the termination condition.

This function accumulates termination acknowledgements from remote kernels and updates the terminated counter.

Note
This function can be called at any point of the simulation, but it will be effective only after broadcast_termination() has been called locally.
This function is thread-safe

Definition at line 289 of file mpi.c.
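
A sketch of the collection loop, guarded by the msgs_fini spinlock documented below; MSG_FINI and spin_trylock() are assumed names:

void collect_termination(void)
{
    int res;

    if (!spin_trylock(&msgs_fini))
        return; /* another thread is already collecting */

    while (pending_msgs(MSG_FINI)) {
        lock_mpi();
        MPI_Recv(&res, 1, MPI_INT, MPI_ANY_SOURCE, MSG_FINI,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        unlock_mpi();
        terminated++; /* one more kernel has acknowledged termination */
    }
    spin_unlock(&msgs_fini);
}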

void dist_termination_finalize ( void  )

Cleanup routine of the distributed termination subsystem.

Once this function returns, it is safe to terminate the simulation.

Definition at line 471 of file mpi.c.
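
A minimal sketch, assuming n_ker is the number of kernels: waiting on every posted termination request guarantees that no MPI operation is still pending.

void dist_termination_finalize(void)
{
    MPI_Waitall(n_ker, termination_reqs, MPI_STATUSES_IGNORE);
}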

void dist_termination_init ( void  )

Set up the distributed termination subsystem.

To correctly terminate a distributed simulation, some care must be taken. In particular:

  • we must be sure that no deadlock is generated, e.g. because some simulation kernel is still waiting for some synchronization action by other kernels;
  • we must be sure that no MPI operation is in place or still pending when MPI_Finalize() is called.

To this end, a specific distributed termination protocol is put in place, which requires some data structures to be available.

This function initializes the subsystem and the data structures which ensure a clean shutdown of distributed simulations.

Definition at line 452 of file mpi.c.
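
A sketch of the setup; rsalloc() (an allocator wrapper), spinlock_init() and n_ker are assumed names, not confirmed by this page:

void dist_termination_init(void)
{
    unsigned int i;

    /* one request slot per kernel, so the termination broadcast
     * can be tracked asynchronously */
    termination_reqs = rsalloc(n_ker * sizeof(MPI_Request));
    for (i = 0; i < n_ker; i++)
        termination_reqs[i] = MPI_REQUEST_NULL;

    spinlock_init(&msgs_fini);
}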

void inter_kernel_comm_finalize ( void  )

Finalize inter-kernel communication.

This function shuts down the subsystems associated with inter-kernel communication.

Definition at line 562 of file mpi.c.

void inter_kernel_comm_init ( void  )

Initialize inter-kernel communication.

This function initializes inter-kernel communication by initializing all the other communication subsystems.

Definition at line 545 of file mpi.c.

bool is_request_completed ( MPI_Request *  req)

Check if an MPI request has been completed.

This function checks whether the operation associated with the specified MPI Request has been completed or not.

Note
This function is thread-safe.
Parameters
req: A pointer to the MPI_Request to check for completion
Returns
true if the operation associated with req is complete, false otherwise.

Definition at line 144 of file mpi.c.
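
A minimal sketch of this check via MPI_Test, which reports completion without blocking:

bool is_request_completed(MPI_Request *req)
{
    int flag = 0;

    MPI_Test(req, &flag, MPI_STATUS_IGNORE);
    return (bool)flag;
}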

void mpi_finalize ( void  )

Finalize MPI.

This function shuts down the MPI subsystem.

Note
Only the master thread on each simulation kernel is expected to call this function

Definition at line 578 of file mpi.c.
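
A plausible sketch, assuming a master_thread() predicate (name assumed) that identifies the master thread of this kernel instance:

void mpi_finalize(void)
{
    if (master_thread())
        MPI_Finalize(); /* tear down MPI once per kernel instance */
}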

void mpi_init ( int *  argc,
char ***  argv 
)

Initialize MPI subsystem.

This is mainly a wrapper around MPI_Init, with some boilerplate code to initialize data structures.

Most notably, here we determine whether the MPI library in use offers suitable multithreading support, and we set up the MPI Communicator which will be used later on to exchange model-specific messages.

Definition at line 514 of file mpi.c.
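
A hedged sketch of the wrapper; n_ker, kid and spinlock_init() are assumed names:

void mpi_init(int *argc, char ***argv)
{
    int provided;

    MPI_Init_thread(argc, argv, MPI_THREAD_MULTIPLE, &provided);
    mpi_support_multithread = (provided >= MPI_THREAD_MULTIPLE);
    if (!mpi_support_multithread)
        spinlock_init(&mpi_lock); /* serialize MPI calls manually */

    MPI_Comm_size(MPI_COMM_WORLD, (int *)&n_ker);
    MPI_Comm_rank(MPI_COMM_WORLD, (int *)&kid);

    /* dedicated communicator for event/control messages */
    MPI_Comm_dup(MPI_COMM_WORLD, &msg_comm);
}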

void mpi_reduce_statistics ( struct stat_t *  global,
struct stat_t *  local 
)

Invoke statistics reduction.

This function is a simple wrapper of an MPI_Reduce operation, which uses the custom reduce operation implemented in reduce_stat_vector() to gather reduced statistics in the master kernel (rank 0).

Parameters
global: A pointer to a struct stat_t where the reduced statistics will be stored. The reduction only takes place at rank 0, so other simulation kernel instances will never find meaningful information in this structure.
local: A pointer to a local struct stat_t which is used as the source of information for the distributed reduction operation.

Definition at line 428 of file mpi.c.
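
Since this is described as a simple wrapper, the body plausibly reduces to a single call (a sketch, not the verbatim implementation):

void mpi_reduce_statistics(struct stat_t *global, struct stat_t *local)
{
    lock_mpi();
    MPI_Reduce(local, global, 1, stats_mpi_t, reduce_stats_op,
               0, MPI_COMM_WORLD); /* root: the master kernel, rank 0 */
    unlock_mpi();
}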

bool pending_msgs ( int  tag)

Check if there are pending messages.

This function tells whether there is a pending message in the underlying MPI library coming from any remote simulation kernel instance. Passing a tag different from MPI_ANY_TAG restricts the check to messages carrying that specific tag.

Messages are only probed on the MPI_COMM_WORLD communicator, so this function is only useful in startup/shutdown operations (indeed, it is used to initiate the GVT reduction and to conclude the distributed simulation shutdown).

Note
This function is thread-safe.
Parameters
tag: The tag of the messages to check for availability.
Returns
true if a pending message tagged with tag is found, false otherwise.

Definition at line 122 of file mpi.c.
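
A minimal sketch of the probe; lock_mpi()/unlock_mpi() are the guards mentioned in the mpi_lock documentation below:

bool pending_msgs(int tag)
{
    int flag = 0;

    lock_mpi();
    MPI_Iprobe(MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
    unlock_mpi();
    return (bool)flag;
}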

void receive_remote_msgs ( void  )

Receive remote messages.

This function extracts from MPI the events destined to locally-hosted LPs. Only messages directed to LPs can be extracted here, because the probing is done on the msg_comm communicator.

A message which is extracted here is placed (out of order) in the bottom half of the destination LP, for later insertion (in order) in the input queue.

This function tries to extract as many messages as possible from the underlying MPI library. In particular, once called, it returns only after no more messages destined to this simulation kernel instance can be found in the MPI library.

Currently, this function is called once per main loop iteration. Calling it more often might significantly imbalance the workload across worker threads.

Note
This function is thread-safe.

Definition at line 208 of file mpi.c.
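
A hedged sketch of the drain loop; spin_trylock(), get_msg_buffer() and insert_bottom_half() are hypothetical names standing in for the actual allocator and bottom-half helpers:

void receive_remote_msgs(void)
{
    MPI_Status status;
    msg_t *msg;
    int found, size;

    if (!spin_trylock(&msgs_lock))
        return; /* another thread is already draining MPI */

    while (true) {
        lock_mpi();
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, msg_comm, &found, &status);
        unlock_mpi();
        if (!found)
            break;

        /* the tag identifies the destination LP's GID, so the buffer
         * can be taken from that LP's allocator before extraction */
        MPI_Get_count(&status, MPI_BYTE, &size);
        msg = get_msg_buffer(status.MPI_TAG, size);

        lock_mpi();
        MPI_Recv(msg, size, MPI_BYTE, status.MPI_SOURCE, status.MPI_TAG,
                 msg_comm, MPI_STATUS_IGNORE);
        unlock_mpi();

        insert_bottom_half(msg); /* out-of-order placement, in-order later */
    }
    spin_unlock(&msgs_lock);
}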

static void reduce_stat_vector ( struct stat_t *  in,
struct stat_t *  inout,
int *  len,
MPI_Datatype *  dptr 
)
static

Reduce operation for statistics.

This function implements a custom MPI Operation used to globally reduce the local statistics upon simulation shutdown. This function is bound to reduce_stats_op in stats_reduction_init().

Definition at line 348 of file mpi.c.
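
A sketch assuming a plain member-wise sum over the double fields (the actual reduction may treat some members differently, e.g. taking maxima):

static void reduce_stat_vector(struct stat_t *in, struct stat_t *inout,
                               int *len, MPI_Datatype *dptr)
{
    int i, j;
    double *src, *dst;

    (void)dptr; /* the datatype is fixed: MPI_TYPE_STAT_LEN doubles */

    for (i = 0; i < *len; i++) {
        src = (double *)&in[i];
        dst = (double *)&inout[i];
        for (j = 0; j < (int)MPI_TYPE_STAT_LEN; j++)
            dst[j] += src[j];
    }
}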

void send_remote_msg ( msg_t *  msg)

Send a message to a remote LP.

This function takes charge of an event to be delivered to a remote LP. The send operation is non-blocking: to this end, the message is registered in the outgoing queue of the destination kernel, so that MPI can keep track of the send operation.

Also, the message being sent is registered with the sender thread, to keep track of the white/red message information which is necessary to correctly reduce the GVT value.

Note
This function is thread-safe.
Parameters
msg: A pointer to the msg_t holding the message to be sent remotely

Definition at line 169 of file mpi.c.
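
A hedged sketch; out_queue_register(), gid_to_rank() and register_outgoing_msg(), as well as the msg->size and msg->receiver fields, are assumed names used for illustration only:

void send_remote_msg(msg_t *msg)
{
    /* store the MPI_Request in the destination kernel's outgoing
     * queue, so MPI can keep track of the send operation */
    MPI_Request *req = out_queue_register(msg);

    /* white/red bookkeeping needed for the GVT reduction */
    register_outgoing_msg(msg);

    lock_mpi();
    MPI_Isend(msg, msg->size, MPI_BYTE, gid_to_rank(msg->receiver),
              msg->receiver, msg_comm, req); /* tag = destination LP GID */
    unlock_mpi();
}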

static void stats_reduction_init ( void  )
static

Initialize MPI Datatype and Operation for statistics reduction.

To reduce statistics, we rely on a custom MPI Operation. This operation requires a pre-built MPI Datatype to properly handle the structures which we use to represent the local information.

This function is called when initializing inter-kernel communication, and its purpose is exactly that of setting up a custom MPI datatype in stats_mpi_t.

Additionally, this function defines the custom operation implemented in reduce_stat_vector() which is bound to the MPI Operation reduce_stats_op.

Definition at line 381 of file mpi.c.
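
A sketch under the MPI_TYPE_STAT_LEN assumption (stat_t as a contiguous block of doubles); the real code may build the datatype member by member:

static void stats_reduction_init(void)
{
    MPI_Type_contiguous(MPI_TYPE_STAT_LEN, MPI_DOUBLE, &stats_mpi_t);
    MPI_Type_commit(&stats_mpi_t);

    /* bind the custom reduction; 1 marks the operation as commutative */
    MPI_Op_create((MPI_User_function *)reduce_stat_vector, 1,
                  &reduce_stats_op);
}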

void syncronize_all ( void  )

Synchronize all the kernels.

This function can be used as a synchronization barrier between all the threads of all the kernels.

The function will return only after all the threads on all the kernels have already entered this function.

We create a new communicator here, to be sure that we synchronize exactly in this function and not somewhere else.

Warning
This function is extremely resource-intensive, wastes a lot of CPU cycles, and drops performance significantly. Avoid it as much as possible!

Definition at line 492 of file mpi.c.
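
A plausible sketch; master_thread(), thread_barrier() and all_thread_barrier are assumed names for the intra-kernel barrier machinery:

void syncronize_all(void)
{
    if (master_thread()) {
        MPI_Comm comm;

        /* a fresh communicator guarantees this barrier cannot be
         * matched by a barrier issued anywhere else */
        MPI_Comm_dup(MPI_COMM_WORLD, &comm);
        MPI_Barrier(comm);
        MPI_Comm_free(&comm);
    }
    thread_barrier(&all_thread_barrier);
}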

Variable Documentation

spinlock_t mpi_lock

This global lock is used by the lock/unlock_mpi macros to control access to the MPI interface. If proper MPI threading support is available from the runtime, it is not used.

Definition at line 55 of file mpi.c.

MPI_Comm msg_comm
static

MPI Communicator for event/control messages.

To enable zero-copy message passing, we must know which LP is the destination of an event before extracting that event from the MPI layer. This is necessary to determine from which slab/buddy the memory to keep the event must be taken. Yet, this is impossible to do directly, because the MPI layer does not expose this information.

The trick to work around this limitation is to create a separate MPI Communicator which is used only to exchange events across LPs (control messages also fall into this category). Since events are the only things extracted from this communicator, we can match against both MPI_ANY_SOURCE (to receive events from any simulation kernel instance) and MPI_ANY_TAG (to match independently of the tag).

We therefore use the tag to identify the GID of the LP.

We can retrieve information about the message sender and the size of the message to be extracted by inspecting the MPI_Status variable after an MPI_Iprobe completes.

Definition at line 100 of file mpi.c.
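
A sketch of the probe trick described above (receive_remote_msgs() builds on it); only the MPI calls themselves are confirmed by this page:

MPI_Status status;
int found, size;

MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, msg_comm, &found, &status);
if (found) {
    MPI_Get_count(&status, MPI_BYTE, &size); /* incoming payload size */
    int dest_lp_gid = status.MPI_TAG;        /* the tag carries the GID */
    int sender = status.MPI_SOURCE;          /* remote kernel instance */
    /* ...allocate from dest_lp_gid's slab/buddy, then MPI_Recv... */
}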

unsigned int terminated = 0
static

This counter tells how many simulation kernel instances have already reached the termination condition. This is updated via collect_termination().

Definition at line 64 of file mpi.c.