Intel® Trace Analyzer and Collector User and Reference Guide

ID 767272
Date 3/31/2023

Collecting Lightweight Statistics

Intel® Trace Collector can gather and store statistics about function calls and their communication. These statistics are gathered even if no trace data is collected, so they are a good starting point for understanding an unknown application that might otherwise produce an unmanageably large trace.

Usage Instructions

To collect lightweight statistics for your application, set the following environment variables before tracing:

$ export VT_STATISTICS=ON
$ export VT_PROCESS=OFF
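
For example, after exporting these variables you can run the application as usual. With the Intel MPI Library, the -trace option is one way to attach the Intel Trace Collector at run time; the application name and process count below are placeholders:

$ mpirun -trace -n 4 ./my_app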

Alternatively, put the equivalent settings into a configuration file and set the VT_CONFIG environment variable to point to it:

# Enable statistics gathering
STATISTICS ON
# Do not gather trace data
PROCESS 0:N OFF
$ export VT_CONFIG=<configuration_file_path>/config.conf
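
For example, a minimal sketch that creates such a configuration file and points VT_CONFIG at it (the file name and location are placeholders):

$ cat > $HOME/itc_stats.conf << 'EOF'
# Enable statistics gathering
STATISTICS ON
# Do not gather trace data
PROCESS 0:N OFF
EOF
$ export VT_CONFIG=$HOME/itc_stats.conf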

The statistics are written into the *.stf file. Use the stftool utility to convert the data to ASCII text with the --print-statistics option. For example:

$ stftool tracefile.stf --print-statistics
NOTE:

The resulting output has an easy-to-process format, so you can use text-processing programs and scripts such as awk*, Perl*, and Microsoft Excel* for better readability. A Perl script with this capability, convert-stats, is provided in the bin folder.
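
For example, you can redirect the ASCII output into a file and then process it with any of these tools (the output file name is a placeholder):

$ stftool tracefile.stf --print-statistics > tracefile_stats.txt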

Output Format

Each line contains the following information:

  • Thread or process

  • Function ID

  • Receiver (if applicable)

  • Message size (if applicable)

  • Number of involved processes (if applicable)

And the following statistics:

  • Count – number of communications or number of calls as applicable

  • Minimum execution time excluding callee times

  • Maximum execution time excluding callee times

  • Total execution time excluding callee times

  • Minimum execution time including callee times

  • Maximum execution time including callee times

  • Total execution time including callee times

Within each line the fields are separated by colons.
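
As a rough illustration, the following awk sketch sums the last field (the total execution time including callee times) per function ID. It assumes that every field listed above is present on each line, so that the function ID is the second colon-separated field; lines with a different layout would have to be filtered out first:

$ stftool tracefile.stf --print-statistics | \
  awk -F: '{ total[$2] += $NF } END { for (f in total) print f, total[f] }'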

The receiver is set to 0xffffffff for file operations and to 0xfffffffe for collective operations. If the message size equals 0xffffffff, the only defined receiver value is 0xfffffffe, which marks the entry as a collective operation.

The message size is the number of bytes sent or received per single message. With collective operations, the following values (buckets of message size) are used for individual instances:

Collective operation   Process-local bucket     Is the same value on all processes?
MPI_Barrier            0                        Yes
MPI_Bcast              Broadcast bytes          Yes
MPI_Gather             Bytes sent               Yes
MPI_Gatherv            Bytes sent               No
MPI_Scatter            Bytes received           Yes
MPI_Scatterv           Bytes received           No
MPI_Allgather          Bytes sent + received    Yes
MPI_Allgatherv         Bytes sent + received    No
MPI_Alltoall           Bytes sent + received    Yes
MPI_Alltoallv          Bytes sent + received    No
MPI_Reduce             Bytes sent               Yes
MPI_Allreduce          Bytes sent + received    Yes
MPI_Reduce_scatter     Bytes sent + received    Yes
MPI_Scan               Bytes sent + received    Yes

The message size is set to 0xffffffff if no message was sent, for example for non-MPI functions or for functions like MPI_Comm_rank.

If more than one communication event (message or collective operation) occurs in the same function call (for example, in MPI_Waitall, MPI_Waitany, MPI_Testsome, or MPI_Sendrecv), the time spent in that function is evenly distributed over all communications and counted once for each message or collective operation. For example, if a single MPI_Waitall call takes 8 µs and completes four messages, each message is attributed 2 µs of that time. Therefore, it is impossible to compute a correct traditional function profile from the data for such function instances (that is, those involved in more than one message per actual function call). Only the Total execution time including callee times and the Total execution time excluding callee times can be interpreted like a traditional function profile in all cases.

The number of involved processes is negative for received messages. If messages were received from a different process or thread, it is -2.

Statistics are gathered at the thread level for all MPI functions and for all functions instrumented through the API or compiler instrumentation.