Intel® Trace Analyzer and Collector User and Reference Guide

ID 767272
Date 3/31/2023
Public

Intel® Trace Analyzer Command Line Interface Reference

The command-line interface (CLI) to the Intel® Trace Analyzer enables you to process trace files without a GUI.

The traceanalyzer command-line interface provides the following options:

Option Name Action
--messageprofile
Perform message profile analysis.
--collopprofile
Perform collective operation profile analysis.
--functionprofile
Perform function profile analysis.
--starttime=TICKS or -sTICKS
Starting time of the analysis.
--endtime=TICKS or -eTICKS
Ending time of the analysis.
--tgroup=ID or -tID
Use this thread aggregation.
--fgroup=ID or -fID
Use this function aggregation.
--dump=FILE or -oFILE
The file in which to store the analysis results. If not specified, the results are printed to standard output.
--funcformat

A string that contains format options specifying how the information about functions is printed; the default value is TFNEIS.

Possible format options:

  1. f or F - prints the name of the function group

  2. t or T - prints the name of the thread/process group

  3. g or G - prints the number of processes/threads in the group

  4. E - prints self time in ticks

  5. e - prints self time in seconds

  6. I - prints total time in ticks

  7. i - prints total time in seconds

  8. n or N - prints the number of calls

  9. s or S - prints the source code location (if possible)

--messageformat

A string that contains format options specifying how the information about point-to-point messages is printed; the default value is 12DdIiXxAauUn.

Possible format options:

1 - prints whether the first member of the message is the sender and/or the receiver

2 - prints whether the second member of the message is the sender and/or the receiver

D - prints the summary duration in ticks

d - prints the summary duration in seconds

v or V - prints the summary amount of bytes sent

k or K - prints the minimum amount of bytes sent

l or L - prints the maximum amount of bytes sent

U - prints the minimum duration in ticks

u - prints the minimum duration in seconds

X - prints the maximum duration in ticks

x - prints the maximum duration in seconds

I - prints the minimum rate in Bytes/tick

i - prints the minimum rate in Bytes/second

A - prints the maximum rate in Bytes/tick

a - prints the maximum rate in Bytes/second

n or N - prints the number of messages

--collopformat

A string that contains format options specifying how the information about collective operations is printed; the default value is 12DdIiXxAauUnvwyzlk.

Possible format options:

1 - prints the name of the process group

2 - prints the name of the operation

D - prints the summary duration in ticks

d - prints the summary duration in seconds

U - prints the minimum duration in ticks

u - prints the minimum duration in seconds

X - prints the maximum duration in ticks

x - prints the maximum duration in seconds

I - prints the minimum rate in Bytes/tick

i - prints the minimum rate in Bytes/second

A - prints the maximum rate in Bytes/tick

a - prints the maximum rate in Bytes/second

v or V - prints the summary amount of bytes sent

k or K - prints the minimum amount of bytes sent

l or L - prints the maximum amount of bytes sent

w or W - prints the summary amount of bytes received

y or Y - prints the minimum amount of bytes received

z or Z - prints the maximum amount of bytes received

n or N - prints the number of collective operations

--readstats or -S
Request statistics, if available, instead of trace data.
--readcache[=FILE] or -r[FILE]
Read the trace cache from the specified (if provided) or default file.
--writecache[=FILE] or -w[FILE]
If a trace cache has been built, write it to the specified (if provided) or default file.
--buildcache=RESOLUTION or -cRESOLUTION
Build a trace cache with the specified resolution. The resolution is given in clock ticks. Higher values result in smaller (coarser) cache files. The default resolution is 0 (zero).
--filter=EXPRESSION or -FEXPRESSION
The filter to use for the analysis, specified as a filter grammar string. EXPRESSION may be funcfilter, p2pfilter, collfilter, or a combination of these. For details, see the Filter Expression Grammar section.
--messagefirst=GROUPING
The first grouping in the message profile analysis result (first dimension of matrix).
--messagesecond=GROUPING
The second grouping in the message profile analysis result (second dimension of matrix).
--collopfirst=GROUPING
The first grouping in the collective operation profile analysis result (first dimension of matrix).
--collopsecond=GROUPING
The second grouping in the collective operation profile analysis result (second dimension of matrix).
--summary
Generate the application summary sheet in the format described below.
--icpf [options] <tracefile> --simulator <simulator library> 

Process a trace file using the specified simulator at runtime.

Use the traceanalyzer --icpf option to process your trace files using a specific simulator library. In --icpf [options] <tracefile> --simulator <simulator library>, the [options] can be:

-s<NUM> - processes the trace starting at the time (NUM measured in ticks).

-e<NUM> - processes the trace to the end time (NUM measured in ticks).

-w<NUM> - processes the trace based on NUM: 0 for STF, 1 for ASCII, any other value for devnull.

-o<new_name> - trace output file name.

-u - single file mode. The output file is a single STF.

-h - prints the help message and exits.

--ideal [options] <tracefile>

Produce an ideal trace.

Use the traceanalyzer --ideal option to idealize a trace with the Ideal Interconnect Simulator. In --ideal [options] <tracefile>, the [options] can be:

-s<NUM> - processes the trace starting at the time (NUM measured in ticks; the default value is 0).

-e<NUM> - processes the trace to the end time (NUM measured in ticks; the default value is the end time of the trace). 

-w<NUM> - processes the trace based on NUM: 0 for STF, 1 for ASCII, any other value for devnull (the default value is 0).

-o<new_name> - trace output file name. 

-u - single file mode. The output file is a single STF. 

-sp - shows a percent progress indicator.

-q - quiet mode; turns off all output. 

-h - prints the help message and exits.

--breakdowns <real_trace_name> <ideal_trace_name>
Create intermediate *.bdi files that contain all the information needed for the Imbalance Diagram.
--merge <unmerged_trace_name> [<merged_trace_name>] [-single] [-delete-raw-data] [-sumdata]

Merge the raw trace.

<merged_trace_name> - if specified, the output trace has this name; otherwise, the suffix .merged is added to the original name.

-single - create a single STF file instead of multiple files.

-delete-raw-data - delete the raw trace after merging.

-sumdata - create summary data files while merging.

--sumdata <trace_name>
Create summary data files from an ordinary trace.
--assist [options] <tracefile>

Use the --assist option to discover performance problems in your application. To learn more about the Performance Assistant, refer to the Performance Assistant section.

In --assist [options] <tracefile>, the [options] can be:

-s <NUM> - processes the trace starting at the time (NUM measured in ticks; the default value is 0).

-e <NUM> - processes the trace to the end time (NUM measured in ticks; the default value is the end time of the trace).

-h - prints the help message and exits.

--interval=PERCENT or -iPERCENT

Select the time interval in the trace file to be analyzed. PERCENT is the percentage of the total time, taken from the middle of the trace file. The value may range from 0 to 100; the default is 100.

For example, if you set the interval to 20%, and your application time is 10 seconds, only the interval from 4 to 6 seconds will be analyzed.
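The arithmetic behind this example can be sketched in Python. This is an illustration only: the function name interval_window is hypothetical and is not part of traceanalyzer, and the sketch assumes that the analyzed window is centered exactly on the midpoint of the trace, as the 20% example suggests.

```python
def interval_window(total_time, percent):
    """Return (start, end) of the window that covers `percent` of the trace.

    Assumption: the window is symmetric around the midpoint of the trace,
    matching the documented example (20% of 10 seconds gives 4 to 6 seconds).
    The result is in the same unit as `total_time`.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    middle = total_time / 2
    half_width = total_time * percent / 200
    return (middle - half_width, middle + half_width)


print(interval_window(10, 20))  # (4.0, 6.0)
```

With the default of 100, the window covers the whole trace, from 0 to the total application time.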

The application summary sheet consists of a three-line header:

<# processes>:<# processes per node>
<application time>:<MPI time>:<IIS time>
<first message size of middle bucket (2)>:<first message size of highest bucket (3)>

The header is followed by the following set of lines for each of the top ten functions, sorted by descending total time:

<Name of MPI_group>:<# involved processes>
<total time in above func for bucket 1>:<for bucket 2>:<for bucket 3>
<total IIS time in above func for bucket 1>:<for bucket 2>:<for bucket 3>
<count in above func for bucket 1>:<for bucket 2>:<for bucket 3>
<total # bytes in above func for bucket 1>:<for bucket 2>:<for bucket 3>

In the application summary sheet, IIS stands for Ideal Interconnect Simulator, which predicts MPI behavior on an ideal interconnect.

You can import the application summary sheet into spreadsheet applications such as Microsoft* Office Excel*. Fields are separated by colons. Unknown values are indicated by N/A.
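Because the sheet is plain colon-separated text, it can also be read with a short script. The following Python sketch is one possible approach, not an official parser. The names parse_field, split_fields, and parse_summary_sheet are hypothetical, and the sample values are invented for illustration. It assumes that the third header line is a single physical line, that blank lines carry no meaning, and that each function occupies exactly five lines in the order listed above.

```python
def parse_field(text):
    """Convert one field to a number; 'N/A' (unknown value) becomes None."""
    text = text.strip()
    if text == "N/A":
        return None
    return float(text)


def split_fields(line):
    """Split one colon-separated line into converted fields."""
    return [parse_field(part) for part in line.split(":")]


def parse_summary_sheet(text):
    # Assumption: blank lines are not significant and can be skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = {
        "processes": split_fields(lines[0]),     # total, per node
        "times": split_fields(lines[1]),         # application, MPI, IIS
        "bucket_sizes": split_fields(lines[2]),  # bucket 2, bucket 3
    }
    functions = []
    # Assumption: every function block is exactly five lines long.
    for i in range(3, len(lines) - 4, 5):
        name, involved = lines[i].rsplit(":", 1)
        functions.append({
            "name": name.strip(),
            "involved_processes": parse_field(involved),
            "total_time": split_fields(lines[i + 1]),
            "iis_time": split_fields(lines[i + 2]),
            "count": split_fields(lines[i + 3]),
            "bytes": split_fields(lines[i + 4]),
        })
    return header, functions


# Invented sample data that follows the layout described above.
sample = """\
4:2
10.5:2.5:N/A
1024:65536
MPI_Send:4
1.0:0.5:0.25
N/A:N/A:N/A
100:20:5
4000:40000:400000
"""

header, functions = parse_summary_sheet(sample)
print(header["times"])       # [10.5, 2.5, None]
print(functions[0]["name"])  # MPI_Send
```

Mapping N/A to None keeps unknown values distinct from zero, which matters when the per-bucket times are later summed or averaged.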