pplot [*MODE*] [*OPTIONS_FOR_MODE*]

The benchmark-plot tool is a program that generates various types of plots for given data that is generated by the benchmark-run tool. Supported types include bar plots, scatter plots, tables, and speedup plots. Various temporary files used to generate plots and tables can be found in the folder `_results/`

.

The mode specifier *MODE* is optional: if not present, the default plot to be generated is the bar plot. Otherwise, *MODE* can be one of the following: `bar`

, `scatter`

, `table`

, or `speedup`

.

`-width`

*n*- Specify width
*n*of the plot to be generated (in pixels). `-height`

*n*- Specify height
*n*of the plot to be generated (in pixels). `-title`

*t*- Specify the title string
*t*to be displayed in the plot. `-input`

*filename*- Specify a file
*filename*where to read results (default is`results.txt`

). `-output`

*filename*- Specify a file
*filename*where to plot results.

`bar`

and `scatter`

`-legend-pos [topright|topleft|...]`

- Select an R code for the legend (see R codes for legend or file
`legend.ml`

). `-chart`

*k1*,*k2*,…- Select a combination of keys by which to divide program runs among charts.
`-series`

*k1*,*k2*,…- Select a combination of keys by which to divide program runs among series.
`-group-by`

*k1*,*k2*,…- Select a combination of keys by which to divide program runs among groups.
`-y`

*x*- Select the key to plot on the y axis.
`-ylabel`

*lab*- Label the y axis as
*lab*. `-ymin`

*n*- Set the origin of the y axis to
*n*. `-ymax`

*n*- Set the maximum value of the y axis to
*n*. `--yzero`

- Set the origin of the y axis to zero.
`--ylog`

- Use a log scale for the y axis.

`scatter`

only`-xlabel`

*lab*- Label the x axis as
*lab*. `-xmin`

*n*- Set the origin of the x axis to
*n*. `-xmax`

*n*- Set the maximum value of the x axis to
*n*. `--xzero`

- Set the origin of the x axis to zero.
`-drawline`

*n*- Draw a horizontal line originating from
*n*on the y axis.

`bar`

only:`--xtitles-vertical`

- Use a vertical orientation for the label of the x axis.

or:

`-xtitles-dir [horizontal|vertical]`

- Set the orientation to use for the labels under the bars.

`table`

only`-table`

*k1*,*k2*,…- Select a combination of keys by which to divide program runs among tables.
`-row`

*k1*,*k2*,…- Select a combination of keys by which to divide program runs among rows of the table.
`-col`

*k1*,*k2*,…- Select a combination of keys by which to divide program runs among columns of the table.
`-cell`

*k*- Select the key whose corresponding value is to be displayed in the cells of the table.
`-group-by`

*k1*,*k2*,…- Select a combination of keys by which to divide program runs among groups of columns in the table.

`speedup`

only`-chart`

*k1*,*k2*,…- Select a combination of keys by which to divide the program runs among speedup charts.
`-series`

*k1*,*k2*,…- Select a combination of keys by which to divide the program runs among series.
`-group-by`

*k1*,*k2*,…- Select a combination of keys by which to divide the program runs among groups.
`-legend-pos [topright|topleft|...]`

- Specify the position of the legend in the plot (see R codes for legend or file
`legend.ml`

). `--log`

- Use a log scale for the speedup curves.
`--factored`

- Generate a “factored” speedup plot (see instructions below).

`mode`

pplot -mode [*MODE*] [*OPTIONS_FOR_MODE*]

The following commands build the benchmarking tools and then gather some data from a few runs of our example program, Fibonacci.

```
make -C examples/basic fib
make prun
prun -prog examples/basic/fib -algo recursive,cached -n 39,40 -runs 2
make pplot
```

The first plot we generate is a simple bar plot in which bars are grouped together by arguments of `n`

and apart by arguments o `algo`

. The y axis represents run time.

`pplot -x n -y exectime -series algo `

Another way to make the same bar plot is the following.

`pplot bar -x n -y exectime -series algo `

This command generates two bar plots: one for each value of `algo`

, namely `recursive`

and `cached`

.

`pplot bar -x n -y exectime -chart algo `

In this case, bars are separated by arguments of both `n`

an `algo`

.

`pplot bar -x n,algo -y exectime --xtitles-vertical`

We generate a scatter plot for the same data set as follows. Each algorithm in the plot is represented by a labeled curve.

`pplot scatter -x n -y exectime -series algo `

We can generate a table showing breakdowns of run times. The columns are represented by the values of `algo`

and the rows by values of `n`

.

`pplot table -row n -col algo -cell exectime`

A slight change gives a similar result, but one with multiple tables such that each table represents a different value of `algo`

.

`pplot table -row n -table algo -cell exectime`

Another slight changes gives a similar table where rows are represented by combinations of values of both `n`

and `algo`

.

`pplot table -row n,algo -cell exectime `

The speedup curve for a series of runs of a given parallel program is calculated by

\[\textrm{speedup}(P) = \frac{T_S}{T_P}\]

where \(P\) denotes the number of processors used by the program, \(T_S\) the time taken by the sequential baseline program, and \(T_P\) the the taken by the parallel program using \(P\) processors.

In order to generate a speedup plot, we need to follow a few simple steps. First, we need to ensure that the programs that we wish to benchmark print to `stdout`

the running time in the format

`exectime`

*t*

where *t* is a floating-point number that expresses wall-clock time (in seconds). For example, one such running-time report is `exectime 4.32`

, which reports that running time was 4.32 seconds. Second, we need to perform some runs of the programs that we want to benchmark, collecting the data as we go. The data that we need to collect is generated automatically by the “speedup” mode of the `prun`

tool. Third, after the benchmarking runs complete, we need to run the `pplot`

command to generate the speedup plot.

Let us consider the following example, where we are going to compare the speedup curves resulting two competing algorithms. In our sample benchmark program, the algorithm to be measured is selected by the command-line key `-algo`

. Our baseline algorithm is specified by the value `foo`

and our parallel algorithm by the value `bar`

.

```
make prun
prun speedup -baseline "examples/others/speedup.sh -algo foo" -baseline-runs 1 -parallel "examples/others/speedup.sh -algo bar -proc 1,2,3,4" -runs 2
pplot speedup
```

We can take multiple speedup curves as well. The following example extends our previous example by introducing a new parameter, namely `n`

, and two values for `n`

: 4 and 5. The plot generated for this example will have two speedup curves, one for each setting of `n`

.

```
prun speedup -baseline "examples/others/speedup.sh -algo foo" -baseline-runs 1 -parallel "examples/others/speedup.sh -algo bar -proc 1,2,3,4" -runs 2 -n 1,5
pplot speedup -series n
```

Here is another example, this time using our parallel Fibonacci benchmark.

`prun speedup -baseline "bench.baseline" -parallel "bench.opt -proc 1,2" -bench fib -n 39`

A “factored speedup plot” is a speedup plot that shows, in addition to an actual speedup curve of a given benchmark program, three or more “synthetic” speedup curves for the same benchmark program. Each synthetic speedup curve is just a curve that shows how the actual speedup curve might look, provided that overheads related to a given source, such as scheduling overhead, could be disregarded.

Our plotting tool currently generates three synthetic speedup curves by default and one extra synthetic curve, optionally. The first synthetic curve is the ** maximal speedup curve**. This curve is defined by the function

\[\textrm{maximal}(P) = \frac{P \cdot T_S}{T_1}\]

where \(P\) denotes the number of processors, \(T_1\) the running time of the parallel program on a single processor, and \(T_S\) the running time of the sequential baseline program. The maximal speedup curve gives a realistic upper bound on the parallel speedup by taking into account any additional work that must be performed by the parallel run compared to the sequential run.

The second synthetic curve is the ** idle-time-specific curve**. This curve is defined by the function

\[\textrm{idletime}(P) = \frac{P \cdot T_S}{T_1 + I_P}\]

where \(I_P\) denotes the time spent by a processor when the processor is idling and waiting for work, combined across all processors. The `pplot`

tool expects for the value of \(I_P\) to be reported by the benchmark program to `stdout`

in the form: `total_idle_time`

*t*, where *t* denotes the total idle time in seconds. This curve represents the speedup curve that we might expect to see if we consider only the performance of the parallel program on a single processor and the amount of idle time as the number of processors increase.

The third synthetic curve is the ** inflation-specific curve**. This curve is defined by the function

\[\textrm{inflation}(P) = \frac{P \cdot T_S}{T_1 + F_P}\]

where \(F_P\) denotes the “work inflation”. The work inflation of a parallel computation is the amount of time by which the computation is slowed down due to effects that are not related to processor utilization. The most significant of these effects are the increased costs relating to memory accesses in a parallel computation where multiple processors are sharing the same memory controllers. Because the work inflation term \(F_P\) is not readily measured in a direct fashion, we derive the inflation-specific speedup from \((P \cdot T_S) / (P \cdot T_P - I_P)\).

```
prun speedup -baseline "examples/others/speedup.sh -algo foo" -baseline-runs 1 -parallel "examples/others/speedup.sh -algo bar -proc 1,2,3,4" -runs 2
pplot speedup --factored
```

The factored speedup mode can optionally generate one more curve: the ** elision-specific curve**. This curve is calculated by

\[\textrm{elision}(P) = \frac{P \cdot T_S}{T_E}\]

where \(T_E\) denotes the running time of the “sequential elision” of the parallel benchmark. The sequential elision is the program that is derived by substituting all parallel function calls in the program by sequential function calls. What remains in the elision program is just the parallel algorithm, minus the overheads imposed by parallel scheduling.

```
prun speedup -baseline "examples/others/speedup.sh -algo foo" -baseline-runs 1 -elision "examples/others/speedup.sh -algo elision_of_bar" -elision-runs 1 -parallel "examples/others/speedup.sh -algo bar -proc 1,2,3,4" -runs 2
pplot speedup --factored
```