pplot [MODE] [OPTIONS_FOR_MODE]
The benchmark-plot tool is a program that generates various types of plots for given data that is generated by the benchmark-run tool. Supported types include bar plots, scatter plots, tables, and speedup plots. Various temporary files used to generate plots and tables can be found in the folder _results/
.
The mode specifier MODE is optional: if not present, the default plot to be generated is the bar plot. Otherwise, MODE can be one of the following: bar
, scatter
, table
, or speedup
.
-width
n-height
n-title
t-input
filenameresults.txt
).
-output
filenamebar
and scatter
-legend-pos [topright|topleft|...]
legend.ml
).
-chart
k1,k2,…-series
k1,k2,…-group-by
k1,k2,…-y
x-ylabel
lab-ymin
n-ymax
n--yzero
--ylog
scatter
only-xlabel
lab-xmin
n-xmax
n--xzero
-drawline
nbar
only:--xtitles-vertical
or:
-xtitles-dir [horizontal|vertical]
table
only-table
k1,k2,…-row
k1,k2,…-col
k1,k2,…-cell
k-group-by
k1,k2,…speedup
only-chart
k1,k2,…-series
k1,k2,…-group-by
k1,k2,…-legend-pos [topright|topleft|...]
legend.ml
).
--log
--factored
mode
pplot -mode [MODE] [OPTIONS_FOR_MODE]
The following commands build the benchmarking tools and then gather some data from a few runs of our example program, Fibonacci.
make -C examples/basic fib
make prun
prun -prog examples/basic/fib -algo recursive,cached -n 39,40 -runs 2
make pplot
The first plot we generate is a simple bar plot in which bars are grouped together by arguments of n
and apart by arguments o algo
. The y axis represents run time.
pplot -x n -y exectime -series algo
Another way to make the same bar plot is the following.
pplot bar -x n -y exectime -series algo
This command generates two bar plots: one for each value of algo
, namely recursive
and cached
.
pplot bar -x n -y exectime -chart algo
In this case, bars are separated by arguments of both n
an algo
.
pplot bar -x n,algo -y exectime --xtitles-vertical
We generate a scatter plot for the same data set as follows. Each algorithm in the plot is represented by a labeled curve.
pplot scatter -x n -y exectime -series algo
We can generate a table showing breakdowns of run times. The columns are represented by the values of algo
and the rows by values of n
.
pplot table -row n -col algo -cell exectime
A slight change gives a similar result, but one with multiple tables such that each table represents a different value of algo
.
pplot table -row n -table algo -cell exectime
Another slight changes gives a similar table where rows are represented by combinations of values of both n
and algo
.
pplot table -row n,algo -cell exectime
The speedup curve for a series of runs of a given parallel program is calculated by
\[\textrm{speedup}(P) = \frac{T_S}{T_P}\]
where \(P\) denotes the number of processors used by the program, \(T_S\) the time taken by the sequential baseline program, and \(T_P\) the the taken by the parallel program using \(P\) processors.
In order to generate a speedup plot, we need to follow a few simple steps. First, we need to ensure that the programs that we wish to benchmark print to stdout
the running time in the format
exectime
t
where t is a floating-point number that expresses wall-clock time (in seconds). For example, one such running-time report is exectime 4.32
, which reports that running time was 4.32 seconds. Second, we need to perform some runs of the programs that we want to benchmark, collecting the data as we go. The data that we need to collect is generated automatically by the “speedup” mode of the prun
tool. Third, after the benchmarking runs complete, we need to run the pplot
command to generate the speedup plot.
Let us consider the following example, where we are going to compare the speedup curves resulting two competing algorithms. In our sample benchmark program, the algorithm to be measured is selected by the command-line key -algo
. Our baseline algorithm is specified by the value foo
and our parallel algorithm by the value bar
.
make prun
prun speedup -baseline "examples/others/speedup.sh -algo foo" -baseline-runs 1 -parallel "examples/others/speedup.sh -algo bar -proc 1,2,3,4" -runs 2
pplot speedup
We can take multiple speedup curves as well. The following example extends our previous example by introducing a new parameter, namely n
, and two values for n
: 4 and 5. The plot generated for this example will have two speedup curves, one for each setting of n
.
prun speedup -baseline "examples/others/speedup.sh -algo foo" -baseline-runs 1 -parallel "examples/others/speedup.sh -algo bar -proc 1,2,3,4" -runs 2 -n 1,5
pplot speedup -series n
Here is another example, this time using our parallel Fibonacci benchmark.
prun speedup -baseline "bench.baseline" -parallel "bench.opt -proc 1,2" -bench fib -n 39
A “factored speedup plot” is a speedup plot that shows, in addition to an actual speedup curve of a given benchmark program, three or more “synthetic” speedup curves for the same benchmark program. Each synthetic speedup curve is just a curve that shows how the actual speedup curve might look, provided that overheads related to a given source, such as scheduling overhead, could be disregarded.
Our plotting tool currently generates three synthetic speedup curves by default and one extra synthetic curve, optionally. The first synthetic curve is the maximal speedup curve. This curve is defined by the function
\[\textrm{maximal}(P) = \frac{P \cdot T_S}{T_1}\]
where \(P\) denotes the number of processors, \(T_1\) the running time of the parallel program on a single processor, and \(T_S\) the running time of the sequential baseline program. The maximal speedup curve gives a realistic upper bound on the parallel speedup by taking into account any additional work that must be performed by the parallel run compared to the sequential run.
The second synthetic curve is the idle-time-specific curve. This curve is defined by the function
\[\textrm{idletime}(P) = \frac{P \cdot T_S}{T_1 + I_P}\]
where \(I_P\) denotes the time spent by a processor when the processor is idling and waiting for work, combined across all processors. The pplot
tool expects for the value of \(I_P\) to be reported by the benchmark program to stdout
in the form: total_idle_time
t, where t denotes the total idle time in seconds. This curve represents the speedup curve that we might expect to see if we consider only the performance of the parallel program on a single processor and the amount of idle time as the number of processors increase.
The third synthetic curve is the inflation-specific curve. This curve is defined by the function
\[\textrm{inflation}(P) = \frac{P \cdot T_S}{T_1 + F_P}\]
where \(F_P\) denotes the “work inflation”. The work inflation of a parallel computation is the amount of time by which the computation is slowed down due to effects that are not related to processor utilization. The most significant of these effects are the increased costs relating to memory accesses in a parallel computation where multiple processors are sharing the same memory controllers. Because the work inflation term \(F_P\) is not readily measured in a direct fashion, we derive the inflation-specific speedup from \((P \cdot T_S) / (P \cdot T_P - I_P)\).
prun speedup -baseline "examples/others/speedup.sh -algo foo" -baseline-runs 1 -parallel "examples/others/speedup.sh -algo bar -proc 1,2,3,4" -runs 2
pplot speedup --factored
The factored speedup mode can optionally generate one more curve: the elision-specific curve. This curve is calculated by
\[\textrm{elision}(P) = \frac{P \cdot T_S}{T_E}\]
where \(T_E\) denotes the running time of the “sequential elision” of the parallel benchmark. The sequential elision is the program that is derived by substituting all parallel function calls in the program by sequential function calls. What remains in the elision program is just the parallel algorithm, minus the overheads imposed by parallel scheduling.
prun speedup -baseline "examples/others/speedup.sh -algo foo" -baseline-runs 1 -elision "examples/others/speedup.sh -algo elision_of_bar" -elision-runs 1 -parallel "examples/others/speedup.sh -algo bar -proc 1,2,3,4" -runs 2
pplot speedup --factored