Alen Stojanov
2014-03-06 00:55:06 UTC
Dear Linux Perf Users Community,
I noticed some inconsistencies with the perf tool. I would like to
determine whether I am doing something wrong, or whether there is a
problem in the perf tool. Here is the problem:
I would like to count the floating-point operations (flops) of a simple
matrix-matrix multiplication algorithm. The code is attached as
mmmtest.c. To count flops, I run the perf tool using raw counters. For
matrices smaller than 150x150, I obtain accurate results.
Example (anticipated flops: 100 * 100 * 100 * 2 = 2'000'000):
perf stat -e r538010 ./mmmtest 100
Performance counter stats for './mmmtest 100':
2,078,775 r538010
0.003889544 seconds time elapsed
However, whenever I run larger matrices, the reported count is more
than five times the number of flops I expect
(anticipated flops: 600 * 600 * 600 * 2 = 432'000'000):
perf stat -e r538010 ./mmmtest 600
Performance counter stats for './mmmtest 600':
2,348,148,851 r538010
0.955511968 seconds time elapsed
To help reproduce the problem, here are the details of my setup:
CPU: Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz, 8 cores
Linux Kernel: 3.11.0-12-generic
GCC Version: gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8)
Monitored events: FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE - Raw event:
0x538010 (converted using libpfm4)
I compiled mmmtest.c using gcc -O3 -march=corei7-avx -o mmmtest
mmmtest.c. The assembly output, mmmtest.s, is also attached.
Do you know why this happens? How can I instruct perf to obtain
accurate results?
Greetings,
Alen