I am using an Intel Xeon E5-2660 v3 and issuing a lot of software prefetches, both to exploit memory-level parallelism (MLP) and to reduce stall time. Now I want to profile the application to measure the overall gain from those software prefetches.
The paper "Improving the Effectiveness of Software Prefetching with Adaptive Execution" discusses hardware performance counter support for software prefetching.
Here is the passage from the paper where the authors describe the counters:
"Furthermore, the only hardware support required by the best adaptive scheme is a pair of counters: one measuring the number of late prefetches (the ones arriving after the processor has requested the data) and another one measuring the number of prefetches killed as a result of cache conflicts."
I want to profile the application on the Haswell microarchitecture, but I couldn't find any such performance counters in perf or PAPI. So I have two questions: are there other performance counters that capture these events, and what is the best way to measure them for only a small part of the code rather than for the whole application?
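
To make the second question concrete, here is a minimal sketch of the kind of region-scoped measurement I have in mind, using the Linux perf_event_open system call to count a raw event only around one loop. The raw code 0x014C is an assumption on my part: I believe it encodes LOAD_HIT_PRE.SW_PF (event 0x4C, umask 0x01) on Haswell, which would only approximate the "late prefetch" counter from the paper, so it should be checked against Intel's event list. The sketch does not cover prefetches killed by cache conflicts. The helper names and the summation loop are hypothetical stand-ins for my real code.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: raw encoding (umask << 8) | event for LOAD_HIT_PRE.SW_PF
 * (event 0x4C, umask 0x01) on Haswell. Verify against Intel's event list. */
#define RAW_LOAD_HIT_PRE_SW_PF 0x014CULL

/* Open a disabled, user-space-only raw counter for the calling thread.
 * Returns a file descriptor, or -1 if the counter cannot be opened. */
static int open_raw_counter(uint64_t raw_config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof(attr);
    attr.config = raw_config;
    attr.disabled = 1;        /* start stopped; enabled around the region */
    attr.exclude_kernel = 1;  /* count user-space only */
    attr.exclude_hv = 1;
    /* pid = 0: this thread, cpu = -1: any CPU, no group, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical region of interest: a sum that prefetches 16 elements ahead. */
static long sum_with_prefetch(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 3);
        s += a[i];
    }
    return s;
}

/* Run the region with the counter enabled only while it executes.
 * Returns the event count, or -1 if the counter is unavailable;
 * the region still runs and *sum_out is set in either case. */
static long long count_region(uint64_t raw_config, const long *a, size_t n,
                              long *sum_out)
{
    int fd = open_raw_counter(raw_config);
    if (fd < 0) {
        *sum_out = sum_with_prefetch(a, n);
        return -1;
    }
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    *sum_out = sum_with_prefetch(a, n);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;
    long long result = -1;
    if (read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
        result = (long long)count;
    close(fd);
    return result;
}
```

Calling count_region(RAW_LOAD_HIT_PRE_SW_PF, data, n, &sum) would then report the count for that loop alone. Opening the counter may fail without sufficient privileges (for example, depending on the perf_event_paranoid setting), in which case the sketch returns -1.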
 