Wednesday 25 October 2017

x86 - How to measure late prefetches and killed prefetches on Haswell micro-architecture?

I am using an Intel Xeon E5-2660 v3 and issuing many software prefetches to exploit
memory-level parallelism (MLP) and to reduce stall time. I now want to profile the
application to measure the overall gain from software prefetching.



In the paper "Improving the Effectiveness of Software Prefetching with Adaptive
Execution", the authors discuss the hardware performance-counter support related to
software prefetching.



Here is the passage from the paper where the authors talk about the performance
counters:





"Furthermore, the only hardware support required by the best adaptive scheme is a
pair of counters: one measuring the number of late prefetches (the ones arriving
after the processor has requested the data) and another one measuring the number of
prefetches killed as a result of cache conflicts."




I want to profile the application on the Haswell microarchitecture, but I couldn't
find any such performance counters in perf or PAPI. Are there other performance
counters that capture these events, and what is the best way to measure them for a
small part of the code rather than for the full application?
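To make the "small part of the code" requirement concrete, here is a minimal sketch of how a single region could be bracketed with a counter through the Linux `perf_event_open` system call. Everything in it is an assumption on my side rather than something taken from the paper: the raw encoding (event 0x4C, umask 0x01, which I believe is `LOAD_HIT_PRE.SW_PF` on Haswell and should be checked against Intel's event list), the hypothetical kernel `sum_with_prefetch`, and the prefetch distance of 16. That event counts demand loads that hit a fill buffer allocated by a software prefetch, so at best it approximates "late" prefetches; it says nothing about killed prefetches.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* ASSUMPTION: raw encoding (umask << 8 | event) of LOAD_HIT_PRE.SW_PF on
 * Haswell. Verify it against Intel's event list for your exact CPU. */
#define RAW_LOAD_HIT_PRE_SW_PF 0x014CULL

/* Open a counter for a raw PMU event on the calling thread, initially
 * disabled. Returns a file descriptor, or -1 if the kernel refuses. */
static int open_raw_counter(uint64_t raw_config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof(attr);
    attr.config = raw_config;
    attr.disabled = 1;        /* start stopped; enable around the region */
    attr.exclude_kernel = 1;  /* count user-space activity only */
    attr.exclude_hv = 1;
    /* pid = 0 and cpu = -1: measure this thread on whichever CPU it runs. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Hypothetical kernel: indirect sum with a software prefetch issued a
 * fixed distance ahead of the element currently being consumed. */
static long sum_with_prefetch(const long *data, const size_t *idx, size_t n)
{
    assert(n == 0 || (data != NULL && idx != NULL));
    const size_t dist = 16;   /* assumed prefetch distance, tune per workload */
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)
            __builtin_prefetch(&data[idx[i + dist]], 0, 3);
        sum += data[idx[i]];
    }
    return sum;
}

/* Count the event only while the kernel above runs; returns the sum. */
static long run_measured_region(void)
{
    enum { N = 1 << 20 };
    long *data = malloc(N * sizeof *data);
    size_t *idx = malloc(N * sizeof *idx);
    if (data == NULL || idx == NULL) {
        free(data);
        free(idx);
        return -1;
    }
    for (size_t i = 0; i < N; i++) {
        data[i] = 1;
        idx[i] = (i * 7919) % N;   /* scattered access pattern */
    }

    int fd = open_raw_counter(RAW_LOAD_HIT_PRE_SW_PF);
    if (fd >= 0) {
        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    } else {
        perror("perf_event_open");  /* e.g. restricted by perf_event_paranoid */
    }

    long sum = sum_with_prefetch(data, idx, N);   /* region of interest */

    if (fd >= 0) {
        uint64_t count = 0;
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        if (read(fd, &count, sizeof count) == (ssize_t)sizeof count)
            printf("raw event 0x%llx: %llu\n",
                   (unsigned long long)RAW_LOAD_HIT_PRE_SW_PF,
                   (unsigned long long)count);
        close(fd);
    }
    free(data);
    free(idx);
    return sum;
}
```

Calling `run_measured_region()` from `main` counts the event only between the enable and disable calls, so setup and teardown are excluded. If the counter cannot be opened, the kernel still runs and only the measurement is skipped.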



Paper Link: http://ieeexplore.ieee.org/iel3/4076/11986/00552556.pdf
