Friday 8 June 2018

java - Why is if (variable1 % variable2 == 0) inefficient?



I am new to java, and was running some code last night, and this really bothered me. I was building a simple program to display every X outputs in a for loop, and I noticed a MASSIVE decrease in performance, when I used modulus as variable % variable vs variable % 5000 or whatnot. Can someone explain to me why this is and what is causing it? So I can be better...




Here is the "efficient" code (sorry if I get a bit of syntax wrong I'm not on the computer with the code right now)



long startNum = 0;
long stopNum = 1000000000L;

for (long i = startNum; i <= stopNum; i++){
if (i % 50000 == 0) {
System.out.println(i);
}

}


Here is the "inefficient code"



long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;

for (long i = startNum; i <= stopNum; i++){

if (i % progressCheck == 0) {
System.out.println(i);
}
}


Mind you I had a date variable to measure the differences, and once it became long enough, the first one took 50ms while the other took 12 seconds or something like that. You may have to increase the stopNum or decrease the progressCheck if your PC is more efficient than mine or what not.



I looked for this question around the web, but I can't find an answer, maybe I'm just not asking it right.




EDIT:
I did not expect my question to be so popular, I appreciate all the answers. I did perform a benchmark on each half in time taken, and the inefficient code took considerably longer, 1/4 of a second vs 10 seconds give or take. Granted they are using println, but they are both doing the same amount, so I wouldn't imagine that would skew it much, especially since the discrepancy is repeatable. As for the answers, since I am new to Java, I will let votes decide for now which answer is best. I will try to pick one by Wednesday.



EDIT2:
I am going to make another test tonight, where instead of modulus, it just increments a variable, and when it reaches progressCheck, it will perform one, and then reset that variable to 0. for a 3rd option.



EDIT3.5:



I used this code, and below I'll show my results.. Thank you ALL for the wonderful help! I also tried comparing the short value of the long to 0, so all my new checks happen ever "65536" times making it equal in repeats.




public class Main {


public static void main(String[] args) {

long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 65536;
final long finalProgressCheck = 50000;
long date;


// using a fixed value
date = System.currentTimeMillis();
for (long i = startNum; i <= stopNum; i++) {
if (i % 65536 == 0) {
System.out.println(i);
}
}
long final1 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();

//using a variable
for (long i = startNum; i <= stopNum; i++) {
if (i % progressCheck == 0) {
System.out.println(i);
}
}
long final2 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();

// using a final declared variable

for (long i = startNum; i <= stopNum; i++) {
if (i % finalProgressCheck == 0) {
System.out.println(i);
}
}
long final3 = System.currentTimeMillis() - date;
date = System.currentTimeMillis();
// using increments to determine progressCheck
int increment = 0;
for (long i = startNum; i <= stopNum; i++) {

if (increment == 65536) {
System.out.println(i);
increment = 0;
}
increment++;

}

//using a short conversion
long final4 = System.currentTimeMillis() - date;

date = System.currentTimeMillis();
for (long i = startNum; i <= stopNum; i++) {
if ((short)i == 0) {
System.out.println(i);
}
}
long final5 = System.currentTimeMillis() - date;

System.out.println(
"\nfixed = " + final1 + " ms " + "\nvariable = " + final2 + " ms " + "\nfinal variable = " + final3 + " ms " + "\nincrement = " + final4 + " ms" + "\nShort Conversion = " + final5 + " ms");

}
}


Results:




  • fixed = 874 ms (normally around 1000ms, but faster due to it being a power of 2)

  • variable = 8590 ms

  • final variable = 1944 ms (Was ~1000ms when using 50000)


  • increment = 1904 ms

  • Short Conversion = 679 ms



Not surprising enough, due to a lack of division, the Short Conversion was 23% faster than the "fast" way. This is interesting to note. If you need to show or compare something every 256 times (or about there) you can do this, and use



if ((byte)integer == 0) {'Perform progress check code here'}


ONE FINAL INTERESTING NOTE, using modulus on the "Final declared Variable" with 65536 (not a pretty number) was half the speed of (slower) than the fixed value. Where before it was benchmarking near the same speed.



Answer



You are measuring the OSR (on-stack replacement) stub.



OSR stub is a special version of compiled method intended specifically for transferring execution from interpreted mode to compiled code while the method is running.



OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with interpreted frame. I showed this already in the following answers: 1, 2, 3.



A similar thing happens here, too. While "inefficient code" is running a long loop, the method is compiled specially for the on-stack replacement right inside the loop. The state is transferred from the interpreted frame to OSR-compiled method, and this state includes progressCheck local variable. At this point JIT cannot replace the variable with the constant, and thus cannot apply certain optimizations like strength reduction.



In particular this means JIT does not replace integer division with multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick from an ahead-of-time compiler, when the value is a compile-time constant after inlining / constant-propagation, if those optimizations are enabled. An integer literal right in the % expression also gets optimized by gcc -O0, similar to here where it's optimized by the JITer even in an OSR stub.)




However, if you run the same method several times, the second and the subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory (benchmarked using JMH):



@State(Scope.Benchmark)
public class Div {

@Benchmark
public void divConst(Blackhole blackhole) {
long startNum = 0;
long stopNum = 100000000L;


for (long i = startNum; i <= stopNum; i++) {
if (i % 50000 == 0) {
blackhole.consume(i);
}
}
}

@Benchmark
public void divVar(Blackhole blackhole) {

long startNum = 0;
long stopNum = 100000000L;
long progressCheck = 50000;

for (long i = startNum; i <= stopNum; i++) {
if (i % progressCheck == 0) {
blackhole.consume(i);
}
}
}

}


And the results:



# Benchmark: bench.Div.divConst

# Run progress: 0,00% complete, ETA 00:00:16
# Fork: 1 of 1
# Warmup Iteration 1: 126,967 ms/op

# Warmup Iteration 2: 105,660 ms/op
# Warmup Iteration 3: 106,205 ms/op
Iteration 1: 105,620 ms/op
Iteration 2: 105,789 ms/op
Iteration 3: 105,915 ms/op
Iteration 4: 105,629 ms/op
Iteration 5: 105,632 ms/op


# Benchmark: bench.Div.divVar


# Run progress: 50,00% complete, ETA 00:00:09
# Fork: 1 of 1
# Warmup Iteration 1: 844,708 ms/op <-- much slower!
# Warmup Iteration 2: 105,893 ms/op <-- as fast as divConst
# Warmup Iteration 3: 105,601 ms/op
Iteration 1: 105,570 ms/op
Iteration 2: 105,475 ms/op
Iteration 3: 105,702 ms/op
Iteration 4: 105,535 ms/op

Iteration 5: 105,766 ms/op


The very first iteration of divVar is indeed much slower, because of inefficiently compiled OSR stub. But as soon as the method reruns from the beginning, the new unconstrained version is executed which leverages all the available compiler optimizations.


No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print &q...