The purpose of using higher-level programming languages is to be able to work at a higher level of abstraction, making code more readable and reusable. However, added levels of abstraction usually result in a loss of efficiency (speed, memory or both).
This post is a small case study of three programming languages (OCaml, Scala and C++), comparing the hit on running time as the level of abstraction and genericity increases. To highlight the overhead I chose a task that is CPU bound and where the relative overhead of abstraction (once introduced) is as high as possible, so this example can be viewed as a kind of worst-case scenario.
The results can be read here
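For readers who want a concrete picture of the kind of code being timed, here is a minimal Scala-only sketch (my own illustration, not the benchmarked sources linked above): an escape-time inner loop written against a small immutable complex-number class, roughly corresponding to the middle level of abstraction discussed in the comments below.

object MandelSketch {
  // Small immutable complex-number class; every arithmetic operation
  // allocates a fresh object.
  final case class Complex(x: Double, y: Double) {
    def normSquare: Double = x * x + y * y
    def +(o: Complex): Complex = Complex(x + o.x, y + o.y)
    def *(o: Complex): Complex = Complex(x * o.x - y * o.y, x * o.y + y * o.x)
  }

  // Number of iterations before |z| exceeds 2, capped at maxIter.
  def escapeTime(c: Complex, maxIter: Int): Int = {
    var z = Complex(0.0, 0.0)
    var i = 0
    while (i < maxIter && z.normSquare <= 4.0) {
      z = z * z + c
      i += 1
    }
    i
  }

  def main(args: Array[String]): Unit =
    println(escapeTime(Complex(-0.5, 0.5), 1000))
}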
Sunday, May 31, 2009
15 comments:
I think that in small examples like this you are probably correct. But my experience in building large systems is that the price of a lack of abstraction is usually a loss in efficiency, and an increase in bugs.
The performance issues associated with heap allocation of small tuple objects (e.g. complex numbers, small vectors, ...) are a big deal. I've been bitten by this in real world projects (computational physics) several times. The high cost of heap allocation generally forces me into writing messy mutable code, which in turn leads to annoying bugs. (The third example, involving reflection, is less relevant to me.)
I love Scala, and enjoy using it for certain tasks, but I find it disingenuous when complex numbers are given as an example of Scala's scalability without mention of performance issues. A 3x performance penalty is a big deal in numerical work (I often run my code for days or weeks...).
On the mailing list it was suggested that the latest JVM with -server may resolve some of this using escape analysis. I hope so!
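To illustrate the trade-off described in the previous comment, here is a hypothetical Scala sketch (not from the benchmark): the immutable style allocates a new object for every arithmetic operation, while the mutable style avoids allocation at the cost of messier, easier-to-break code.

// Immutable: clean, but one heap allocation per operation.
final case class ImmutableComplex(x: Double, y: Double) {
  def +(o: ImmutableComplex): ImmutableComplex = ImmutableComplex(x + o.x, y + o.y)
}

// Mutable: no allocation in the hot loop, but state is updated in place.
final class MutableComplex(var x: Double, var y: Double) {
  def add(o: MutableComplex): Unit = { x += o.x; y += o.y }
}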
Channing: I agree about the increase in bugs, but even in C++ I have seen cases where abstraction was the source of a significant performance loss, and I have never really seen a performance improvement due to a higher level of abstraction. However, improved stability, readability, reusability, etc. are more often than not worth paying the price.
Kipton: Yes, I also think it's a bit arrogant to brag about scalability before actually solving important aspects of the problem. C++ is a real mess, but at least it is a mess in an optimized way ;)
I really think a better Scala analogy for the third implementation would be:
trait XY { def x: Double; def y: Double }

final case class Complex2(x: Double, y: Double) extends XY {
  def norm_square = x * x + y * y
  def +(other: XY) = new Complex2(x + other.x, y + other.y)
  def *(other: XY) = new Complex2(x * other.x - y * other.y,
                                  x * other.y + y * other.x)
}
given that you mix in the XY trait everywhere you need it.
It would be very interesting to do this test with the new jdk 1.7 and the parameters "-server -XX:+AggressiveOpts".
While JDK 1.6 does escape analysis, AFAIK it does not do stack allocation; it only uses the escape-analysis results to remove locks.
I had some quite impressive performance improvements with jdk 1.7 and the above mentioned parameters.
I'm sorry to say that your "benchmark" is probably useless. You didn't even write how you ran this code. You probably didn't use -server for Scala, for example, and didn't give the JVM time to warm up.
I experimented with a few other encodings of the Scala abstraction.
Using a trait, as alexey_rom suggested, ran in comparable time to the class (#2) version. But of course traits only work if you can mix them into each class.
Other options include using an implicit parameter supplying accessors to the abstracted 'x' and 'y' methods, and using an implicit parameter that wraps arguments in a trait with 'x' and 'y' methods.
The accessor approach is again roughly the same performance as the class abstraction, but much uglier. The wrapper is cleaner syntactically, but significantly slower.
Details here:
http://paste.pocoo.org/show/120578/
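A rough, hypothetical sketch of what the "implicit parameter supplying accessors" encoding mentioned in the previous comment might look like (the pasted original may differ; names here are illustrative):

object AccessorEncoding {
  // Type-class-style implicit parameter supplying x and y accessors
  // for any point-like type T.
  trait XYAccess[T] {
    def x(t: T): Double
    def y(t: T): Double
  }

  final case class Complex(re: Double, im: Double)

  implicit val complexAccess: XYAccess[Complex] = new XYAccess[Complex] {
    def x(c: Complex): Double = c.re
    def y(c: Complex): Double = c.im
  }

  // Generic code uses only the accessors, never the concrete class.
  def normSquare[T](p: T)(implicit acc: XYAccess[T]): Double =
    acc.x(p) * acc.x(p) + acc.y(p) * acc.y(p)

  def main(args: Array[String]): Unit =
    println(normSquare(Complex(3.0, 4.0)))  // prints 25.0
}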
You've written OCaml code that no SML or OCaml programmer would ever write. You've effectively ignored OCaml, copied and pasted your C++ code, and then expected it to run exactly the same.
Look at the Complex module and you'll see how OCaml people do it.
Posts like this are why I don't read programming.reddit.com anymore.
Anonymous:
The Complex module in OCaml is implemented in exactly the same way as the second solution of the test, and its runtime is the same as well.
I agree that neither C++ nor OCaml nor Scala programmers would use the third solution for this specific problem, as it is overkill. The sole purpose of this test was to study the added overhead of different abstraction mechanisms (e.g. OCaml's object system vs. records, or generics vs. interfaces).
dmilith:
I used JAVA_OPTS="-server"
The startup time for this example was mostly negligible: running the test 10 times within the same run and averaging the results gave the same numbers within a very small margin of error.
I think you and your anonymous friend could keep the discussion a bit more factual if you supported your statements with numbers. ;)
alexy:
Using a trait would have defeated the purpose of the 3rd test. Of course it would be roughly equivalent to the 2nd solution, so I did not bother testing it.
There are cases where you would be tempted to use structural equivalence to be generic (e.g. when you can't add an interface to a class because it comes from an external library).
The purpose of the third test was to highlight the cost of this powerful abstraction mechanism (even though it does not make much sense in this particular setting).
On the Scala list, Jorge Ortiz pointed out an equally generic solution using implicit conversions. His method for the 3rd variant resulted in a 3x speedup (still leaving a 2x runtime penalty compared to the 2nd solution).
Just from this example I learned an interesting lesson: constructing a new proxy object is often more efficient than calling methods through reflection.
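For illustration, here is a rough sketch of that contrast, assuming a recent Scala 2 compiler (this is not Jorge Ortiz's actual code): a structural type dispatches through reflection, while an implicit conversion builds a cheap proxy object that implements an ordinary trait.

import scala.language.reflectiveCalls
import scala.language.implicitConversions

object StructuralVsProxy {
  // Structural type: method calls on HasXY go through reflection.
  type HasXY = { def x: Double; def y: Double }

  def normSquareStructural(p: HasXY): Double = p.x * p.x + p.y * p.y

  // Nominal trait used by the proxy-based version.
  trait XY { def x: Double; def y: Double }

  final case class Complex(x: Double, y: Double)

  // Constructing a small proxy object per call is often cheaper than reflection.
  implicit def toXY(c: Complex): XY = new XY {
    def x: Double = c.x
    def y: Double = c.y
  }

  def normSquareProxy(p: XY): Double = p.x * p.x + p.y * p.y

  def main(args: Array[String]): Unit = {
    val c = Complex(3.0, 4.0)
    println(normSquareStructural(c))  // 25.0, via reflective calls
    println(normSquareProxy(c))       // 25.0, via the implicit proxy
  }
}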
"Abstraction" means two things:
1. you've described your solution to the problem in higher-level terms, presumably closer to the specific problem domain
2. you are relying on some tool/library to bridge the gap between your description and the execution domain.
#1 is why people like higher-level languages: easier to write.
The potential big win comes when #2 is designed to address your specific problem: it can then recognize properties of your problem description that allow algorithmic shortcuts, providing potentially enormous gains over naive low-level solutions.
This happens every time someone uses an optimizing compiler, but is a more general principle.
In addition to "class", C# has "struct" which is passed by value and allocated on the stack. For things like complex numbers, this avoids slow-down due to the heap allocation one experiences in Java and Scala, while still providing a reasonably high level language (gc, lambdas, iterators, etc.). Other .NET languages like Cobra and Visual Basic can use these as well.
It might also be fun to try mandelbrot in a really high level language like Python or Ruby and see how bad performance can get. :-)
I ran mandel.cpp and mandel.ml as given on your webpage.
I got:
ocaml : 1.8s
cpp : 2.1s
So how did you get your numbers? Did you really use OCaml native code?
Actually, I ran the three tests again. They give very different results:
c++ : 2.1s 10.0s 10.1s
ocaml: 1.8s 2.3s 28.1s