2010-02-23

Garbage collection considered harmful

I think garbage collection is one of the biggest disservices modern programming languages have done us. It is touted as a silver bullet, and apparently no new language can live without it: Java, Python, PHP, ActionScript, Visual Basic, C#, Ruby, ...

Why do I think garbage collection is evil?
  • the problem it promises to solve isn't actually solved. You can still easily get memory leaks by building circular references or forgetting to unregister event handlers. However, these leaks are now much harder to track down and fix.
  • the problem of memory management, i.e. efficiently using the available memory, is made tremendously harder. You give up all control over when or even if your objects get collected. There is no easy way to make use of the problem domain knowledge the programmer has and the garbage collector hasn't. If I know in advance that I'll need a lot of memory in the next frame, it makes sense to clean up ahead of time. The garbage collector can't know this and will be caught off guard, causing performance drops. Also, how much memory is wasted on "zombie objects" that just linger around, waiting to be collected? Trying to fine-tune an application's memory profile and reduce its footprint feels like having to work blindfolded, in a straitjacket, with your feet stuck in the mud.
  • as far as I know, no one has figured out yet how to combine garbage collection with deterministic destruction. Destructors in garbage collected languages are either non-existent or worthless because you don't get any guarantees about when or whether they'll execute. This kills one of the most useful programming idioms ever invented: RAII, or Resource Acquisition Is Initialization. You acquire a resource in a class's constructor and release it in the destructor. There is a well-defined sequence of events and the destructor is guaranteed to be called, so you cannot forget to release the resource. And before you say garbage collection makes alloc/release patterns obsolete, think again. There are other resources besides memory that follow the exact same pattern, with potentially catastrophic consequences if you forget to release them: file handles, network connections, vertex buffers, audio loops, database connections, database transactions, locked textures, mutexes... I'm sure you can think of more. And this is not even accounting for application specific logic like undo/redo patterns.
  • this is related to the previous point about RAII. Without deterministic destructors, writing exception-safe code becomes very hard indeed. Instead of having all your classes clean up automatically after themselves, you have to manually remember to bracket everything with try/catch/finally clauses. As a direct consequence you'll need complete information about what code may throw exceptions and when. That knowledge often has nothing to do with the problem at hand, is easily forgotten and is often buried under layers and layers of code. Hence Java-like crutches of requiring exception specifications for every method.
  • you lose value semantics for everything but the simplest native types. There is a distinct divide between native types like int and objects. While the former are passed around by value, the latter can only be passed by reference. This causes lots of confusion and ugly hacks (Java's int vs Integer) and often forces you to write less efficient code.
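
To make the RAII and exception safety points above concrete, here is a minimal C++ sketch. TrackedResource, do_work and run_demo are hypothetical names invented for this illustration; the class merely writes to a log and stands in for a real resource such as a file handle or a mutex. It is meant to show the order of events, not a production wrapper.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a real resource (file handle, mutex, ...).
// It records acquisition and release in a log so the order is visible.
class TrackedResource {
public:
    TrackedResource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);   // acquisition happens in the constructor
    }
    ~TrackedResource() {
        log_.push_back("release " + name_);   // release happens in the destructor
    }
    // A resource handle should not be copied by accident.
    TrackedResource(const TrackedResource&) = delete;
    TrackedResource& operator=(const TrackedResource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// No try/finally here: both resources are released on every exit path.
void do_work(std::vector<std::string>& log, bool fail) {
    TrackedResource a("a", log);
    TrackedResource b("b", log);
    if (fail) {
        throw std::runtime_error("something went wrong");
    }
    log.push_back("work done");
}

// Runs do_work and returns the recorded sequence of events.
std::vector<std::string> run_demo(bool fail) {
    std::vector<std::string> log;
    try {
        do_work(log, fail);
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    assert(!log.empty());
    return log;
}
```

Calling run_demo(true) records "acquire a", "acquire b", "release b", "release a", "caught": the destructors run in reverse order of construction while the stack unwinds, before the handler executes. The caller of do_work does not need to know which resources it uses or where it may throw.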

Contrary to popular belief, memory management is not a problem in C++. In fact, in modern C++ you hardly ever allocate memory directly. I can't remember the last time I had to search for a memory leak, but I do have to minimize the memory usage of my programs constantly. Yes, even in times of multi-gigabyte RAM machines you easily exhaust that memory when dealing with lots of concurrency or just large, complex problems. Now garbage collection solves a problem I don't have (leaking memory) while making a problem I do have (using too much memory) infinitely harder to solve. And before anyone says I just haven't discovered my leaks yet - some of my stuff has to run 24/7 under heavy load; even small leaks will quickly become apparent in such a situation.
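
As a rough illustration of what "hardly ever allocate memory directly" can look like, here is a small sketch using only standard containers and smart pointers. Particle and peak_live_particles are hypothetical names made up for this example, and the live-instance counter exists only to make the lifetimes observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical type that counts its live instances so that
// construction and destruction can be observed from the outside.
struct Particle {
    static int live;
    double x = 0.0;
    double y = 0.0;
    Particle() { ++live; }
    Particle(const Particle& other) : x(other.x), y(other.y) { ++live; }
    ~Particle() { --live; }
};
int Particle::live = 0;

// Creates `count` particles by value plus one heap particle, and returns
// how many were alive at the peak. There is no new or delete in sight.
int peak_live_particles(std::size_t count) {
    int peak = 0;
    {
        std::vector<Particle> pool;   // value semantics, contiguous storage
        pool.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            pool.emplace_back();
        }
        // Heap object owned by a stack-based handle.
        std::unique_ptr<Particle> extra = std::make_unique<Particle>();
        peak = Particle::live;
    }   // extra and pool are destroyed right here, deterministically
    return peak;
}
```

After peak_live_particles(10) returns 11, Particle::live is back at 0: the memory is released at the closing brace of the inner scope, not at some later point chosen by a collector.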

Trusting the garbage collector to solve memory issues for you is like sitting in a burning house, closing your eyes to the problem and repeating to yourself: "Everything's gonna be fine. Everything's gonna be fine." until you go up in flames. Memory leaks aren't the problem; using too much memory is.

Summary: I hate garbage collection and the sooner the world rids itself of that addiction the better.

7 comments:

  1. Amen, brother!

    I have several programs installed on my computer at the moment which completely depend on garbage collection to clean up their mess ... and with the limited resources of my old machine this leaves me swearing at my computer every so often, wondering when, if at all, I will get my memory back.

    A program with garbage collection is a bit like the household of a messy person. Instead of throwing things out when they are no longer used, they keep the stuff around until the house is more or less filled to the ceiling and then reluctantly give some of it away. It's ... well ... messy :)

  2. Garbage collection does handle circular references.

  3. @jjp: you are correct, provided you are working with a modern garbage collector whose object reachability tests actually work. Neither of these is necessarily a given. In any case, the point remains that it is still possible to create memory leaks in garbage collected languages. Java: http://www.javaworld.com/javaworld/javatips/jw-javatip79.html or ActionScript: http://blogs.adobe.com/aharui/2007/03/garbage_collection_and_memory.html

    I'm sure one can find more examples.

  4. 1. How is forgetting to unregister an event handler any different from forgetting to manually free your memory?

    2. Using destructors for RAII is only useful for objects not located on the heap. Otherwise you have to explicitly destruct/free them. Forgetting that is no different from forgetting to close a handle/whatever. I agree RAII is ugly with Java. But other languages have better solutions.

    3. I don't get your last point. How is the value/object (e.g. int/Integer) divide in Java related to garbage collection?

    4. You fail to explain why memory management in C++ is not a problem.
    Or why you don't need to manually allocate memory. How do you manage the lifetime of your objects?

    5. I don't think anybody is arguing that garbage collection magically gives you optimal memory management. The point is that it's infinitely easier than manual memory management. But of course there are situations where (generic) garbage collection just isn't good enough.

    6. Also: The infrastructure needed for garbage collection can offer more benefits than just memory safety. Heap defragmentation is just one example.

  5. Thanks for your comments. I'll address them one by one.

    1) The point is that garbage collection won't help you fix those kinds of leaks. Thus its promise of solving the memory problem for you isn't fulfilled.

    2) I don't agree. The whole idea of RAII is to put resource acquisition in constructors and release them in the destructor. Thus it is absolutely impossible to forget to release them. Heap objects are usually managed by some other (stack based) handle object, i.e. either containers like vector or smart pointers.

    3) Maybe this is a misunderstanding on my part. Let me ask the other way around: why does Java need int and Integer? I thought because ints cannot be managed by the GC.

    4) Just take a look at any modern C++ library. There will be very few direct calls to new/delete, if any. They are mostly wrapped and hidden away, RAII again.

    5) A matter of opinion, I guess. For me it makes memory management _much_ harder. That's because memory leaks aren't a problem for me. Using too much memory is - and the garbage collector makes an application's memory profile unpredictable and very hard to fine-tune. Also, random performance degradations at run time are a very hefty price to pay for the dubious benefit of maybe making the program easier to write.

    6) While heap defragmentation is a theoretical benefit I don't think it's worth the price you pay. And it may even backfire in a big way: How does a garbage collector prevent false sharing? (http://www.ddj.com/go-parallel/article/showArticle.jhtml;?articleID=217500206) I can see no way it'll be able to do that without causing massive overhead collecting the necessary information. And every compacting step will have to take that into account.
    What about an even simpler question? How can you force the garbage collector to allocate objects in a contiguous run of memory for cache locality? How can you force it to keep it that way and not move the objects around?

    I'm biased, but I absolutely agree with Bjarne Stroustrup's belief that you shouldn't pay for what you don't use. Garbage collection violates this principle in a big way.

  6. 1) I think the biggest point is that the concept provides memory safety, not necessarily that it avoids memory leaks. I'd rather have my system not work because it's running out of memory than have it run someone else's code because of a memory corruption.

    3) I couldn't find a good source. So this excerpt from Wikipedia's Java page will have to do: "As in C++ and some other object-oriented languages, variables of Java's primitive data types are not objects. Values of primitive types are either stored directly in fields (for objects) or on the stack (for methods) rather than on the heap, as commonly true for objects (but see Escape analysis). This was a conscious decision by Java's designers for performance reasons."

    6) At least in theory, memory managed runtime environments (using JIT/runtime profiling) could automatically optimize a program for maximum cache efficiency.

    I agree that garbage collection can be the wrong tool for some applications, and knowing when to avoid it is absolutely necessary. But I think most applications profit from the comfort of garbage collection and managed memory in general (keyword: memory corruption).

    I haven't looked into this, but maybe there is a language/system that provides memory safety and at the same time supports manual object destruction (throwing an exception if a delete is attempted while there are still references to that object). I'd prefer something like that to C++ any time.

  7. Sorry for the moderating delay - we've been acquired by Google and I have been a bit distracted the last couple of days ;-)

    1) Use-after-frees (dangling pointers) and using uninitialized heap memory will be prevented - agreed. But I think the most common memory errors leading to code execution are buffer overruns. Garbage collection won't help much against those, or will it? Anyway, I agree with you: safety is a valid argument for the use of GC.

    3) But aren't the "performance reasons" you quote caused by GC? I mean the GC only deals with heap objects which are usually less efficient than stack objects. Hence the Java compromise.

    6) In theory they could - maybe. But that doesn't help if you need to optimize for that right now. Also, I believe they'd need quite a bit of understanding about the code flow which would cause even more complex, brittle and expensive systems.

    I fully agree with your second-to-last paragraph, and that's what caused my rant in the first place. GC is unsuitable for some applications and problem domains. Hence I rail against making it mandatory in most modern languages. I much prefer the opt-in philosophy of "don't pay for what you don't need" of C++. Note that there are garbage collectors for C++!

    I think D is a language that tries to combine the best of both worlds. I don't have any first hand experience with it though.

    Thank you very much for an interesting discussion. I know my post was a bit inflammatory/polemic. I much appreciate your level-headed retorts. Care to drop the "Anonymous" handle? ;-)

    cheers,

    Sören
