Gregory Hildstrom

Manual Memory Management vs Automatic Garbage Collection

Introduction
Results
Conclusions
testgc-malloc.c
testgc-glib.c
testgc.cpp
testgc.java
testgc.go
testgc-free.go
testgc.js
testgc-gc.js
testgc.php
testgc-gc.php
testgc.py
testgc-gc.py
testgc.rb
testgc-gc.rb
capture.sh
Makefile

Introduction

My low-level programming language performance comparison left me wondering how well the automatic garbage collection in some languages really performs. I have run into a few long-lived Java processes that never seemed to return memory to the operating system. Their resident memory usage only ever seemed to increase with the size of the workload and never really decrease. I have also seen many long-lived C processes with high peak resident memory usage and low idle resident memory usage. Those processes definitely returned memory to the OS when they were finished with it, and I thought that was one of the goals of automatic garbage collection.

I wrote a small C program as a starting point to test this. The allocate() function performs millions of memory allocations, uses that memory for some simple calculations, and then frees all of it. The main function calls allocate(), then waits 10 minutes, then calls allocate() twice in rapid succession, then waits another 10 minutes. While this ran, I logged resident memory usage of the process about 5 times per second. I then ported the program to other libraries and languages for comparison. The main difference is that the garbage-collected languages supposedly don't require explicit frees of allocated memory. The garbage collector is supposed to figure out when allocated memory is no longer needed and free it for you, reducing the burden on the programmer. I decided to set variables to null and clean up anyway to help the garbage collectors as much as possible. There are many different garbage collection algorithms and associated tunable parameters. There is more detailed discussion in the results, in the conclusions, and in each language section.

My test system runs Fedora 20 Linux with the latest updates as of January 25, 2015. I had plenty of RAM to avoid swapping. The Makefile at the end shows compile and test execution.

Results

This plot shows C and C++ using pretty standard allocation and free/delete. Notice that glibc did not return all freed memory to the OS during this test. This behavior is discussed more in the source code section. It retains a large portion of the memory allocated during allocate() for faster reuse. Notice that peak resident memory usage during the double-allocate() event at 610 seconds does not rise much above the first single-allocate() event at 10 seconds. Freed memory is reused immediately and efficiently. Also notice that resident memory usage during the second wait period is pretty much the same as it was during the first wait period. The unused heap memory retained by the program is not unnecessarily large; it is somewhere between the minimum and the peak. This plot represents what an ideal performance-focused garbage collector should be capable of.
This plot shows the same C and C++ code, but with malloc_trim(0) called immediately after the last free/delete. Notice that peak memory usage is the same for both events and the resident memory usage is minimal during both wait periods. The behavior is discussed more in the source code section. Resident memory usage is minimal at all times, but allocations result in more system calls, which likely had a performance penalty. This plot represents what an ideal resource-focused garbage collector should be capable of.
This plot shows the same Java code compiled with gcj and run natively, and run with OpenJDK java/JVM. Gcj had lower resident memory usage for the entire test. The peak gcj resident memory usage is close to the peak C/C++ memory usage, which indicates good garbage collection and reuse of previously allocated memory. Java was run with no options, so it used the default garbage collector and parameters. Notice the significant jump in resident memory usage at the second allocation event, about double the C/C++ peak. There appears to be no garbage collection and reuse during the rapid double allocation event. Neither approach returned memory to the OS during this test.
This plot shows the same Java code running in java/JVM with various garbage collection options. Notice the UseParallelGC/UseParallelOldGC option performed similarly to having no options, so it may be the default. The other three options had lower peak memory usage, but still well above what should be required, which indicates partial collection and reuse of allocated memory. The UseConcMarkSweepGC option did not return any memory to the OS. Both UseSerialGC and UseParNewGC returned some memory to the OS and had lower resident memory usage during the wait periods.
This plot shows the same Java code running in java/JVM with various garbage collection options. These three garbage collection options all returned significant amounts of memory back to the OS. The UseParNewGC with specified HeapFreeRatio bounds had the lowest java/JVM peak resident memory usage during the double allocation event and the lowest usage during the second wait period, but somewhat average usage during the first wait period. The UseG1GC had high double allocation event usage, but very low usage during the first and second wait periods.
This plot shows Go code compiled natively and JavaScript code running in Node.js. Notice that Go's garbage collector returns unused memory to the OS at timed intervals and achieves only partial reuse during the double allocation event. Node had higher resident memory usage during the first event, but it achieved excellent reuse during the double allocation event. Strangely, it seemed to return a bunch of memory to the OS right before or during the double allocation event.
This plot shows the same Go and JavaScript code, but with a couple of lines added. The Go program calls debug.FreeOSMemory() after the first and second events. The JavaScript program calls global.gc() after the first and second events. Peak usage for both remains decent, but wait period usage is dramatically reduced. These setups collected all of the garbage and returned it to the OS. While peak usage is higher here, these graphs are closest to C/C++ with malloc_trim(0). However, I had to suggest garbage collection and/or freeing memory to achieve this result.
This plot shows PHP, Python, and Ruby code. Ruby is the only one of these three with what I would consider decent memory usage. PHP and Python returned a bunch of memory to the OS during the first wait period, but memory usage is still ridiculous. They did not return much memory to the OS during the second wait period after the double allocation event. Peak usage for both events is about the same, which indicates good reuse, but the high levels indicate memory-hungry data structure implementations. Ruby returned a little memory to the OS, but usage stayed close to its relatively low peak for most of the time.
This plot shows the same PHP, Python, and Ruby code, but with a couple of explicit garbage collection calls added. PHP usage looks about the same as before. Peak usage for all three looks about the same as before, but wait period usage for Ruby is slightly lower and wait period usage for Python is dramatically reduced.

Conclusions

This was really interesting, definitely something I've wanted to learn more about for quite a while. Manual memory management with C or C++ was the most efficient in terms of performance and resource usage, as expected. Gcj seems to outperform java/JVM with default options, but neither returned memory to the OS during this test. The rapid double-allocation event proved especially difficult for the garbage collectors, probably because of collection cycles and triggers. Some automatic garbage collectors achieved good reuse and some were good about returning unused memory to the OS. Others, not so much. If you want to limit memory usage and achieve good resource sharing among the various processes on a system, choose your language and garbage collector carefully. Automatic garbage collection is the product of many trade-offs and is certainly not perfect, so tailoring its performance and behavior may involve parameter/option tuning and/or code changes. Automatic means the garbage collector will automatically attempt to clean up after you. It does not mean the garbage collector will automatically behave exactly as you'd prefer. You might have to do some work to get it dialed in the way you want. The performance of any garbage collector is necessarily code, language, and workload specific. The results presented here reflect my interests and chosen test approach, but it should not be difficult to port some of the code presented here to other languages for comparison.

testgc-malloc.c

When I first wrote and tested this program, idle memory usage was nearly as high as peak, which really threw me for a loop. I have written some tests like this before and free() seemed to be doing its job every time. Not this time, however. After reading an interesting forum post and a great article, it became clear that malloc() and free() are glibc/libc library functions that provide an additional layer between C and the actual kernel system calls. The additional layer reduces the number of system calls by grouping many small requests into fewer large requests. It also has some built-in intelligence to determine when to actually perform the system calls. For example, a loop that repeatedly allocates 8 bytes, does something with the memory, and frees it could be more efficient with just a single allocate and free. These two approaches together improve performance by reducing system calls. An unfortunate side effect of this strategy is that resident memory usage may be greater than required at any specific time because of the extra/standby/preemptive free heap space occupied by the program. The behavior of malloc() and free() can be tuned using mallopt(). There are situations when the automatic behavior or the tuning parameters may not be sufficient. In that case, the malloc_trim() function returns free memory in the program heap to the OS. Normally, free() decides when to call malloc_trim().

For this simple test program, the idle resident memory usage during the wait periods was hundreds of megabytes, probably in anticipation of many more allocation requests. The program wasn't returning all of the freed memory back to the OS; it was saving much of it for faster future library allocations without system calls. As a result, resident memory usage rose to around 300-400MB and stayed relatively constant throughout execution. Adding an explicit call to malloc_trim(0) after the last free() reduced resident memory usage from hundreds of MB to just a couple of KB during the wait periods, which brought the resident memory usage down close to the actual program memory usage. I definitely had to dig a bit deeper into this stuff than I planned, but I'm certainly glad I did.
#include <stdio.h>
#include <malloc.h>
#include <unistd.h>

typedef struct {
    int data1;
    int data2;
} my_data_t;

void allocate(void)
{
    my_data_t *my_data = NULL;
    my_data_t **array = NULL;
    int element = 0;
    int list_size = 10000000;
    double sum = 0.0;
    for (element = 0; element < list_size; element++) {
        my_data = (my_data_t*)malloc(sizeof(my_data_t));
        my_data->data1 = element;
        my_data->data2 = element;
        array = (my_data_t**)realloc(array, sizeof(my_data_t*)*(element+1));
        array[element] = my_data;
    }
    for (element = 0; element < list_size; element++) {
        my_data = array[element];
        sum += my_data->data1;
        sum += my_data->data2;
        free(my_data);
    }
    free(array);
#ifdef USE_TRIM
    malloc_trim(0);
#endif
    printf("sum %E\n", sum);
}

void waitsec(int sec)
{
    int i;
    for (i = 0; i < sec; i++)
        sleep(1);
}

int main(int argc, char **argv)
{
    waitsec(10);
    allocate();
    waitsec(600);
    allocate();
    allocate();
    waitsec(600);
    return 0;
}

testgc-glib.c

#include <stdio.h>
#include <glib.h>
#include <unistd.h>
#include <malloc.h>

typedef struct {
    int data1;
    int data2;
} my_data_t;

void allocate(void)
{
    my_data_t *my_data = NULL;
    GArray *array = NULL;
    int element = 0;
    int list_size = 10000000;
    double sum = 0.0;
    array = g_array_new(FALSE, FALSE, sizeof(my_data_t*));
    for (element = 0; element < list_size; element++) {
        my_data = g_new(my_data_t, 1);
        my_data->data1 = element;
        my_data->data2 = element;
        g_array_append_val(array, my_data);
    }
    for (element = 0; element < list_size; element++) {
        my_data = g_array_index(array, my_data_t*, element);
        sum += my_data->data1;
        sum += my_data->data2;
        g_free(my_data);
    }
    g_array_free(array, TRUE);
#ifdef USE_TRIM
    malloc_trim(0);
#endif
    printf("sum %E\n", sum);
}

void waitsec(int sec)
{
    int i;
    for (i = 0; i < sec; i++)
        sleep(1);
}

int main(int argc, char **argv)
{
    waitsec(10);
    allocate();
    waitsec(600);
    allocate();
    allocate();
    waitsec(600);
    return 0;
}

testgc.cpp

#include <iostream>
#include <vector>
#include <chrono>
#include <thread>
#include <malloc.h>
using namespace std;

class my_data_t {
    public:
        int data1;
        int data2;
};

void allocate(void)
{
    my_data_t *my_data = NULL;
    vector<my_data_t*> *Vector = NULL;
    int element = 0;
    int list_size = 10000000;
    double sum = 0.0;
    Vector = new vector<my_data_t*>();
    for (element = 0; element < list_size; element++) {
        my_data = new my_data_t();
        my_data->data1 = element;
        my_data->data2 = element;
        Vector->push_back(my_data);
    }
    for (element = 0; element < list_size; element++) {
        my_data = Vector->at(element);
        sum += my_data->data1;
        sum += my_data->data2;
        delete my_data;
    }
    delete Vector;
#ifdef USE_TRIM
    malloc_trim(0);
#endif
    cout << "sum " << sum << endl;
}

void waitsec(int sec)
{
    for (int i = 0; i < sec; i++) {
        chrono::seconds timespan(1);
        this_thread::sleep_for(timespan);
    }
}

int main(int argc, char **argv)
{
    waitsec(10);
    allocate();
    waitsec(600);
    allocate();
    allocate();
    waitsec(600);
    return 0;
}

testgc.java

I tried to give the Java garbage collectors as many hints as possible. I explicitly set object references to null, cleared the vector, made sure the variables went out of scope, and even suggested when it was a good time to do garbage collection. I tested native-compiled gcj and OpenJDK/java/1.7.0_71 with a variety of garbage collector options. I found two interesting articles on Java garbage collection. I also found a good discussion of the relatively new G1/Garbage-First garbage collector.
import java.util.*;
import java.util.concurrent.TimeUnit;

class my_data_t {
    int data1;
    int data2;
};

public class testgc {
    public static void allocate() {
        my_data_t my_data = null;
        Vector<my_data_t> vector = new Vector<my_data_t>();
        int element = 0;
        int list_size = 10000000;
        double sum = 0.0;
        for (element = 0; element < list_size; element++) {
            my_data = new my_data_t();
            my_data.data1 = element;
            my_data.data2 = element;
            vector.add(my_data);
        }
        for (element = 0; element < list_size; element++) {
            my_data = vector.get(element);
            sum += my_data.data1;
            sum += my_data.data2;
            vector.set(element, null);
        }
        my_data = null;
        vector.clear();
        vector = null;
        System.out.println("sum " + sum);
    }
    public static void waitsec(int sec) {
        for (int i = 0; i < sec; i++) {
            try {
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException e) {
            }
        }
    }
    public static void main(String args[]) {
        waitsec(10);
        allocate();
        System.gc(); // suggest garbage collection
        waitsec(600);
        allocate();
        allocate();
        System.gc(); // suggest garbage collection
        waitsec(600);
    }
}

testgc.go

My approach with Go was similar to Java. I set references to nil and made sure they went out of scope. I found an interesting forum post that showed how to trigger the garbage collector, so I added that in, just like the Java code.
package main
import "fmt"
import "time"
import "runtime"

type my_data_t struct {
    data1 int
    data2 int
}

func allocate() {
    var (
        element int = 0
        list_size int = 10000000
        sum float64 = 0.0
        my_data *my_data_t = nil
        array []*my_data_t = nil
        )
    array = []*my_data_t{}
    for element = 0; element < list_size; element++ {
        my_data = new(my_data_t)
        my_data.data1 = element
        my_data.data2 = element
        array = append(array, my_data)
    }
    for element = 0; element < list_size; element++ {
        my_data = array[element]
        sum += float64(my_data.data1)
        sum += float64(my_data.data2)
        array[element] = nil
    }
    fmt.Printf("sum %E\n", sum)
    my_data = nil
    array = nil
}

func waitsec(sec int) {
    var (
        i int
    )
    for i = 0; i < sec; i++ {
        time.Sleep(1 * time.Second)
    }
}

func main() {
    waitsec(10)
    allocate()
    runtime.GC() // suggest garbage collection
    waitsec(600)
    allocate()
    allocate()
    runtime.GC() // suggest garbage collection
    waitsec(600)
}

testgc-free.go

The garbage collector seemed to work pretty well, but with a significant time delay before releasing memory back to the OS. That same forum post also suggested debug.FreeOSMemory() to force the garbage collector to release free memory back to the OS. I imagine it's just doing something like malloc_trim(0) under the hood.
package main
import "fmt"
import "time"
import "runtime"
import "runtime/debug"

type my_data_t struct {
    data1 int
    data2 int
}

func allocate() {
    var (
        element int = 0
        list_size int = 10000000
        sum float64 = 0.0
        my_data *my_data_t = nil
        array []*my_data_t = nil
        )
    array = []*my_data_t{}
    for element = 0; element < list_size; element++ {
        my_data = new(my_data_t)
        my_data.data1 = element
        my_data.data2 = element
        array = append(array, my_data)
    }
    for element = 0; element < list_size; element++ {
        my_data = array[element]
        sum += float64(my_data.data1)
        sum += float64(my_data.data2)
        array[element] = nil
    }
    fmt.Printf("sum %E\n", sum)
    my_data = nil
    array = nil
}

func waitsec(sec int) {
    var (
        i int
    )
    for i = 0; i < sec; i++ {
        time.Sleep(1 * time.Second)
    }
}

func main() {
    waitsec(10)
    allocate()
    runtime.GC() // suggest garbage collection
    debug.FreeOSMemory() // reduce heap
    waitsec(600)
    allocate()
    allocate()
    runtime.GC() // suggest garbage collection
    debug.FreeOSMemory() // reduce heap
    waitsec(600)
}

testgc.js

Here is a good article about Node.js garbage collection.
#!/usr/bin/node

function allocate() {
    var element = 0;
    var my_data = null;
    var list_size = 10000000;
    var sum = 0.0;
    var array = new Array();
    for (element = 0; element < list_size; element++) {
        my_data = {data1:element, data2:element}
        array.push(my_data);
    }
    for (element = 0; element < list_size; element++) {
        my_data = array[element];
        sum += my_data.data1;
        sum += my_data.data2;
    }
    console.log("sum " + sum);
    array = null;
}

function waitsec(sec) {
    var i;
    var sleep = require('sleep');
    for (i = 0; i < sec; i++)
        sleep.sleep(1);
}

waitsec(10);
allocate();
waitsec(600);
allocate();
allocate();
waitsec(600);

testgc-gc.js

I found a great blog post about forcing garbage collection in Node.js and V8. This call seems to force both garbage collection and releasing free/unused heap to the OS.
#!/usr/bin/node --expose-gc

function allocate() {
    var element = 0;
    var my_data = null;
    var list_size = 10000000;
    var sum = 0.0;
    var array = new Array();
    for (element = 0; element < list_size; element++) {
        my_data = {data1:element, data2:element}
        array.push(my_data);
    }
    for (element = 0; element < list_size; element++) {
        my_data = array[element];
        sum += my_data.data1;
        sum += my_data.data2;
    }
    console.log("sum " + sum);
    array = null;
}

function waitsec(sec) {
    var i;
    var sleep = require('sleep');
    for (i = 0; i < sec; i++)
        sleep.sleep(1);
}

waitsec(10);
allocate();
global.gc(); // suggest garbage collection
waitsec(600);
allocate();
allocate();
global.gc(); // suggest garbage collection
waitsec(600);

testgc.php

#!/usr/bin/php
<?php
ini_set('memory_limit', '-1');

class my_data_t {
    var $data1;
    var $data2;
}

function allocate() {
    $element = 0;
    $list_size = 10000000;
    $sum = 0.0;
    $array = array();
    for ($element = 0; $element < $list_size; $element++) {
        $my_data = new my_data_t();
        $my_data->data1 = $element;
        $my_data->data2 = $element;
        $array[] = $my_data;
    }
    for ($element = 0; $element < $list_size; $element++) {
        $my_data = $array[$element];
        $sum += $my_data->data1;
        $sum += $my_data->data2;
        $array[$element] = 0;
    }
    fwrite(STDOUT, "sum ". $sum . "\n");
    $array = 0;
}

function waitsec($sec) {
    for ($i = 0; $i < $sec; $i++) {
        sleep(1);
    }
}

waitsec(10);
allocate();
waitsec(600);
allocate();
allocate();
waitsec(600);

?>

testgc-gc.php

#!/usr/bin/php
<?php
ini_set('memory_limit', '-1');

class my_data_t {
    var $data1;
    var $data2;
}

function allocate() {
    $element = 0;
    $list_size = 10000000;
    $sum = 0.0;
    $array = array();
    for ($element = 0; $element < $list_size; $element++) {
        $my_data = new my_data_t();
        $my_data->data1 = $element;
        $my_data->data2 = $element;
        $array[] = $my_data;
    }
    for ($element = 0; $element < $list_size; $element++) {
        $my_data = $array[$element];
        $sum += $my_data->data1;
        $sum += $my_data->data2;
        $array[$element] = 0;
    }
    fwrite(STDOUT, "sum ". $sum . "\n");
    $array = 0;
}

function waitsec($sec) {
    for ($i = 0; $i < $sec; $i++) {
        sleep(1);
    }
}

waitsec(10);
allocate();
gc_collect_cycles();
waitsec(600);
allocate();
allocate();
gc_collect_cycles();
waitsec(600);

?>

testgc.py

#!/usr/bin/python
import time
import gc

class my_data_t:
    data1 = 0
    data2 = 0

def allocate():
    element = 0
    list_size = 10000000
    total = float(0.0)
    array = []
    for element in range(0, list_size):
        my_data = my_data_t()
        my_data.data1 = element
        my_data.data2 = element
        array.append(my_data)
    for element in range(0, list_size):
        my_data = array[element]
        total += my_data.data1
        total += my_data.data2
        array[element] = 0
    print 'sum', total
    array = 0

def waitsec(sec):
    for i in range(0, sec):
        time.sleep(1)

waitsec(10)
allocate()
waitsec(600)
allocate()
allocate()
waitsec(600)

testgc-gc.py

#!/usr/bin/python
import time
import gc

class my_data_t:
    data1 = 0
    data2 = 0

def allocate():
    element = 0
    list_size = 10000000
    total = float(0.0)
    array = []
    for element in range(0, list_size):
        my_data = my_data_t()
        my_data.data1 = element
        my_data.data2 = element
        array.append(my_data)
    for element in range(0, list_size):
        my_data = array[element]
        total += my_data.data1
        total += my_data.data2
        array[element] = 0
    print 'sum', total
    array = 0

def waitsec(sec):
    for i in range(0, sec):
        time.sleep(1)

waitsec(10)
allocate()
gc.collect()
waitsec(600)
allocate()
allocate()
gc.collect()
waitsec(600)

testgc.rb

#!/usr/bin/ruby

def allocate()
    my_data_t = Struct.new(:data1, :data2)
    element = 0
    list_size = 10000000
    sum = 0.0
    array = Array.new()
    for element in 0..list_size-1
        my_data = my_data_t.new(0, 0)
        my_data.data1 = element
        my_data.data2 = element
        array.push(my_data)
    end
    for element in 0..list_size-1
        my_data = array[element]
        sum += my_data.data1
        sum += my_data.data2
        array[element] = 0
    end
    puts "sum #{sum}"
    array = nil
end

def waitsec(sec)
    for i in 1..sec
        sleep 1
    end
end

waitsec 10
allocate
waitsec 600
allocate
allocate
waitsec 600

testgc-gc.rb

#!/usr/bin/ruby

def allocate()
    my_data_t = Struct.new(:data1, :data2)
    element = 0
    list_size = 10000000
    sum = 0.0
    array = Array.new()
    for element in 0..list_size-1
        my_data = my_data_t.new(0, 0)
        my_data.data1 = element
        my_data.data2 = element
        array.push(my_data)
    end
    for element in 0..list_size-1
        my_data = array[element]
        sum += my_data.data1
        sum += my_data.data2
        array[element] = 0
    end
    puts "sum #{sum}"
    array = nil
end

def waitsec(sec)
    for i in 1..sec
        sleep 1
    end
end

waitsec 10
allocate
GC.start
waitsec 600
allocate
allocate
GC.start
waitsec 600

capture.sh

#!/bin/bash

date=`date +%s`
rc=0
while [ $rc -eq 0 ]; do
    ps -C "$1" -F --no-header >> "mem-usage-$1-$date.csv"
    rc=$?
    #sleep 1
    usleep 200000
done

Makefile

all: \
testgc-malloc \
testgc-malloc-trim \
testgc-glib \
testgc-glib-trim \
testgc-cpp \
testgc-cpp-trim \
testgc-gcj testgc.class \
testgc-go \
testgc-go-free

testgc-malloc: testgc-malloc.c
	gcc -O3 -o testgc-malloc testgc-malloc.c
testgc-malloc-trim: testgc-malloc.c
	gcc -O3 -DUSE_TRIM -o testgc-malloc-trim testgc-malloc.c

testgc-glib: testgc-glib.c
	gcc -O3 -I/usr/include/glib-2.0 \
	-I/usr/lib64/glib-2.0/include \
	-lglib-2.0 -o testgc-glib testgc-glib.c
testgc-glib-trim: testgc-glib.c
	gcc -O3 -I/usr/include/glib-2.0 \
	-I/usr/lib64/glib-2.0/include \
	-lglib-2.0 -DUSE_TRIM -o testgc-glib-trim testgc-glib.c

testgc-cpp: testgc.cpp
	g++ -O3 -std=c++11 -o testgc-cpp testgc.cpp
testgc-cpp-trim: testgc.cpp
	g++ -O3 -std=c++11 -DUSE_TRIM -o testgc-cpp-trim testgc.cpp

testgc-gcj: testgc.java
	gcj -O3 -DLARGE_CONFIG --main=testgc \
	-o testgc-gcj testgc.java

testgc.class: testgc.java
	javac testgc.java

testgc-go: testgc.go
	go build -o testgc-go testgc.go
testgc-go-free: testgc-free.go
	go build -o testgc-go-free testgc-free.go

clean:
	rm -f testgc-malloc testgc-glib testgc-cpp
	rm -f testgc-malloc-trim testgc-glib-trim testgc-cpp-trim
	rm -f testgc-gcj *.class
	rm -f testgc-go testgc-go-free

run_test: all
	#./capture.sh testgc-malloc &
	#./testgc-malloc ; sleep 2
	#./capture.sh testgc-malloc-trim &
	#./testgc-malloc-trim ; sleep 2
	#./capture.sh testgc-glib &
	#./testgc-glib ; sleep 2
	#./capture.sh testgc-glib-trim &
	#./testgc-glib-trim ; sleep 2
	#./capture.sh testgc-cpp &
	#./testgc-cpp ; sleep 2
	#./capture.sh testgc-cpp-trim &
	#./testgc-cpp-trim ; sleep 2
	#./capture.sh testgc-gcj &
	#./testgc-gcj ; sleep 2
	#./capture.sh java &
	#java testgc ; sleep 2
	#./capture.sh java &
	#java -XX:+UseParNewGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 testgc ; sleep 2
	#./capture.sh java &
	#java -XX:+UseParNewGC testgc ; sleep 2
	#./capture.sh java &
	#java -XX:+UseSerialGC testgc ; sleep 2
	#./capture.sh java &
	#java -XX:+UseG1GC testgc ; sleep 2
	#./capture.sh java &
	#java -XX:+UseParallelGC -XX:+UseParallelOldGC testgc ; sleep 2
	#./capture.sh java &
	#java -XX:+UseConcMarkSweepGC testgc ; sleep 2
	#./capture.sh java &
	#java -XX:+UseG1GC -XX:MinHeapFreeRatio=1 -XX:MaxHeapFreeRatio=2 testgc ; sleep 2
	#./capture.sh testgc-go &
	#./testgc-go ; sleep 2
	#./capture.sh testgc-go-free &
	#./testgc-go-free ; sleep 2
	#./capture.sh testgc.js &
	#./testgc.js ; sleep 2
	#./capture.sh testgc-gc.js &
	#./testgc-gc.js ; sleep 2
	#./capture.sh testgc.py &
	#./testgc.py ; sleep 2
	#./capture.sh testgc-gc.py &
	#./testgc-gc.py ; sleep 2
	#./capture.sh testgc.php &
	#./testgc.php ; sleep 2
	#./capture.sh testgc-gc.php &
	#./testgc-gc.php ; sleep 2
	./capture.sh ruby-mri &
	./testgc.rb ; sleep 2
	./capture.sh ruby-mri &
	./testgc-gc.rb ; sleep 2