Major TODO Items For Next Release

The DaCapo benchmark suite is a community project, developed by the research community, for the research community. The quality of the workloads depends on community critique and community contributions. There is a large amount of work to be done. Please feel free to help by contributing to one or more of the following tasks (use the mailing list or email Steve Blackburn directly to co-ordinate your efforts).

Revise This List For Next Release

DayTrader Benchmarks

Stability

The workload was originally developed for DaCapo under Mac OS X using the default JVM on that platform, Sun's 1.5.0 HotSpot VM. Once it was complete, we started testing on other platforms with other production JVMs, and made two significant observations:

Stability. The workloads are stable on most VMs when using the H2 database, but were noted as unstable with Derby.
Performance. Under Mac OS X (using Derby), CPU utilization is near 100% (top shows "180%" or so on a dual-core machine). Under Ubuntu, however, running a similar HotSpot JVM on similar hardware, CPU utilization drops precipitously. Since performance appears stable after the shift to H2, this is no longer a priority.

Feedback On Existing Benchmarks

Feedback on the existing benchmarks will help us determine which benchmarks to drop from the upcoming release (the release will include new benchmarks and drop some of the existing benchmarks). If you have any comments, please let us know.

Current candidates for exclusion from the upcoming suite include bloat (has some notable idiosyncrasies, and is not extensively deployed), hsqldb (superseded by derby), and possibly antlr and/or chart.

Incorporate New Benchmarks

The new suite includes a number of new benchmarks.

Avrora

Done Jun 2009

Batik

Done Dec 2007

Sunflow

Done Jan 2007

H2

Done. H2 is driven with the Derby TPC-C-like workloads.

DayTrader

Done Dec 2009

Wiki

Sergey Salishev of Intel has created a workload that simulates a cloud computing environment, including a wiki. It would be great to evaluate this workload and incorporate it into the next release.

Jetty/Cocoon

Improve Harness

There are a number of ways we could improve the DaCapo benchmark harness. These include:

Thread Creation Callbacks

Andrew Tick of HP suggested on the mailing list in Sept 07 that we include a callback on thread creation, along the lines of the callback we currently have at each iteration start and end.
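
As a rough illustration of what such a callback might look like, here is a minimal sketch. It is not part of the DaCapo harness: the names ThreadCreationListener, NotifyingThreadFactory and ThreadCallbackDemo are hypothetical, and the sketch assumes that the workload creates its threads through a java.util.concurrent.ThreadFactory supplied by the harness.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadFactory;

// Hypothetical listener interface; NOT part of the DaCapo harness API.
interface ThreadCreationListener {
    void threadCreated(Thread thread);
}

// Hypothetical factory that notifies a listener about every thread it creates.
final class NotifyingThreadFactory implements ThreadFactory {
    private final ThreadCreationListener listener;
    private int count = 0;

    NotifyingThreadFactory(ThreadCreationListener listener) {
        this.listener = listener;
    }

    @Override
    public synchronized Thread newThread(Runnable task) {
        Thread t = new Thread(task, "workload-" + (count++));
        listener.threadCreated(t); // the callback fires before the thread is started
        return t;
    }
}

final class ThreadCallbackDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> seen = new ArrayList<>();
        NotifyingThreadFactory factory =
            new NotifyingThreadFactory(t -> seen.add(t.getName()));

        Thread a = factory.newThread(() -> {});
        Thread b = factory.newThread(() -> {});
        a.start();
        b.start();
        a.join();
        b.join();

        System.out.println("threads reported: " + seen);
    }
}
```

A factory like this only sees threads that are created through it. Workloads that call new Thread(...) directly would need a different mechanism, such as bytecode instrumentation or a JVM tooling interface, which this sketch does not attempt.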

Inclusion of the Fragger Tool

Cliff Click pointed me to the fragger tool, which injects fragmentation into the Java heap. It would be nice to include fragger as a command-line option on DaCapo.

Update Documentation

Each of the new benchmarks needs to be documented, and the documentation on building, etc., needs to be updated to reflect changes over the past two years.

Update Existing Benchmarks

Version Updates

We have been incrementally updating version numbers in the svn head. Each benchmark needs to be rechecked to ensure it is at the most recent stable release for that workload.

Xalan

We have used a very particular version of Xalan (2.4.1), on advice from Kev Jones of Intel (who provided us with our current version of the workload). Kev's rationale may have become dated with newer releases of Xalan. We should revisit this decision for the next release.

Eclipse

In addition to updating the Eclipse version, it would be great to investigate, and if necessary address, a problem identified by Matt Arnold in June 2007:

I have some info about the Dacapo Eclipse benchmark that you may be interested in. If I should contact someone else instead, feel free to let me know.
If you run eclipse for a large number of iterations, the performance forms a saw tooth, degrading significantly (a factor of 10 or more), then jumps back to normal. An excel graph of the performance over time is attached below. (Dacapo version 2006-10.jar). It happens on both Sun's and IBM VM's.
During the slow iterations the program is spending most of its time in jitted code. The problem is the following two methods:
org/eclipse/jdt/internal/compiler/util/WeakHashSetOfCharArray.add([C)[C
org/eclipse/jdt/internal/compiler/util/WeakHashSet.add(Ljava/lang/Object;)Ljava/lang/Object;

Both of these methods have a linear search through some kind of linked list of weak references. Here's my guess at what is happening: This list grows over time and the linear searches eventually become a huge bottleneck. Eventually some memory threshold is crossed and the VM clears the weak references, and performance goes back to normal.
This is clearly crappy code (linear searching) but it could also be a benchmark bug. Is it possible that this data structure should have been re-initialized between iterations, and the iterating nature of the driver is creating a problem that is unlikely to exist in the real application?
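
To make the hypothesis in the report above concrete, here is a minimal sketch of the suspected pattern. It is not the Eclipse JDT code: LinearWeakSet, LinearWeakSetDemo, runIteration and reset are hypothetical names, and the sketch simply assumes a set of weak references that is searched linearly on every add and is not re-initialized between benchmark iterations. It counts the entries inspected per iteration rather than measuring time.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the pattern described in the report; NOT the Eclipse JDT code.
final class LinearWeakSet {
    private final List<WeakReference<char[]>> entries = new ArrayList<>();
    long comparisons = 0; // total number of entries inspected by add()

    // Returns an existing equal array if one is present; otherwise stores and returns the argument.
    char[] add(char[] value) {
        for (WeakReference<char[]> ref : entries) {
            comparisons++;
            char[] existing = ref.get(); // null once the collector has cleared the reference
            if (existing != null && Arrays.equals(existing, value)) {
                return existing;
            }
        }
        entries.add(new WeakReference<>(value));
        return value;
    }

    // One possible reading of "re-initialized between iterations".
    void reset() {
        entries.clear();
    }
}

final class LinearWeakSetDemo {
    // Adds n distinct arrays and returns how many entries add() inspected while doing so.
    static long runIteration(LinearWeakSet set, List<char[]> keepAlive, int iteration, int n) {
        long before = set.comparisons;
        for (int i = 0; i < n; i++) {
            char[] name = ("id" + iteration + "_" + i).toCharArray();
            keepAlive.add(name); // strong reference, so the weak entry stays populated
            set.add(name);
        }
        return set.comparisons - before;
    }

    public static void main(String[] args) {
        LinearWeakSet set = new LinearWeakSet();
        List<char[]> keepAlive = new ArrayList<>();
        for (int it = 0; it < 3; it++) {
            long cost = runIteration(set, keepAlive, it, 1000);
            System.out.println("iteration " + it + ": " + cost + " entries inspected");
        }
        set.reset();
        long cost = runIteration(set, keepAlive, 3, 1000);
        System.out.println("after reset: " + cost + " entries inspected");
    }
}
```

In this sketch the work per iteration grows with every iteration until reset() is called, after which it returns to the cost of the first iteration. Whether the real benchmark behaves this way, and whether a reset between iterations would be faithful to how Eclipse is actually used, is exactly the open question raised in the report.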

For questions or comments please use the researchers mailing list.

Last modified: Sun Jun 21 13:46:36 EST 2009