dacapo-9.12-bach
RELEASE NOTES
2009-12-23

This is the second major release of the DaCapo benchmark suite.

These notes are structured as follows:

  1. Overview
  2. Usage
  3. Changes
  4. Known Issues
  5. Contributions and Acknowledgements


1. Overview
-----------

The DaCapo benchmark suite is slated to be updated every few years.
The 9.12 release is the first major update of the suite, and is
strictly incompatible with previous releases: new benchmarks have been
added, old benchmarks have been removed, all other benchmarks have
been substantially updated, and the inputs have changed for every
program.  For this reason, any published use of the suite must
explicitly state which version of the suite was used.

The release sees the retirement of a number of single-threaded
benchmarks (antlr, bloat and chart), the replacement of hsqldb by h2,
the addition of six completely new benchmarks, and the upgrade of all
other benchmarks to reflect the current release state of the
applications from which the benchmarks were derived.  These changes
are consistent with the original goals of the DaCapo project, which
include the desire for the suite to remain relevant and to reflect the
current state of deployed Java applications.

Each of these benchmarks is tested for both performance and
correctness nightly.  Results are available here:

  o performance: http://dacapo.anu.edu.au/regression/perf/head.html
  o sanity: http://dacapo.anu.edu.au/regression/sanity/latest/


2. Usage
--------

2.1 Downloading

  o Download the binary jar and/or source zip from:

      https://sourceforge.net/projects/dacapobench/files/

  o Access the source from subversion via:

      svn co https://dacapobench.svn.sourceforge.net/svnroot/dacapobench/benchmarks/trunk dacapobench

2.2 Running

  o It is essential that you read and observe the usage guidelines
    that appear in README.txt.

  o Run a benchmark:

      java -jar <dacapo-jar-name>.jar <benchmark>

  o For usage information, run with no arguments.
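For concreteness, a plausible invocation combining a benchmark name with the size and threading options described in section 3.4 might look like the following sketch.  The jar file name and the "-s" size flag are illustrative assumptions, not confirmed by these notes; only "-t" is documented below.  The snippet builds and prints the command line rather than executing it, so the shape of the POSIX-style options is visible without requiring the jar:

```shell
# Hypothetical jar name and size flag (-s); -t is the external thread
# count option described in section 3.4 of these notes.
JAR=dacapo-9.12-bach.jar
BENCH=avrora

# Assemble the command line to show the option layout:
CMD="java -jar $JAR -s large -t 4 $BENCH"
echo "$CMD"
```

Substitute the jar name you downloaded and any benchmark from section 3.1; running with no arguments prints the authoritative list of options.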
2.3 Building

  o You must have a working, recent version of ant installed.  Change
    to the benchmarks directory and then run:

      ant -p

    for instructions on how to build.


3. Changes
----------

3.1 Benchmark additions since 2006-10-MR2

  avrora: AVRORA is a set of simulation and analysis tools in a
    framework for AVR micro-controllers.  The benchmark exhibits a
    great deal of fine-grained concurrency.  The benchmark is courtesy
    of Ben Titzer (Sun Microsystems) and was developed at UCLA.

  batik: Batik is an SVG toolkit produced by the Apache foundation.
    The benchmark renders a number of SVG files.

  h2: h2 is an in-memory database benchmark, using the h2 database
    produced by h2database.com, and executing an implementation of the
    TPC-C workload produced by the Apache foundation for its derby
    project.  h2 replaces derby, which in turn replaced hsqldb.

  sunflow: Sunflow is a raytracing rendering system for
    photo-realistic images.

  tomcat: Tomcat uses the Apache Tomcat servlet container to run some
    sample web applications.

  tradebeans: Tradebeans runs the Apache daytrader workload "directly"
    (via EJB) within a Geronimo application server.  Daytrader is
    derived from the IBM Trade6 benchmark.

  tradesoap: Tradesoap is identical to the tradebeans workload, except
    that client/server communication is via SOAP protocols (and the
    workloads are reduced in size to compensate for the substantially
    higher overhead).

  Note that tradebeans and tradesoap were intentionally added as a
  pair to allow researchers to evaluate and analyze the overheads and
  behavior of communicating through a protocol such as SOAP.
  Tradesoap's "large" configuration uses exactly the same workload as
  tradebeans' "default" configuration, and tradesoap's "huge" uses
  exactly the same workload as tradebeans' "large", allowing
  researchers to directly compare the two systems.

3.2 Benchmark deletions

  antlr: Antlr is single threaded and highly repetitive.
    The most recent version of jython uses antlr, so antlr remains
    represented within the DaCapo suite.

  bloat: Bloat is not as widely used as our other workloads, and its
    code exhibited some pathologies that were arguably neither
    representative nor desirable in a suite intended to represent
    modern Java applications.

  chart: Chart was repetitive and used a framework that appears not to
    be as widely used as most of the other DaCapo benchmarks.  The
    batik workload has some similarities with chart (both render
    vector graphics), but is part of a larger, heavily used framework
    from Apache.

  derby: Derby has been replaced by h2, which runs a much richer
    workload and uses a more widely used and higher performing
    database engine (derby was not in any previous release, but had
    been slated for inclusion in this release).

  hsqldb: Hsqldb has been replaced by h2, which runs a much richer
    workload and uses a more widely used and higher performing
    database engine.

3.3 Benchmark updates

  All other benchmarks have been updated to reflect the latest release
  of the underlying application.

3.4 Other notable changes

  The packaging of the DaCapo suite has been completely re-worked and
  the source code is entirely re-organized.

  We've changed the naming scheme for the releases.  Rather than
  "dacapo-YYYY-MM", we've moved to "dacapo-Y.M-TAG", where TAG is a
  nickname for the release.  Given the theme for this project, we're
  using musical names, and since this release is our second, we've
  given it the nickname "bach".  The release can therefore be referred
  to by its nickname, which rolls off the tongue a little more easily
  than our old names.  Of course we've borrowed this scheme from other
  projects (such as Ubuntu) which follow a similar pattern.

  The command-line arguments have been rationalized and now follow
  POSIX conventions.

  Threading has been rationalized.  Benchmarks are now characterized
  in terms of their external and internal concurrency.
  (For example, a benchmark such as eclipse is single-threaded
  externally, but internally uses a thread pool.)  All benchmarks
  which are externally multi-threaded now by default run a number of
  threads scaled to match the available processors.  The number of
  externally defined threads may also be configured via the "-t" and
  "-k" command line options, which specify, respectively, the absolute
  number of external threads and a multiplier against the number of
  available processors.  Some benchmarks are both internally and
  externally multi-threaded, such as tradebeans and tradesoap, where
  the number of client threads may be specified externally, but the
  number of server threads is determined within the server and cannot
  be directly controlled by the user.

  We have introduced a "huge" size for a number of benchmarks, which
  scales the workload to run for much longer and consume significant
  memory.  We have also retired the "large" size for some benchmarks
  where "large" was not distinctly different from "default".  Thus
  there are now four sizes: "small", "default", "large", and "huge",
  with "large" and "huge" only available for some benchmarks.  If you
  attempt to run a benchmark at an unsupported size, you will get an
  error message.


4. Known Issues
---------------

Please consult the bug tracker for a complete and up-to-date list of
known issues:

  http://sourceforge.net/tracker/?group_id=172498&atid=861957

DaCapo is an open source community project.  We welcome all assistance
in addressing bugs and shortcomings in the suite.  A few notable
unresolved high priority issues are listed here:

4.1 Socket use by tradebeans, tradesoap and tomcat

  Each of these benchmarks uses sockets to communicate between its
  clients and server.  We have observed that connections are used very
  liberally (we have seen more than 64,000 connections in use when
  running tradebeans in its "huge" configuration, according to
  netstat).
  We believe that this phenomenon can lead to spurious failures,
  particularly in tradesoap, where the benchmark fails with an error
  message that indicates a garbled bean (a stock name seen where a
  userid was expected).  At the time of writing, we believe these
  issues are platform-sensitive and are due to the underlying systems
  rather than our particular use of them.  As with all issues, we
  welcome feedback and fixes from the community.

4.2 Tomcat

  Tomcat remains less interesting than we would have liked.
  Performance results show that tomcat currently has a remarkably flat
  warm-up curve when compared to other benchmarks.

4.3 Validation

  Validation continues to use summarization via a checksum, so we are
  unable to provide a diff between expected and actual output in the
  case of failure.  We hope to update this, and welcome community
  contributions.

4.4 Support for whole-program static analysis

  Despite significant help from the community, we have had to drop the
  support for whole-program static analysis that was available in the
  last major release.  The main reason for this is that the more
  systematic and extensive use of reflection, and the enormous
  internal complexity of workloads such as tradebeans and tradesoap,
  have made it very difficult to produce a straightforward mechanism
  that would facilitate such analyses.  While we regret this omission,
  such an addition should have no effect on the workloads themselves.
  Therefore, if the community is able to contribute enhancements or
  extensions to the suite that facilitate such static analysis, we
  should be able to include such a contribution in a maintenance
  release, rather than having to wait for the next major release of
  the DaCapo benchmark suite.


5. Contributions and Acknowledgements
-------------------------------------

The generous financial support of Intel was crucial to the successful
completion of this release.
The production of the 9.12 release was jointly led by:

  Steve Blackburn, Australian National University
  Kathryn S McKinley, University of Texas at Austin

The 9.12 release of the DaCapo suite was developed primarily by:

  Steve Blackburn, Australian National University
  Daniel Frampton, Australian National University
  Robin Garner, Australian National University
  John Zigman, Australian National University

We received considerable assistance from a number of people,
including:

  Eric Bodden, Technische Universität Darmstadt
  Sam Guyer, Tufts
  Chris Kulla
  Nick Mitchell, IBM
  Gary Sevitsky, IBM
  Ben Titzer, Sun Microsystems

Many other people provided valuable feedback, bug fixes and advice.