Java Performance
By: Joe Sharp
The Java
language emphasizes accurate, reliable behavior at the expense of performance.
This is reflected in features such as automatic garbage collection, rigorous
runtime checking, complete byte code checking, and conservative runtime
synchronization. Availability on a wide choice of platforms leads, at present,
to an interpreted virtual machine that further handicaps performance.
About performance, Steve McConnell [16]
quoted: “Complete it first, and then perfect it. The part that needs to be
perfect is usually small.” This appendix will aid you in locating and
optimizing that “part that needs to be
perfect.”
Basic approach
You should address performance only
after you have a correct and fully tested program:
- Measure the program’s performance under
realistic conditions. If it meets your requirements, you are finished. If not,
go to the next step.
- Find the most
critical performance bottleneck. This might require considerable ingenuity, but
the effort will pay off. If you simply guess where the bottleneck is and try to
optimize there, you’ll waste your
time.
- Apply the speed improvement
techniques discussed in this appendix, then return to Step
1.
Finding the critical
bottleneck is the key to cost-effective effort – Donald Knuth [9] improved
a program where 50 percent of the time was spent in less than 4 percent of the
code. He changed a few lines in an hour of work and doubled the program speed.
Working on the rest of the program would have dissipated his valuable time and
effort. To quote Knuth, “Premature optimization is the root of all
evil.” It is wise to restrain your impulses to optimize early because you
may forgo many useful programming techniques, resulting in code that’s
harder to understand, riskier, and requires more effort to
maintain.
Locating the bottleneck
Three approaches to locating the
performance-critical part of a program
are:
1. Install your own instrumentation
“Profile” code by
inserting explicit timing:
long start = System.currentTimeMillis();
// Operation to be timed goes here
long time = System.currentTimeMillis() - start;
Have an infrequently-used method
print cumulative times out to the console window with
System.out.println( ). Since the compiler will ignore it when false,
a static final boolean switch can turn the timing on and off so the code
can efficiently be left in place in released code, ready for emergency use at
any time. Even when more sophisticated profiling is available, this is a
convenient way to time a specific task or operation.
System.currentTimeMillis( )
returns time in 1/1000ths of a second. However, some systems with time
resolution less than a millisecond (such as a Windows PC) need to repeat an
operation n times and divide the total time by n to get accurate
estimates.
2. JDK profiling [2]
The JDK comes with a built-in
profiler that keeps track of the time spent in each routine and writes the
information to a file. Unfortunately, the JDK profilers have uneven performance.
JDK 1.1.1 works, but subsequent releases have had various
instabilities.
To run the profiler, use the
-prof option when invoking the unoptimized versions of the Java
interpreter, for example:
java_g -prof myClass
java_g -prof sun.applet.AppletViewer applet.html
The profiler output is not
particularly easy to decipher. In fact, in JDK 1.0 it truncates the method names
to 30 characters, so it might not be possible to distinguish between some
methods. However, if your platform does support the -prof option, either
Vladimir Bulatov’s HyperProf [3] or Greg White’s
ProfileViewer [4] will help interpret the
results.
3. Special tools
The best way to keep up with the
exploding field of performance optimization tools is through a Web site such as
Jonathan Hardwick’s Tools for Optimizing Java [5] at
http://www.cs.cmu.edu/~jch/java/tools.html.
Tips for measuring performance
- Since profiling uses
clock time, make every effort to remove other processes during the
measurement.
- Always
time the code before and after making changes to verify that, at least on the
test platform, your changes improved the program. (Jon Bentley mentioned that
some of his most logical changes actually slowed the program
down.)
- Try to
make each timing test under identical conditions.
- If possible,
contrive a test that doesn’t rely on any user input to avoid variations in
user response that can cause the results to fluctuate.
Speedup techniques
Now that the critical region has
been isolated, you can apply two types of
optimizations: generic techniques and those
specific to Java.
Generic approaches
An effective generic speedup is to
redefine the program in a more practical way. For example, in Programming
Pearls [14], Bentley describes Doug McIlroy’s representation of the
English language with a novel data depiction that enabled him to produce a
remarkably fast, compact spelling checker. In addition, choosing a better
algorithm will probably give a bigger performance gain than any other approach,
particularly as the size of the data set increases. For more of these generic
approaches, see the general book listings [12-19] at the end of this appendix.
Language dependent approaches
To put things in perspective,
it’s useful to look at the time it takes to perform various operations. So
that the results are relatively independent of the computer being used, they
have been normalized by dividing by the time it takes to make a local
assignment.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
throw and catch
exception
|
try{ throw e; }
catch(e){}
|
|
|
|
|
|
|
|
|
|
|
Using present systems (such as
Pentium 200 pro, Netscape 3, and JDK 1.1.5), these relative times show the
extraordinary cost of new objects and arrays, the heavy cost of synchronization,
and the modest cost of an unsynchronized method call. References [5] and [6]
give the Web address of measurement applets you can run on your own
machine.
General modifications
Here are some modifications that
you can make to speed up time-critical parts of your Java program. (Be sure to
test the performance before and after you try them.)
|
|
|
|
Abstract Class (when only one
parent is needed)
|
Multiple inheritance of interfaces
prevents some optimizations.
|
Non-local or array loop
variable
|
|
Time (above) shows an instance
integer assignment is 1.2 local integer assignments, but an array assignment is
2.7 local integer assignments.
|
|
Saving discarded link items or
replacing the list with a circular array (in which approximate size is
known)
|
Each new object takes 980 local
assignments. See Reusing Objects (below), Van Wyk [12] p. 87 and Bentley[15] p.
81
|
|
X >> 2
(or any power of 2)
|
Uses faster hardware
instructions
|
Specific situations
The cost of Strings: The
String concatenation operator + looks innocent but involves a lot of
work. The compiler can efficiently concatenate constant strings, but a variable
string requires considerable processing. For example, if s and t
are String variables:
System.out.println("heading" + s + "trailer" + t);
this requires a new
StringBuffer, appending arguments, and converting the result back to a
String with toString( ). This costs both space and time. If
you’re appending more than one String, consider using a
StringBuffer directly, especially if you can repeatedly reuse it in a
loop. Preventing the creation of a new StringBuffer on each iteration
saves the object creation time of 980 seen earlier. Using
substring( ) and the other String methods is usually an
improvement. When feasible, character arrays are even faster. Also notice that
StringTokenizer is costly because of synchronization.
Synchronization: In the JDK
interpreter, calling a synchronized method is typically 10 times slower
than calling an unsynchronized method. With JIT compilers, this performance gap
has increased to 50 to 100 times (notice the timing above shows it to be 97
times slower). Avoid synchronized methods if you can – if you
can’t, synchronizing on methods rather than on code blocks is slightly
faster.
Reusing objects: It takes a
long time to create an object (the timing above shows 980 assignment times for a
new Object, and 3100 assignment times for a small new array), so
it’s often worth saving and updating the fields of an old object instead
of creating a new object. For example, rather than creating a new Font
object in your paint( ) method, you can declare it an instance
object, initialize it once, and then just update it when necessary in
paint( ). See also Bentley, Programming Pearls p. 81
[15].
Exceptions: You should only
throw exceptions in abnormal situations, which are usually cases in which
performance is not an issue since the program has run into a problem that it
doesn’t normally expect. When optimizing, combine small
try-catch blocks, which thwart compiler optimization by breaking
the code into small independent sections. On the other hand, be careful of
sacrificing the robustness of your code by over-zealous removal of exception
handling.
Hashing: The standard
Hashtable class in Java 1.0 and 1.1 requires casting and costly
synchronization (570 assignment times). Furthermore, the early JDK library
doesn’t deliberately choose prime number table sizes. Finally, a hashing
function should be designed for the particular characteristics of the keys
actually used. For all these reasons, the generic Hashtable can be
improved by designing a hash class that fits a particular application.
Note that the HashMap in the Java 1.2 collections library has much
greater flexibility and isn’t automatically
synchronized.
Method inlining: Java
compilers can inline a method only if it is final, private, or
static, and in some cases it must have no local variables. If your code
spends a lot of time calling a method that has none of these modifiers, consider
writing a version that is final.
I/O: Use buffers wherever
possible, otherwise you can end up doing I/O a single byte at a time. Note that
the JDK 1.0 I/O classes use a lot of synchronization, so you might get better
performance by using a single “bulk” call such as
readFully( ) and then interpreting the data yourself. Also notice
that the Java 1.1 “reader” and “writer” classes were
designed for improved performance.
Casts and instanceof:
Casts take from 2 to 200 assignment times. The more costly ones require
travel up the inheritance hierarchy. Other costly operations lose and restore
capabilities of the lower level constructs.
Graphics: Use clipping to
reduce the amount of work done in repaint( ), double buffering to
improve perceived speed, and image strips or compression to speed downloading
times. Animation in Java Applets from JavaWorld and Performing
Animation from Sun are two good tutorials. Remember to use high-level
primitives; it’s much faster to call drawPolygon( ) on a bunch
of points than looping with drawLine( ). If you must draw a
one-pixel-wide line, drawLine(x,y,x,y) is faster than
fillRect(x,y,1,1).
Using API classes: Use
classes from the Java API when they offer native machine performance that you
can’t match using Java. For example, arrayCopy( ) is much
faster than using a loop to copy an array of any significant size.
Replacing API classes:
Sometimes API classes do more than you need, with a corresponding increase in
execution time. For these you can write specialized versions that do less but
run faster. For example, one application that needed a container to store many
arrays was speeded by replacing the original Vector with a faster dynamic
array of objects.
Other suggestions
- Move repeated
constant calculations out of a critical loop, for example, computing
buffer.length for a constant-size
buffer.
- static
final constants can help the compiler optimize the program.
- Unroll fixed
length
loops.
- Use
javac’s optimization option, -O, which optimizes compiled code by
inlining static, final, and private methods. Note that your
classes may get larger in size (JDK 1.1 or later only – earlier versions
might not perform byte verification). Newer just-in-time (JIT) compilers will
dramatically speed the
code.
- Count
down to zero whenever possible – this uses a special JVM byte
code.
Full article...
Other Resource
... to read more articles, visit http://sqa.fyicenter.com/art/
|