The Java Virtual Machine

Readers already familiar with the Java Virtual Machine and the Java class file format may want to skip this section and proceed with section 3.

Programs written in the Java language are compiled into a portable binary format called byte code. Every class is represented by a single class file containing class-related data and byte code instructions. These files are loaded dynamically into an interpreter (the Java Virtual Machine, or JVM) and executed.
Figure 1 illustrates the procedure of
compiling and executing a Java class: The source file
(
Note that the general term "Java" in fact has two meanings: on the one hand, Java as a programming language; on the other hand, the Java Virtual Machine, which is not necessarily targeted by the Java language exclusively, but may be used by other languages as well. We assume the reader to be familiar with the Java language and to have a general understanding of the Virtual Machine.

Java class file format

Giving a full overview of the design issues of the Java class file format and the associated byte code instructions is beyond the scope of this paper. We will just give a brief introduction covering the details that are necessary for understanding the rest of this paper. The format of class files and the byte code instruction set are described in more detail in the Java Virtual Machine Specification. In particular, we will not deal with the security constraints that the Java Virtual Machine has to check at run-time, i.e., the byte code verifier.
Figure 2 shows a simplified example of the
contents of a Java class file: It starts with a header containing
a "magic number" (
Because all of the information needed to dynamically resolve the symbolic references to classes, fields and methods at run-time is coded with string constants, the constant pool makes up the largest portion of an average class file, approximately 60%. This makes the constant pool an easy target for code manipulation. The byte code instructions themselves make up just 12%. The upper right box shows a "zoomed" excerpt of the constant pool, while the rounded box below depicts some instructions that are contained within a method of the example class. These instructions represent the straightforward translation of the well-known statement:
System.out.println("Hello, world");
The first instruction loads the contents of the field

Instructions, other data structures within the class file, and constants themselves may refer to constants in the constant pool. Such references are implemented via fixed indexes encoded directly into the instructions. This is illustrated for some items of the figure emphasized with a surrounding box.
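The fixed-size fields that precede the constant pool can be read with a few lines of Java. The following is a minimal sketch, assuming the layout given in the Java Virtual Machine Specification (a 4-byte magic number `0xCAFEBABE`, two 2-byte version fields, then a 2-byte constant pool count). The class name `ClassFileHeader` and the hand-built sample bytes are illustrative assumptions, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Reads the fixed-size fields at the start of a class file (illustrative helper). */
class ClassFileHeader {
    final int magic;             // u4, 0xCAFEBABE for every valid class file
    final int minorVersion;      // u2
    final int majorVersion;      // u2
    final int constantPoolCount; // u2, number of constant pool entries plus one

    ClassFileHeader(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Parses the header; class files are big-endian, which DataInputStream matches. */
    static ClassFileHeader read(byte[] classBytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IOException("Not a class file: bad magic number");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major, count);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hand-built sample: magic, minor 0, major 52, constant_pool_count 16.
        byte[] sample = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0, 0, 0, 52, 0, 16
        };
        ClassFileHeader h = read(sample);
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

To inspect a real class, the same `read` method could be given the bytes of any compiled `.class` file; the constant pool entries themselves follow directly after these ten bytes.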
For example, the

The constant pool basically holds the following types of constants: references to methods, fields and classes, strings, integers, floats, longs, and doubles.

Byte code instruction set

The JVM is a stack-oriented interpreter that creates a local stack frame of fixed size for every method invocation. The size of the local stack has to be computed by the compiler. Values may also be stored intermediately in a frame area containing local variables, which can be used like a set of registers. These local variables are numbered from 0 to 65535, i.e., a method can have at most 65536 local variables. The stack frames of the caller and the callee overlap: the caller pushes arguments onto the operand stack, and the called method receives them in local variables.

The byte code instruction set currently consists of 212 instructions; 44 opcodes are marked as reserved and may be used for future extensions or intermediate optimizations within the Virtual Machine. The instruction set can be roughly grouped as follows:
Stack operations: Constants can be pushed onto the stack
either by loading them from the constant pool with the
Arithmetic operations: The instruction set of the Java
Virtual Machine distinguishes its operand types using different
instructions to operate on values of specific type. Arithmetic
operations starting with
Control flow: There are branch instructions like
Load and store operations for local variables like
Field access: The value of an instance field may be
retrieved with
Method invocation: Static Methods may either be called via
Object allocation: Class instances are allocated with the
Conversion and type checking: For stack operands of basic
type there exist casting operations like
Most instructions have a fixed length, but there are also some
variable-length instructions: In particular, the
We will not list all byte code instructions here, since they are explained in detail in the JVM specification. The opcode names are mostly self-explanatory, so understanding the following code examples should be fairly intuitive.

Method code
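As a rough illustration of how a method's code drives the operand stack and the local variables, the following toy interpreter executes the handful of opcodes that appear in the `fac()` listing of the code example later in this section. It is only a sketch under simplifying assumptions: the class `MiniStackMachine`, the `Op` enum and the `Insn` type are hypothetical, instructions are stored in an array indexed by byte offset rather than decoded from real class file bytes, and branch targets are the absolute offsets printed in the listing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy interpreter for the opcodes used by Factorial.fac() (illustrative only). */
class MiniStackMachine {
    enum Op { ILOAD_0, IFNE, ICONST_1, GOTO, ISUB, INVOKESTATIC_FAC, IMUL, IRETURN }

    static final class Insn {
        final Op op;
        final int operand; // absolute branch target, unused by other opcodes
        Insn(Op op, int operand) { this.op = op; this.operand = operand; }
    }

    // Code array indexed by byte offset; gaps stand for operand bytes.
    static final Insn[] FAC = new Insn[17];
    static {
        FAC[0]  = new Insn(Op.ILOAD_0, 0);
        FAC[1]  = new Insn(Op.IFNE, 8);
        FAC[4]  = new Insn(Op.ICONST_1, 0);
        FAC[5]  = new Insn(Op.GOTO, 16);
        FAC[8]  = new Insn(Op.ILOAD_0, 0);
        FAC[9]  = new Insn(Op.ILOAD_0, 0);
        FAC[10] = new Insn(Op.ICONST_1, 0);
        FAC[11] = new Insn(Op.ISUB, 0);
        FAC[12] = new Insn(Op.INVOKESTATIC_FAC, 0);
        FAC[15] = new Insn(Op.IMUL, 0);
        FAC[16] = new Insn(Op.IRETURN, 0);
    }

    /** Instruction length in bytes: branches and invocations carry a 2-byte operand. */
    static int length(Op op) {
        return (op == Op.IFNE || op == Op.GOTO || op == Op.INVOKESTATIC_FAC) ? 3 : 1;
    }

    /** Runs fac with its argument in local variable 0, using a fresh frame per call. */
    static int run(int arg) {
        int[] locals = { arg };
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            Insn insn = FAC[pc];
            int next = pc + length(insn.op);
            switch (insn.op) {
                case ILOAD_0:  stack.push(locals[0]); break;
                case ICONST_1: stack.push(1); break;
                case IFNE:     if (stack.pop() != 0) next = insn.operand; break;
                case GOTO:     next = insn.operand; break;
                case ISUB: { int b = stack.pop(); int a = stack.pop(); stack.push(a - b); break; }
                case IMUL: { int b = stack.pop(); int a = stack.pop(); stack.push(a * b); break; }
                // The caller's top-of-stack value becomes the callee's local variable 0.
                case INVOKESTATIC_FAC: stack.push(run(stack.pop())); break;
                case IRETURN:  return stack.pop();
            }
            pc = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

A real JVM encodes branch targets as offsets relative to the branching instruction and resolves the invoked method through the constant pool; both details are deliberately left out here.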
Non-abstract (and non-native) methods contain an attribute
"
Whenever an exception is raised during execution, the JVM performs
exception handling by looking into a table of exception
handlers. The table marks handlers, i.e., code chunks, to be
responsible for exceptions of certain types that are raised within
a given area of the byte code. When there is no appropriate
handler the exception is propagated back to the caller of the
method. The handler information is itself stored in an attribute
contained within the

Byte code offsets

Targets of branch instructions like

Type information
Java is a type-safe language, and the information about the types
of fields, local variables, and methods is stored in so-called
signatures. These are strings stored in the constant pool
and encoded in a special format. For example, the argument and
return types of the
public static void main(String[] argv)
are represented by the signature
([Ljava/lang/String;)V
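Signature strings of this kind can be produced from within Java itself. The sketch below uses the standard `java.lang.invoke.MethodType` class to build the descriptor for a given return type and parameter list; the wrapper class `SignatureDemo` and its helper `descriptorOf` are names made up for this illustration. In the encoding, `[` marks an array dimension, object types are written as `L<class name>;`, `I` stands for `int` and `V` for `void`.

```java
import java.lang.invoke.MethodType;

/** Prints JVM method signatures for a few Java method shapes (illustrative helper). */
class SignatureDemo {
    /** Builds the signature string for the given return and parameter types. */
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // public static void main(String[] argv)
        System.out.println(descriptorOf(void.class, String[].class));
        // public static int fac(int n)
        System.out.println(descriptorOf(int.class, int.class));
        // public String readLine()
        System.out.println(descriptorOf(String.class));
    }
}
```

The second and third shapes correspond to the `(I)I` and `()Ljava/lang/String;` signatures that appear in the byte code listings of the code example.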
Classes are internally represented by strings like
Code example
The following example program prompts for a number and prints the
factorial of it. The

    import java.io.*;

    public class Factorial {
        private static BufferedReader in = new BufferedReader(new InputStreamReader(System.in));

        public static int fac(int n) {
            return (n == 0) ? 1 : n * fac(n - 1);
        }

        public static int readInt() {
            int n = 4711;
            try {
                System.out.print("Please enter a number> ");
                n = Integer.parseInt(in.readLine());
            } catch (IOException e1) {
                System.err.println(e1);
            } catch (NumberFormatException e2) {
                System.err.println(e2);
            }
            return n;
        }

        public static void main(String[] argv) {
            int n = readInt();
            System.out.println("Factorial of " + n + " is " + fac(n));
        }
    }

This code example typically compiles to the following chunks of byte code:

    0:  iload_0
    1:  ifne #8
    4:  iconst_1
    5:  goto #16
    8:  iload_0
    9:  iload_0
    10: iconst_1
    11: isub
    12: invokestatic Factorial.fac (I)I (12)
    15: imul
    16: ireturn

    LocalVariable(start_pc = 0, length = 16, index = 0:int n)

fac():
The method
If recursion has to continue, the arguments for the multiplication
(

    0:  sipush 4711
    3:  istore_0
    4:  getstatic java.lang.System.out Ljava/io/PrintStream;
    7:  ldc "Please enter a number> "
    9:  invokevirtual java.io.PrintStream.print (Ljava/lang/String;)V
    12: getstatic Factorial.in Ljava/io/BufferedReader;
    15: invokevirtual java.io.BufferedReader.readLine ()Ljava/lang/String;
    18: invokestatic java.lang.Integer.parseInt (Ljava/lang/String;)I
    21: istore_0
    22: goto #44
    25: astore_1
    26: getstatic java.lang.System.err Ljava/io/PrintStream;
    29: aload_1
    30: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V
    33: goto #44
    36: astore_1
    37: getstatic java.lang.System.err Ljava/io/PrintStream;
    40: aload_1
    41: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V
    44: iload_0
    45: ireturn

    Exception handler(s) =
    From    To      Handler Type
    4       22      25      java.io.IOException(6)
    4       22      36      NumberFormatException(10)

readInt(): First the local variable
If one of the called methods (
The handler for
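The handler lookup described under "Method code" can be sketched in Java using the exception table printed for readInt(). This is a minimal sketch, not the JVM's actual implementation: the class `ExceptionTableDemo`, the `Handler` type and the `findHandler` method are hypothetical names, and the sketch assumes the range convention of the Java Virtual Machine Specification, where the "From" offset is inclusive and the "To" offset is exclusive. A result of -1 stands for "no appropriate handler", in which case the exception would be propagated to the caller.

```java
import java.io.IOException;

/** Sketch of searching a JVM-style exception table (illustrative only). */
class ExceptionTableDemo {
    static final class Handler {
        final int from;      // first covered byte code offset (inclusive)
        final int to;        // end of covered range (exclusive)
        final int handlerPc; // offset where the handler code starts
        final Class<? extends Throwable> type;

        Handler(int from, int to, int handlerPc, Class<? extends Throwable> type) {
            this.from = from;
            this.to = to;
            this.handlerPc = handlerPc;
            this.type = type;
        }
    }

    // Entries copied from the readInt() listing.
    static final Handler[] READ_INT_TABLE = {
        new Handler(4, 22, 25, IOException.class),
        new Handler(4, 22, 36, NumberFormatException.class),
    };

    /** Returns the handler offset for an exception raised at pc, or -1 if none matches. */
    static int findHandler(Handler[] table, int pc, Throwable thrown) {
        for (Handler h : table) {
            boolean inRange = pc >= h.from && pc < h.to;
            if (inRange && h.type.isInstance(thrown)) {
                return h.handlerPc; // entries are searched in order; the first match wins
            }
        }
        return -1; // no handler here: the exception propagates to the caller
    }

    public static void main(String[] args) {
        // readLine() is invoked at offset 15, parseInt() at offset 18.
        System.out.println(findHandler(READ_INT_TABLE, 15, new IOException("read failed")));
        System.out.println(findHandler(READ_INT_TABLE, 18, new NumberFormatException("bad number")));
        // Offset 30 lies inside a handler, outside the protected range.
        System.out.println(findHandler(READ_INT_TABLE, 30, new IOException("late")));
    }
}
```

Matching by `isInstance` means that a handler declared for a superclass also catches its subclasses, which mirrors how catch clauses behave in the Java source.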