This article is currently an experimental machine translation and may contain errors. If anything is unclear, please refer to the original Chinese version. I am continuously working to improve the translation.
The main content of this blog is my further improvement on Nanjing University's 2022 Software Engineering assignment, VJVM — implementing a JVM in Java.
The code for the first two labs is based on the framework code written by a senior student, GitHub user @amnore, at this repository. For detailed lab instructions, visit https://amnore.github.io/VJVM/, and for video explanations of the solutions, check https://space.bilibili.com/507030405. The following assumes you've already completed Lab1 and Lab2.
Starting from Lab3, the code and test cases are my own additions.
Preface
In the first two labs, we completed class parsing and the interpretation and execution of most bytecode instructions. Now our JVM has become Turing complete, capable of solving any computable problem.
Turing completeness is a great milestone for our virtual machine.
But even the BrainFuck language, made up of just eight characters, is Turing complete, and its interpreter is about 300 times easier to write than a JVM.
1 | // BrainFuck "HelloWorld" program |
Even if you argue BrainFuck is too cryptic, C is also Turing complete — and unlike Java, C can be directly compiled into machine code without needing a virtual machine.
So what sets object-oriented Java apart from procedural C? Java embraces the three pillars of object-oriented programming: encapsulation, inheritance, and polymorphism.
So far, we’ve only implemented part of polymorphism — specifically ad-hoc polymorphism, seen in Java as method overloading. To be precise, most of the work was actually done by the compiler — we’re just executing the corresponding bytecode as generated.
Without proper support for object-oriented features, our current JVM can only run code that looks suspiciously like C. It can’t handle most object-related operations. (By the way, since arrays are also objects in Java, our JVM currently can’t even create arrays…)
This is clearly unsatisfactory. So in Lab 3.1, we’ll implement class initialization, object creation, and related functionalities.
Memory Management
Let’s look at a piece of Java code.
1 | class A { |
We can rewrite this equivalently in C.
1 | typedef struct A { |
Unlike in C, where we manually call malloc and free to request and release memory from the system, in Java the programmer doesn’t manage memory manually — the JVM does it for us.
We use the new keyword to create objects, obtaining a reference, which is quite similar to a pointer in C.
But we don't, and can't, manually free memory. The JVM's garbage collector (GC) automatically reclaims objects that are no longer reachable, which spares us from most of the leaks caused by forgetting to free memory.
Some side notes
In C, we can manipulate memory freely using pointers.
*((int*) 0x12345678) = 0xabcd1234; // Modify memory at 0x12345678 to 0xabcd1234
In Java, we generally don’t have direct access to memory manipulation. Higher-level abstraction and encapsulation free the programmer from low-level details, making programming easier — but at the cost of some control and freedom.
Heap
In the operand stack and local variable table we’ve already implemented, we can store primitive types (int, long, float, …).
In this lab, we’ll also add support for reference types.
These are all fixed-size, bounded data types, which we store on the stack.
Objects, however, may encapsulate a lot of data and can be quite large. To keep the stack clean and efficient, we store objects in the heap, while the stack only holds references to them. A reference is similar to a pointer in C — it has a fixed size (1 Slot) and points to some data in the heap.
(Note: each thread (JThread) has its own stack, but they all share a single heap.)
Handling references is actually quite similar to handling primitive types — we treat them just as “pointers to a location”. From this, we can see that Java is strictly pass-by-value, with no such thing as pass-by-reference.
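To make the pass-by-value point concrete, here is a small sketch. The class and method names (PassByValueDemo, Box, mutate, reassign) are made up for illustration and are not part of the lab framework. A method receives a copy of the reference, so it can modify the object the reference points to, but it cannot make the caller's variable point somewhere else.

```java
// Hypothetical demo class; all names here are illustrative only.
class PassByValueDemo {
    static class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    // The parameter holds a copy of the caller's reference.
    // Mutating the object it points to is visible to the caller.
    static void mutate(Box box) {
        box.value = 42;
    }

    // Reassigning the parameter only changes the local copy;
    // the caller's variable still points to the original object.
    static void reassign(Box box) {
        box = new Box(99);
    }

    public static void main(String[] args) {
        Box b = new Box(1);
        mutate(b);
        System.out.println(b.value); // 42
        reassign(b);
        System.out.println(b.value); // still 42
    }
}
```

If Java had pass-by-reference, the second line printed would be 99.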
[Figure: JVM Architecture]
Class Initialization
In Lab1, we completed parsing and reading the contents of JClass, but we haven’t implemented class initialization yet.
Static fields in each JClass belong to the class itself, so we need to allocate space for them within JClass.
After that, we can proceed with class initialization, as described in Section 5.5 of the JVM specs.
Initialization mainly involves assigning values to static fields and executing code within static blocks.
1 | public class Test { |
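As a separate illustration, the sketch below lets you observe when and in what order initialization happens on a real JVM. The names (InitOrderDemo, Lazy, record, LOG) are hypothetical. It relies on two things from the specs: static field initializers and static blocks run in source order, and a class is initialized on first active use, such as reading a non-constant static field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; names are illustrative only.
class InitOrderDemo {
    // Shared log so we can observe the order of events.
    static final List<String> LOG = new ArrayList<>();

    static class Lazy {
        // Static field initializers and static blocks are compiled into
        // a single class initialization method and run in source order.
        static int a = record("field a", 1);
        static {
            record("static block", 0);
        }
        static int b = record("field b", 2);

        static int record(String what, int value) {
            LOG.add(what);
            return value;
        }
    }

    public static void main(String[] args) {
        LOG.add("before first use");
        int sum = Lazy.a + Lazy.b; // first active use triggers initialization of Lazy
        LOG.add("after first use");
        // [before first use, field a, static block, field b, after first use] sum=3
        System.out.println(LOG + " sum=" + sum);
    }
}
```

Your own JVM should reproduce the same order once the static-field instructions trigger initialization at the right time.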
Object Creation
From the earlier comparison between C and Java code, it's clear that the data actually stored in the heap for an object is its instance fields.
When creating an object, we just need to allocate space on the heap using the JClass as a “template”, reserving memory for all non-static fields. Calls to the constructor and its parent’s constructor are already compiled into separate invokespecial instructions.
For details, refer to the JVM specs’ description of the new instruction.
The JVM specs do not mandate how objects must be implemented at the lower level. My approach is to create a Reference class for reference types, which internally holds a JHeap and an index. Using the index, it locates the corresponding Fields in the JHeap, checks the type, and then reads or writes field values.
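A minimal sketch of this idea might look like the following. Only the names Reference and JHeap come from the description above. Everything else (the HeapSketch wrapper, allocate, fieldsAt, getField, putField, and storing fields as Object[]) is an assumption made for illustration and is not my actual implementation. The type check is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; method names and the storage layout are assumptions.
class HeapSketch {
    // Simplified heap: every object is just an array of field slots.
    static class JHeap {
        private final List<Object[]> objects = new ArrayList<>();

        // Reserve one slot per instance field and hand back a reference to it.
        Reference allocate(int instanceFieldCount) {
            objects.add(new Object[instanceFieldCount]);
            return new Reference(this, objects.size() - 1);
        }

        Object[] fieldsAt(int index) {
            return objects.get(index);
        }
    }

    // A reference is only a (heap, index) pair; copying it never copies the object.
    static class Reference {
        private final JHeap heap;
        private final int index;

        Reference(JHeap heap, int index) {
            this.heap = heap;
            this.index = index;
        }

        // Roughly what getfield needs.
        Object getField(int slot) {
            return heap.fieldsAt(index)[slot];
        }

        // Roughly what putfield needs.
        void putField(int slot, Object value) {
            heap.fieldsAt(index)[slot] = value;
        }
    }

    public static void main(String[] args) {
        JHeap heap = new JHeap();
        Reference a = heap.allocate(2); // like "new" for a class with two instance fields
        Reference alias = a;            // copies the reference, not the object
        a.putField(0, 7);
        System.out.println(alias.getField(0)); // 7
        System.out.println(alias.getField(1)); // null here; a real JVM uses typed defaults such as 0
    }
}
```

Keeping the index inside Reference means the operand stack and local variable table only ever hold a fixed-size value, whatever the size of the object.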
Virtual Method Invocation
Dynamic dispatch means the program dynamically selects the actual method implementation at runtime, rather than determining it at compile time.
1 | class A { |
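Methods that can be inherited and overridden (like func) are called virtual methods. In Java, instance methods are virtual by default. Static methods, private methods, and constructors are never dispatched dynamically, and a method marked final cannot be overridden, so its target is always known.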
To invoke a virtual method on a Java object, the invokevirtual instruction is used. The JVM specs provide a detailed description of its behavior. Following these specifications allows us to implement virtual method dispatch correctly.
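The core of method selection can be sketched as a walk up the superclass chain, starting from the runtime class of the receiver. This is a deliberately simplified model, not the full procedure in the specs: it ignores access checks, interfaces, and default methods. The Klass type and its members are hypothetical stand-ins for JClass.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of virtual method selection; all names are illustrative.
class DispatchSketch {
    // Stand-in for a loaded class: a name, a superclass, and its declared methods.
    static class Klass {
        final String name;
        final Klass superClass; // null for the root class
        // Key: method name plus descriptor. Value: a label for the implementation.
        final Map<String, String> methods = new HashMap<>();

        Klass(String name, Klass superClass) {
            this.name = name;
            this.superClass = superClass;
        }

        Klass declare(String nameAndDescriptor) {
            methods.put(nameAndDescriptor, name + "." + nameAndDescriptor);
            return this;
        }
    }

    // Start at the receiver's runtime class and walk up until a declaration is found.
    static String select(Klass runtimeClass, String nameAndDescriptor) {
        for (Klass k = runtimeClass; k != null; k = k.superClass) {
            String impl = k.methods.get(nameAndDescriptor);
            if (impl != null) {
                return impl;
            }
        }
        throw new AbstractMethodError(nameAndDescriptor);
    }

    public static void main(String[] args) {
        Klass a = new Klass("A", null).declare("func()V").declare("other()V");
        Klass b = new Klass("B", a).declare("func()V");
        System.out.println(select(b, "func()V"));  // B.func()V  (overridden in B)
        System.out.println(select(b, "other()V")); // A.other()V (inherited from A)
        System.out.println(select(a, "func()V"));  // A.func()V
    }
}
```

The important detail is that the search starts from the class of the object on the operand stack, not from the class named in the instruction's constant pool entry.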
Arrays
The JVM provides first-class support for arrays at the bytecode level. An array is a special kind of reference, different from regular objects. Arrays of primitive types are created with the newarray instruction, arrays of references with anewarray, and multidimensional arrays with multianewarray.
When implementing, you can create two subclasses of Reference: ObjectReference and ArrayReference, each implementing their own logic.
You’ll also need to implement related instructions to store and retrieve elements from arrayref.
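For int arrays, the operations involved could be sketched roughly as below. IntArrayReference and its methods are hypothetical names, and a real ArrayReference would also need to handle the other element types. The comments show which bytecode instruction each method loosely corresponds to.

```java
// Hypothetical sketch of an int[] array object; names are illustrative only.
class ArraySketch {
    static class IntArrayReference {
        private final int[] elements;

        // Corresponds to newarray with element type int: elements start at 0.
        IntArrayReference(int length) {
            elements = new int[length];
        }

        // Corresponds to arraylength.
        int length() {
            return elements.length;
        }

        // Corresponds to iaload: arrayref and index in, value out.
        int load(int index) {
            return elements[index];
        }

        // Corresponds to iastore: arrayref, index, and value in.
        void store(int index, int value) {
            elements[index] = value;
        }
    }

    public static void main(String[] args) {
        IntArrayReference array = new IntArrayReference(3);
        array.store(1, 5);
        System.out.println(array.length()); // 3
        System.out.println(array.load(0));  // 0
        System.out.println(array.load(1));  // 5
    }
}
```

Here the bounds checking is borrowed from the host JVM. Since exception handling is not part of this lab, that is enough to catch bad indexes during testing.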
Implementation
If you didn’t overly rely on Professor Liu Qin’s videos to complete the labs, you’ve probably already developed a solid understanding of the codebase through the painful debugging process of Lab1 and Lab2.
You can clone the project from here, check out the lab3 branch, and copy the testdata and src/test folders into your own JVM implementation to get the Lab3 test cases: https://github.com/lyc8503/jjvm/tree/lab3
In Lab3.1, I didn’t provide a “fill-in-the-blank” style framework. What you’re cloning is actually my completed version. My recommendation is to only use the provided test cases, not my implementation.
Since I was also learning as I went, my code may not be fully comprehensive or perfectly aligned with the specs — some details might be off, and the structure could be suboptimal (because I’m lazy and haven’t refactored it yet). I encourage you to implement it independently, without being influenced by my “framework”. If you’re truly stuck, feel free to refer to parts of my code. (I implemented the first two labs myself, and later made minor adjustments — so they might differ slightly from the demo videos.)
To pass the Lab3.1 test cases, you’ll need to at least complete the following:
- Extend the existing Slots and OperandStack methods to support reference operations, and implement the corresponding instructions.
- Perform class initialization at the appropriate time, and add storage for static variables within the class.
- Create a JHeap. Instantiate a single global JHeap when the JVM starts, and implement memory allocation for objects and reference creation within it.
- Implement bytecode instructions. Compared to Lab2.2, you need to:
  - Complete the remaining parts of Constants, Loads, Stores, and Comparisons.
  - Complete all instructions under References except invokeinterface, invokedynamic, athrow, monitorenter, and monitorexit.
  - Implement ifnull and ifnonnull in the Extended category.
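As a rough idea of what reference support on the operand stack involves, here is a standalone sketch. It does not use the framework's real Slots or OperandStack API: OperandStackSketch, Ref, and every method below are assumptions for illustration. A Java null stands in for the null reference pushed by aconst_null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not the framework's OperandStack API.
class OperandStackSketch {
    // A reference occupies one slot and only carries an index into the heap.
    static final class Ref {
        final int heapIndex;
        Ref(int heapIndex) { this.heapIndex = heapIndex; }
    }

    // ArrayList is used because it allows null entries.
    private final List<Object> slots = new ArrayList<>();

    void pushInt(int value) {
        slots.add(value);
    }

    int popInt() {
        return (Integer) pop();
    }

    // Passing null models aconst_null.
    void pushReference(Ref ref) {
        slots.add(ref);
    }

    // Fails with ClassCastException if the top slot holds an int.
    Ref popReference() {
        return (Ref) pop();
    }

    // ifnull pops a reference and reports whether the branch is taken.
    boolean ifnull() {
        return popReference() == null;
    }

    // ifnonnull is the mirror image.
    boolean ifnonnull() {
        return popReference() != null;
    }

    private Object pop() {
        return slots.remove(slots.size() - 1);
    }

    public static void main(String[] args) {
        OperandStackSketch stack = new OperandStackSketch();
        stack.pushReference(null);
        System.out.println(stack.ifnull());    // true
        stack.pushReference(new Ref(0));
        System.out.println(stack.ifnonnull()); // true
    }
}
```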
Exception handling is not required yet: for now, you can ignore the places where the JVM specs explicitly call for throwing an exception. Throwing an UnimplementedError or AssertionError there as a placeholder makes those spots easy to find when you extend the implementation later.
To simplify things, the test cases currently don’t include multidimensional arrays — feel free to add them yourself, PRs welcome.
The test cases also don’t thoroughly test inheritance edge cases — only basic scenarios are covered, and no interfaces or abstract classes are involved. Additional test cases are needed.
If you find any errors in Lab3.1, please let me know.
This article is licensed under the CC BY-NC-SA 4.0 license.
Author: lyc8503, Article link: https://blog.lyc8503.net/en/post/njuse-jvm-lab3.1/