This article is currently an experimental machine translation and may contain errors. If anything is unclear, please refer to the original Chinese version. I am continuously working to improve the translation.
The main content of this blog post is my further extension of VJVM, Nanjing University's 2022 Software Engineering assignment in which students implement a JVM in Java.
The code for the first two labs builds on the framework written by my senior, GitHub user @amnore, in this repository. For detailed lab instructions, see https://amnore.github.io/VJVM/; for video walkthroughs of the solutions, see https://space.bilibili.com/507030405. The rest of this post assumes you have completed Lab1 and Lab2.
Starting from Lab3, the code and test cases are my own additions.
Preface
In Lab3.1, we completed object creation, enabling our JVM to support basic object-oriented features in Java.
Since the JVM takes care of memory management, programmers writing Java don’t need to manually allocate and deallocate memory like they would with malloc and free in C.
However, our current JVM only allocates memory and never reclaims it: it does the killing but never buries the bodies. Now we need to implement the deallocation side.
Memory Leaks in C
Consider the following C code:
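```c
#include <stdlib.h>

#define SIZE 1024 /* illustrative value */

void func() {
    char *buffer = malloc(SIZE); /* request SIZE bytes from the heap */
    /* ... read and write buffer ... */
}   /* no free(buffer), and buffer is not returned: the block is leaked */
```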
In the func function, we use malloc to request a block of memory of size SIZE, but before the function returns, we neither free the memory nor pass the pointer buffer back to the caller. As we know, malloc and free should come in pairs — free takes a pointer obtained from malloc and releases the memory it points to. However, in this example, when func finishes execution and its stack frame is popped, the local variable buffer is destroyed along with the frame. We lose access to buffer, and thus can no longer pass it to free. This means the SIZE-sized memory block can never be freed — a phenomenon known as a “memory leak”.
Of course, this doesn’t mean your computer’s memory is permanently lost after running this program. When the program terminates, the operating system reclaims all memory used by the process, including the leaked block. (So if you’re just submitting code to an online judge, forgetting to free may not matter much anyway. Just kidding.) In the worst case, even if the OS itself leaked memory, RAM is volatile, so a simple reboot would clear it.
The typical consequence of memory leaks is that during long-running programs, memory usage keeps increasing until available system memory is exhausted, eventually causing errors or crashes.
Garbage Collection (GC)
In Java, we only need to use the new keyword to create objects — we don’t need to (and can’t) manually release the memory they occupy.
This means the JVM has to decide on its own whether a piece of memory might still be “useful”. Looking back at the C example: inside func, the programmer can read and write buffer, but once func returns, both the pointer buffer and the memory it points to become inaccessible. In other words, that memory is unreachable.
The JVM determines whether memory is “useful” based on reachability. Objects that the program might still access are considered reachable, and their memory must not be freed. Conversely, unreachable memory, which the program can no longer access, should be reclaimed.
Reachability Analysis
How do we determine if an object is reachable in the JVM? Intuitively, we just need to check whether an object could possibly be accessed.
The process can be described as follows:
1. Define a set of GC Roots and consider them reachable. The GC Roots include:
   - Objects referenced in the virtual machine stack (local variable tables in stack frames)
   - Objects referenced by constants in the method area
   - Objects referenced by static fields of classes in the method area
   - Objects referenced from native method stacks via JNI (native methods) (not supported yet)
   - Active threads, i.e. Java threads that have been started but not yet terminated (currently there is only one thread)
2. Starting from these reachable objects, mark all objects they reference as reachable.
3. Repeat step 2 until no new reachable objects can be found. At this point, all reachable objects have been identified; the remaining objects in the heap are unreachable and can be safely collected.
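Steps 1 to 3 amount to a graph traversal starting from the roots. Below is a minimal Java sketch of the marking step, assuming a hypothetical HeapObject class that records nothing but its outgoing references. In a real JVM the roots would be gathered from local variable tables, static fields, and so on; here they are simply passed in as a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical toy object: only its outgoing references matter for marking.
class HeapObject {
    final String name;
    final List<HeapObject> refs = new ArrayList<>();

    HeapObject(String name) { this.name = name; }
}

class Reachability {
    // Step 1: every GC root is reachable.
    // Steps 2 and 3: follow references breadth-first until nothing new is found.
    static Set<HeapObject> mark(List<HeapObject> roots) {
        Set<HeapObject> reachable = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<HeapObject> worklist = new ArrayDeque<>();
        for (HeapObject root : roots) {
            if (root != null && reachable.add(root)) {
                worklist.add(root);
            }
        }
        while (!worklist.isEmpty()) {
            HeapObject current = worklist.poll();
            for (HeapObject ref : current.refs) {
                // add() returns false for already-marked objects, so cycles terminate
                if (ref != null && reachable.add(ref)) {
                    worklist.add(ref);
                }
            }
        }
        return reachable;
    }
}
```

Because add returns false for an object that is already marked, a reference cycle such as a → b → c → a does not loop forever, and an object that merely points into the reachable set, without being pointed to by it, stays unmarked.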
p.s. All the references discussed here are strong references: an object reachable through them is never collected; when memory runs out, the JVM throws an OutOfMemoryError instead. Besides strong references, Java also supports three other kinds of references: soft, weak, and phantom references.
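In Java code, these weaker kinds are exposed through the java.lang.ref package (SoftReference, WeakReference, PhantomReference). A small sketch with WeakReference, where WeakRefDemo is a hypothetical name: while a strong reference to the referent is still in use, get() must return it; once only the weak reference remains, the collector may clear it, but System.gc() is only a hint, so that second outcome is not guaranteed.

```java
import java.lang.ref.WeakReference;

class WeakRefDemo {
    // 'strong' is read again after the GC hint, so the referent stays strongly
    // reachable and the weak reference cannot have been cleared.
    static boolean aliveWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        return weak.get() == strong;
    }

    // With no strong reference left, the referent is only weakly reachable.
    // System.gc() is merely a request, so this may return true or false.
    static boolean maybeClearedAfterGc() {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        return weak.get() == null;
    }
}
```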
Memory Leaks in Java
Usually, we don’t question the reliability of computing “infrastructure” like the JVM — after all, they’ve been thoroughly tested and scrutinized.
But that doesn’t mean we can offload all memory management to the JVM and forget about memory leaks.
Memory leaks in Java occur with objects that are reachable but useless.
Consider the following code:
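```java
import java.util.Arrays;
import java.util.EmptyStackException;

class Stack<T> { // DIY a stack
    private Object[] elements = new Object[16]; // initial capacity chosen for illustration
    private int size = 0;

    public void push(T e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size); // grow when full
        }
        elements[size++] = e;
    }

    @SuppressWarnings("unchecked")
    public T pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        return (T) elements[--size]; // elements[size] still references the popped object
    }
}
```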
This looks perfectly correct, but it actually causes a memory leak.
Suppose we push 100 Strings onto the stack, then pop 99 of them. We expect only one String to remain in the stack — the other 99 are useless (and indeed, inaccessible without reflection or similar tricks).
But as long as we hold a reference to the Stack object, its internal elements array is reachable, meaning all 100 Strings remain reachable and cannot be reclaimed by the JVM — leading to a memory leak.
The correct approach is to manually set popped elements to null, signaling to the JVM that these objects are no longer needed and can be collected.
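To make the leak observable, the sketch below puts both variants side by side in one hypothetical ArrayStack class: with clearOnPop set to false it behaves like the stack above, and with true it nulls out each popped slot. The retainedSlots helper exists only for this demonstration; it counts the array slots that still hold a reference, i.e. the objects the stack keeps reachable.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

class ArrayStack<T> {
    private Object[] elements = new Object[16]; // initial capacity chosen arbitrarily
    private int size = 0;
    private final boolean clearOnPop;           // false = leaky, true = fixed

    ArrayStack(boolean clearOnPop) { this.clearOnPop = clearOnPop; }

    void push(T e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size); // grow when full
        }
        elements[size++] = e;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        T result = (T) elements[--size];
        if (clearOnPop) {
            elements[size] = null; // drop the stale reference so the GC may reclaim it
        }
        return result;
    }

    // Hypothetical debugging helper: how many slots still reference an object.
    int retainedSlots() {
        int count = 0;
        for (Object o : elements) {
            if (o != null) {
                count++;
            }
        }
        return count;
    }
}
```

After 100 pushes and 99 pops, the leaky variant still references all 100 strings, while the fixed one references only the single remaining element.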
Memory leaks in Java often happen in cases where long-lived objects hold references to short-lived objects.
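A typical shape of this pattern is a long-lived registry, such as a list held in a static field (and therefore reachable from a GC root), that short-lived objects add themselves to but never leave. A minimal sketch; EventBus and Request are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Long-lived: the static field keeps this list reachable for as long as the class is loaded.
class EventBus {
    private static final List<Runnable> listeners = new ArrayList<>();

    static void register(Runnable listener) { listeners.add(listener); }
    static void unregister(Runnable listener) { listeners.remove(listener); }
    static int listenerCount() { return listeners.size(); }
}

// Short-lived: each request registers a listener while it is being handled.
class Request {
    private final byte[] payload = new byte[1024]; // stands in for per-request data
    // The lambda reads an instance field, so it captures 'this' (and thus payload).
    private final Runnable onEvent = () -> System.out.println(payload.length);

    void handle(boolean cleanUp) {
        EventBus.register(onEvent);
        // ... process the request ...
        if (cleanUp) {
            EventBus.unregister(onEvent); // without this, the request stays reachable forever
        }
    }
}
```

After 1,000 requests handled without clean-up, the bus holds 1,000 listeners, and through them 1,000 Request objects with their payload arrays, even though every request has finished. Unregistering breaks the chain back to the static root.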
Implementation
The JVM specification does not mandate a specific garbage collection implementation:

“Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor's system requirements.”
Based on the analysis above, the implementation doesn’t seem too complicated.
In reality, however, the JVM is a critical piece of modern computing infrastructure and has undergone numerous optimizations. Today’s JVM garbage collection algorithms are extremely sophisticated.
A full explanation of modern JVM GC algorithms might take ten more labs to cover. For an overview of common GC algorithms, refer to this article.
For our JVM written in Java, we could technically cheat by delegating garbage collection to the host JVM. (You probably already figured out how.)
But for learning and demonstration purposes, I didn’t take that shortcut in Lab3.1. Instead, I stored object fields in a HashMap inside JHeap, whose lifecycle spans almost the entire runtime of our JVM. We need to manually remove unneeded object data from JHeap to free memory (so the host JVM can reclaim it).
In Lab3.2, we’ll implement the simplest mark-sweep algorithm. During runtime, our JVM will trigger garbage collection under certain conditions (e.g., when memory usage exceeds a threshold). Each GC cycle will analyze object reachability and remove unreachable objects from the heap.
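Here is a minimal sketch of one such cycle, assuming a heap organized like the one described above: object data lives in a HashMap keyed by an integer reference, and removing an entry is what lets the host JVM reclaim it. ToyHeap, gcThreshold, and the layout where every field is an int reference (0 meaning null) are hypothetical simplifications for illustration; the actual JHeap in the lab may look quite different. The object count stands in for memory usage as the trigger condition.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy heap: every field is a reference stored as an int key, 0 means null.
class ToyHeap {
    private final Map<Integer, int[]> objects = new HashMap<>();
    private final Iterable<Integer> roots; // live view of GC roots (locals, statics, ...)
    private final int gcThreshold;         // collect once this many objects exist
    private int nextRef = 1;               // 0 is reserved for null

    ToyHeap(Iterable<Integer> roots, int gcThreshold) {
        this.roots = roots;
        this.gcThreshold = gcThreshold;
    }

    int allocate(int fieldCount) {
        if (objects.size() >= gcThreshold) {
            collect(); // trigger condition: object count as a crude proxy for memory usage
        }
        int ref = nextRef++;
        objects.put(ref, new int[fieldCount]);
        return ref;
    }

    void setField(int ref, int index, int value) { objects.get(ref)[index] = value; }
    int getField(int ref, int index) { return objects.get(ref)[index]; }
    int size() { return objects.size(); }

    void collect() {
        // Mark: traverse references starting from the roots.
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        for (int root : roots) {
            if (root != 0 && marked.add(root)) {
                worklist.push(root);
            }
        }
        while (!worklist.isEmpty()) {
            for (int child : objects.get(worklist.pop())) {
                if (child != 0 && marked.add(child)) {
                    worklist.push(child);
                }
            }
        }
        // Sweep: drop every unmarked entry so the host JVM can reclaim its data.
        objects.keySet().retainAll(marked);
    }
}
```

One pitfall worth checking: a freshly allocated reference that has not yet been stored anywhere the root set covers (for example, one still sitting on an operand stack right after new) would be swept by the next collection, so in an interpreter, references on operand stacks likely need to count as roots as well.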
We should technically also invoke the object’s finalize method. However, this mechanism, whose timing is entirely up to the JVM, has turned out to be problematic: for example, an object can even resurrect itself inside finalize and “escape” GC. As a result, finalize was deprecated in Java 9, and we’ll skip implementing it for now. (In theory, well-written programs shouldn’t be affected by this omission.)
Tips: While coding and debugging, you might accidentally introduce bugs that cause reachable objects to be collected. To make debugging easier, consider not deleting objects immediately. Instead, mark them as “deleted”, and whenever reading from a reference, check if the object is marked — if so, throw a WrongGCException. Using your IDE’s breakpoints, you can then trace why an object was incorrectly collected. After debugging, switch back to actually deleting objects to free memory.
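The tip might be wired up roughly as follows. WrongGCException is the name used in the tip; DebugHeap, DEBUG_GC, sweep, and deref are hypothetical. In debug mode the sweep only records a tombstone, and every dereference checks it, so a breakpoint on the throw stops at the first access to a wrongly collected object.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WrongGCException extends RuntimeException {
    WrongGCException(int ref) {
        super("object " + ref + " was collected but is still being accessed");
    }
}

class DebugHeap {
    static final boolean DEBUG_GC = true; // set to false to actually free memory

    private final Map<Integer, Object[]> objects = new HashMap<>();
    private final Set<Integer> tombstones = new HashSet<>();
    private int nextRef = 1;

    int allocate(int fieldCount) {
        int ref = nextRef++;
        objects.put(ref, new Object[fieldCount]);
        return ref;
    }

    // Called by the sweep phase for every unmarked object.
    void sweep(int ref) {
        if (DEBUG_GC) {
            tombstones.add(ref);  // keep the data, just remember that it was "collected"
        } else {
            objects.remove(ref);  // really release it
        }
    }

    // Every read through a reference goes through here.
    Object[] deref(int ref) {
        if (tombstones.contains(ref)) {
            throw new WrongGCException(ref); // put a breakpoint here
        }
        return objects.get(ref);
    }
}
```

Since tombstoned objects are never removed, debug mode frees no memory at all, which is why the flag should be switched off once the collector behaves.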
After completing garbage collection, try running the original test cases. During testing, you may want to increase GC frequency or add more debug logs to evaluate your garbage collector’s behavior. You can also write additional test cases to stress-test your implementation.
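One way to stress-test a collector is to generate random object graphs, run a collection, and compare the survivors with the reachable set computed by a separate, simpler algorithm. A sketch assuming a heap modeled as an adjacency map; collect here is a hypothetical stand-in that you would replace with a call into your own GC, while dfs serves as the independent oracle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

class GcStressTest {
    // Collector under test (stand-in): iterative mark, then sweep in place.
    static void collect(Map<Integer, List<Integer>> heap, Set<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        ArrayDeque<Integer> work = new ArrayDeque<>();
        for (int r : roots) {
            if (marked.add(r)) {
                work.add(r);
            }
        }
        while (!work.isEmpty()) {
            for (int child : heap.get(work.poll())) {
                if (marked.add(child)) {
                    work.add(child);
                }
            }
        }
        heap.keySet().retainAll(marked);
    }

    // Independent oracle: plain recursive depth-first search.
    static void dfs(Map<Integer, List<Integer>> heap, int ref, Set<Integer> seen) {
        if (!seen.add(ref)) {
            return;
        }
        for (int child : heap.get(ref)) {
            dfs(heap, child, seen);
        }
    }

    // Builds one random heap, collects it, and compares survivors with the oracle.
    static boolean runOnce(long seed, int objectCount, int rootCount) {
        Random rnd = new Random(seed);
        Map<Integer, List<Integer>> heap = new HashMap<>();
        for (int i = 0; i < objectCount; i++) {
            heap.put(i, new ArrayList<>());
        }
        for (int i = 0; i < objectCount; i++) {
            int edges = rnd.nextInt(3); // 0 to 2 outgoing references
            for (int e = 0; e < edges; e++) {
                heap.get(i).add(rnd.nextInt(objectCount));
            }
        }
        Set<Integer> roots = new HashSet<>();
        for (int i = 0; i < rootCount; i++) {
            roots.add(rnd.nextInt(objectCount));
        }

        Set<Integer> expected = new HashSet<>();
        for (int r : roots) {
            dfs(heap, r, expected); // computed before the collector mutates the heap
        }
        collect(heap, roots);
        return heap.keySet().equals(expected);
    }
}
```

Because each run is driven by a seed, a failing case is reproducible: the same seed can be replayed under the debugger.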
If you spot any issues in Lab3.2, feel free to reach out.
This article is licensed under the CC BY-NC-SA 4.0 license.
Author: lyc8503, Article link: https://blog.lyc8503.net/en/post/njuse-jvm-lab3.2/