Who says .NET doesn’t have GC tuning? Changing one line of code cut memory consumption
It’s common to see .NET developers tease: “Why are Java developers always learning about JVM tuning? Because Java sucks! We don’t need that in .NET!” But is that really true? Today I will analyze a real case.
Yesterday, a student asked me a question. He had created a default ASP.NET Core Web API project (the default WeatherForecast project template) and changed the default code, which generates 5 records, to generate 150,000 records. The code is as follows:
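public IEnumerable<WeatherForecast> Get()
{
    // Summaries is the static string array declared in the template's WeatherForecastController.
    // ToArray() creates all 150,000 objects up front and keeps them in one array.
    return Enumerable.Range(1, 150000).Select(index => new WeatherForecast
    {
        Date = DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
        TemperatureC = Random.Shared.Next(-20, 55),
        Summary = Summaries[Random.Shared.Next(Summaries.Length)]
    })
    .ToArray();
}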
He then used a stress-testing tool to hit the Web API with 1,000 concurrent requests and found that memory soared to 7GB and did not drop back after the stress test. When he ran the same stress test, with the same number of requests, against an equivalent Web API written in Python, memory also soared, but it quickly fell back to normal once the test was over.
He wondered, “Does such a simple program have a memory leak? Is .NET performance that bad?”
I “solved” his problem in four ways, and I will analyze the method and the principle behind each one in turn. Before I do that, let me briefly explain the basics of garbage collection (GC):
When an object is created, it occupies memory. Once the object is no longer needed, that memory must be released; otherwise the program’s memory footprint keeps growing. In C, the programmer allocates memory with malloc() and releases it with free(). In modern programming languages such as C#, Java and Python, however, programmers rarely need to manage memory themselves. They simply create new objects as needed, and the garbage collector (GC) releases the objects that are no longer needed.
GC also involves concepts such as generations (“generation 0”, “generation 1”, and so on). You can check the official .NET documentation for more information:
https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/?WT.mc_id=DT-MVP-5004444
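To make the idea of generations a little more concrete, here is a minimal console sketch (not part of the original case) using the real GC.GetGeneration() and GC.Collect() APIs. It assumes .NET 6 or later with top-level statements. A freshly allocated object starts in generation 0; after surviving a forced collection it is usually promoted to an older generation, though exactly where it ends up is the runtime’s decision.

```csharp
using System;

// A freshly allocated object starts in generation 0.
var obj = new object();
int genBefore = GC.GetGeneration(obj);

// Force a full, blocking collection. The object is still referenced,
// so it survives and is typically promoted to an older generation.
GC.Collect();
int genAfter = GC.GetGeneration(obj);

Console.WriteLine($"Max generation: {GC.MaxGeneration}");
Console.WriteLine($"Generation before GC: {genBefore}, after GC: {genAfter}");

// Keep the object reachable until after the second measurement.
GC.KeepAlive(obj);
```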
Let’s start with these “solutions.”
Solution 1: Remove ToArray()
How: The Get() method returns IEnumerable<WeatherForecast>, and Select() already returns that same type, so there is no need to convert the result to an array with ToArray(). We therefore drop the ToArray() call. The code is as follows:
public IEnumerable<WeatherForecast> Get()
{
    // No ToArray(): the JSON serializer pulls the items lazily, one MoveNext() at a time.
    return Enumerable.Range(1, 150000).Select(index => new WeatherForecast
    {
        Date = DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
        TemperatureC = Random.Shared.Next(-20, 55),
        Summary = Summaries[Random.Shared.Next(Summaries.Length)]
    });
}
Run the same stress test again, and something amazing happens: the peak memory usage is less than 100MB.
Why:
IEnumerable and LINQ work in a lazy “pipeline” fashion by default. In other words, each time the consumer of the IEnumerable (in this case, the JSON serializer) calls MoveNext() to fetch one item, Select() runs its projection once and creates one new WeatherForecast object. With ToArray(), in contrast, all 150,000 WeatherForecast objects are created at once and stored in an array before that large array is returned.
Without ToArray(), objects are created and consumed one at a time. Each concurrent request therefore produces objects in a “slow trickle” instead of accumulating 150,000 of them the way ToArray() does, so overall memory consumption under concurrency is much lower. Moreover, because WeatherForecast objects are produced and consumed in a pipeline, each object becomes eligible for garbage collection as soon as it has been consumed. With ToArray(), the array holds references to all 150,000 WeatherForecast objects, so none of them can be collected until the array itself becomes unreachable. As a result, the point at which the WeatherForecast objects can be reclaimed is greatly delayed.
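The difference between the pipeline and ToArray() can be seen in a small console sketch. It assumes .NET 6 or later with top-level statements and uses an anonymous type as a stand-in for WeatherForecast; the counter simply records how many objects the Select() projection has created at each point.

```csharp
using System;
using System.Linq;

const int Count = 150_000;
int created = 0;

// Build a lazy pipeline. Select() only records the projection; nothing runs yet.
var pipeline = Enumerable.Range(1, Count).Select(index =>
{
    created++;                     // count every object the projection creates
    return new { Index = index };  // stand-in for new WeatherForecast { ... }
});
int createdBeforeEnumeration = created;

// A consumer (such as a JSON serializer) pulls items one at a time via MoveNext().
using var enumerator = pipeline.GetEnumerator();
enumerator.MoveNext();
int createdAfterFirstMoveNext = created;

// ToArray() runs the whole pipeline and keeps every object reachable through one array.
created = 0;
var array = pipeline.ToArray();
int createdByToArray = created;

Console.WriteLine($"Before enumeration:   {createdBeforeEnumeration}");
Console.WriteLine($"After one MoveNext(): {createdAfterFirstMoveNext}");
Console.WriteLine($"After ToArray():      {createdByToArray} (array length {array.Length})");
```

Before any enumeration the counter stays at 0 and a single MoveNext() creates exactly one object, whereas ToArray() creates all 150,000 and holds on to them.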
I don’t know why Microsoft put an unnecessary ToArray() in the WeatherForecast Web API template code. I’m going to send Microsoft feedback, and no one can stop me!
In conclusion: to keep LINQ “pipelined”, use IEnumerable instead of an array or List, and think twice every time you call ToArray() or ToList() on an IEnumerable.
This solution is the best one; the following solutions are only meant to help you understand GC more deeply.
Solution 2: Change ‘class’ to ‘struct’
How: Keep the original ToArray(), but change WeatherForecast from a ‘class’ to a ‘struct’ as follows:
public struct WeatherForecast
{
    public DateOnly Date { get; set; }
    public int TemperatureC { get; set; }
    public int TemperatureF => 32 + (int)(TemperatureC / 0.5556);
    public string? Summary { get; set; }
}
When the same stress test was run again, the peak memory footprint with the struct was only about half that with the class. Again, the memory footprint did not drop after the stress test.
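Why: Every class instance carries extra overhead (an object header and a method-table pointer), and an array of class objects stores a reference to each separate object, whereas an array of structs stores the fields inline. Structs therefore have a more compact memory layout, and a struct with the same members takes up less memory than a class object. That is why the peak memory footprint drops after changing the class to a struct.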
You may ask: “Aren’t structs allocated on the stack and released as soon as they go out of use? Why didn’t the memory footprint drop after the stress test? Isn’t a struct’s memory released automatically?” Note that a struct is reclaimed without the GC only when it lives on the stack, for example as an ordinary local variable. When a struct is stored inside a reference-type object, such as an element of an array, it lives inside that object on the managed heap and is reclaimed only when the GC collects the containing object. Because of the ToArray() call in our code, the 150,000 structs are stored in an array, so they must be reclaimed by the GC.
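A rough way to see both points is to compare allocated bytes. This sketch assumes .NET 6 or later (for DateOnly) and, to stay self-contained as top-level statements, uses System.Tuple (a class) and a value tuple (a struct) as stand-ins for the class and struct versions of WeatherForecast. GC.GetAllocatedBytesForCurrentThread() counts managed-heap allocations on the current thread, so the struct array still shows up: it is a heap object, just a more compact one.

```csharp
using System;

const int Count = 150_000;

// Class version: the array holds 150,000 references, and each element is a
// separate heap object with its own header. System.Tuple is a class.
long start = GC.GetAllocatedBytesForCurrentThread();
var classItems = new Tuple<DateOnly, int, string>[Count];
for (int i = 0; i < Count; i++)
{
    classItems[i] = new Tuple<DateOnly, int, string>(DateOnly.MinValue, i, "Mild");
}
long classBytes = GC.GetAllocatedBytesForCurrentThread() - start;

// Struct version: a value tuple is a struct, so its fields are stored inline
// in the array. The array itself is still one object on the managed heap.
start = GC.GetAllocatedBytesForCurrentThread();
var structItems = new (DateOnly Date, int TemperatureC, string Summary)[Count];
for (int i = 0; i < Count; i++)
{
    structItems[i] = (DateOnly.MinValue, i, "Mild");
}
long structBytes = GC.GetAllocatedBytesForCurrentThread() - start;

Console.WriteLine($"Class array + objects: {classBytes:N0} bytes allocated");
Console.WriteLine($"Struct array:          {structBytes:N0} bytes allocated");

// Keep both arrays reachable until after the measurements.
GC.KeepAlive(classItems);
GC.KeepAlive(structItems);
```

The exact numbers depend on the platform, but the struct array should come out noticeably smaller, while still being a heap allocation that only the GC can reclaim.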
Solution 3: Invoke GC manually
How: The memory stays high after the stress test because the GC has not run yet, so we can force a garbage collection manually once the test is over.
Let’s create a new controller and call GC.Collect() from one of its actions to force a GC. The code is as follows:
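using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("[controller]")]
public class ValuesController : ControllerBase
{
    // Forces a full, blocking garbage collection; for this experiment only.
    [HttpGet(Name = "RunGC")]
    public string RunGC()
    {
        GC.Collect();
        return "ok";
    }
}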
We then ran the stress test again. After it completed, the memory footprint clearly had not dropped. We then requested RunGC() a few times, and the memory footprint fell back to about 100 MB.
Why: GC.Collect() forces a garbage collection, so the WeatherForecast objects are released. Why did GC.Collect() have to be called several times before memory usage returned to its original level? Garbage collection is CPU-intensive, so to limit its impact on performance, the runtime does not necessarily reclaim all unused memory and return it to the operating system in a single pass.
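The effect of a forced collection can be sketched in a console app. This is an illustration under assumptions, not the author’s test: byte arrays stand in for WeatherForecast objects, and AllocateBatch is a hypothetical helper. Note that GC.GetTotalMemory() reports the size of the managed heap, not the process memory that a task manager or stress-testing tool shows, so it can drop right after GC.Collect() even when the process’s overall footprint lags behind.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper: allocates the objects in a separate, non-inlined method
// so that no strong reference to them survives once it returns.
[MethodImpl(MethodImplOptions.NoInlining)]
static WeakReference AllocateBatch(int count)
{
    var items = new object[count];
    for (int i = 0; i < count; i++)
    {
        items[i] = new byte[64];  // stand-in for a WeatherForecast object
    }
    return new WeakReference(items);
}

WeakReference batch = AllocateBatch(150_000);
long heapBeforeCollect = GC.GetTotalMemory(forceFullCollection: false);

// Force a full, blocking collection, as the RunGC() action does.
GC.Collect();
long heapAfterCollect = GC.GetTotalMemory(forceFullCollection: false);

Console.WriteLine($"Array still alive: {batch.IsAlive}");
Console.WriteLine($"Heap before GC.Collect(): {heapBeforeCollect:N0} bytes");
Console.WriteLine($"Heap after GC.Collect():  {heapAfterCollect:N0} bytes");
```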
Note that calling GC.Collect() manually is generally a bad idea: the GC chooses appropriate times to collect on its own, and forcing collections can cause performance problems. If you need to call GC.Collect() manually to bring your program’s memory footprint down to what you expect, either your program needs optimizing or your expectations of its memory footprint are wrong. What do I mean by “the expectations of a program’s memory footprint are wrong”? Please see the next solution.
Solution 4: Change the GC type
How: Add the following configuration to the ASP.NET Core project file (the *.csproj file):
<PropertyGroup>
    <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>
The same stress test was run again, and the memory footprint quickly fell back to the original 100MB+.
Why: As we know, the programs we develop often fall into two categories: desktop applications (e.g., WinForms, WPF) and server-side applications (e.g., ASP.NET Core).
Desktop programs generally don’t hog the memory and CPU resources of the entire operating system because there are many other programs running on the operating system, so desktop programs are conservative in their memory and CPU usage. For a desktop program, if it takes up too much memory, we think it’s bad.
In contrast, a server-side program usually has the entire server’s memory and CPU to itself (a typical deployment puts the database server, web server, and Redis server on different machines), so making full use of memory and CPU can improve a web application’s performance. That is why the Oracle database, by default, tries to take up most of the server’s memory to improve performance. A web application that underuses memory may not achieve its best performance.
Correspondingly, .NET GC has two modes: Workstation and Server. Workstation mode is designed for desktop applications and keeps the memory footprint conservative, while Server mode is designed for server-side applications and uses memory more aggressively. Garbage collection is resource-intensive, and frequent GCs can degrade the performance of server-side applications, so in Server mode .NET tries to minimize the frequency and scope of collections as long as enough memory is available. Desktop programs, on the other hand, tolerate the performance cost of GC well but tolerate an excessive memory footprint poorly. Therefore GC runs more frequently in Workstation mode, keeping the program’s memory footprint low. In Server mode, if enough memory is available, GC runs as rarely as possible, so a large number of unused objects may go uncollected for a long time. Of course, there are many other differences between the two modes, as detailed in Microsoft’s documentation.
ASP.NET Core apps use Server GC by default, which is why memory did not fall back after the stress test. After Server mode is disabled with <ServerGarbageCollection>false</ServerGarbageCollection>, the GC switches to Workstation mode and the program reclaims memory more aggressively. Of course, switching a server-side program to Workstation mode hurts its performance, so it is not recommended without a good reason; after all, idle memory on a server is wasted memory.
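To check which mode a process actually runs in, and to get a rough feel for the difference, a sketch like the following can be run once per mode, for example with the DOTNET_gcServer environment variable set to 1 and then to 0. It uses the real GCSettings.IsServerGC and GC.CollectionCount() APIs; the allocation loop and its sizes are arbitrary assumptions, and the counts will vary by machine and runtime version.

```csharp
using System;
using System.Runtime;

// Report which GC flavor this process is running with.
string mode = GCSettings.IsServerGC ? "Server" : "Workstation";
Console.WriteLine($"GC mode: {mode}, latency mode: {GCSettings.LatencyMode}");

// Churn through about 1 GB of short-lived garbage and count the gen 0 collections.
var sink = new object[1];
int gen0Before = GC.CollectionCount(0);
for (int i = 0; i < 1_000_000; i++)
{
    sink[0] = new byte[1024];  // keep each array reachable from the heap so the allocation is not elided
}
int gen0Collections = GC.CollectionCount(0) - gen0Before;
Console.WriteLine($"Gen 0 collections during the loop: {gen0Collections}");
```

Under Server GC you would typically expect fewer gen 0 collections for the same amount of garbage, because each collection lets more memory pile up first.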
In addition to the GC type, .NET, just like Java’s JVM, offers a variety of GC tuning parameters, such as the heap hard limit and the heap hard-limit percentage. Please read the Microsoft documentation for details.
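As a starting point for tuning, the GC reports the limits it is working with through GC.GetGCMemoryInfo(). The sketch below assumes .NET 5 or later. Settings such as System.GC.HeapHardLimit (in runtimeconfig.json) or the DOTNET_GCHeapHardLimit environment variable are documented GC knobs; that TotalAvailableMemoryBytes reflects a configured hard limit is our understanding of the runtime’s behavior and worth confirming in the docs.

```csharp
using System;

// Make sure at least one collection has happened so the info below is populated.
GC.Collect();
GCMemoryInfo info = GC.GetGCMemoryInfo();

// Assumption: with a heap hard limit configured, TotalAvailableMemoryBytes
// reports that limit instead of the machine's (or container's) memory.
Console.WriteLine($"Total available to the GC:  {info.TotalAvailableMemoryBytes:N0} bytes");
Console.WriteLine($"Heap size after last GC:    {info.HeapSizeBytes:N0} bytes");
Console.WriteLine($"High memory load threshold: {info.HighMemoryLoadThresholdBytes:N0} bytes");
```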
Summary: Prefer LINQ’s “pipelined” operations and avoid ToArray() or ToList() on data sources with large amounts of data; avoid triggering GC manually; set the right expectations for a program’s memory footprint, because for server-side programs a smaller footprint is not always better; choose the GC mode that balances performance and memory usage for each kind of program; and tailor the program’s behavior further through GC tuning.