Two steps to build a .NET trimmer application

Zack Yang
5 min read · Mar 21, 2022


Ten days ago, I published Zack.DotNetTrimmer, an open-source program for trimming .NET Core applications. Compared with the .NET Core built-in trimmer, Zack.DotNetTrimmer not only shrinks applications further, but also supports WPF and WinForms. The repository is https://github.com/yangzhongke/Zack.DotNetTrimmer

Many developers are interested in the principles behind this open-source project, so I will explain how it works here.

Trick 1. How to detect loaded assemblies and classes?

Microsoft provides the Diagnostics libraries, which can be used to analyze the runtime behavior of .NET Core applications. They expose rich runtime information such as instance creation, assembly loading, class loading, method invocations, GC, file read and write operations, network connections, and more. Visual Studio uses Diagnostics to measure the invocation time of each method.

To use Diagnostics, install Microsoft.Diagnostics.NETCore.Client and Microsoft.Diagnostics.Tracing.TraceEvent through NuGet. The DiagnosticsClient class is then used to connect to the process of the .NET Core application to be analyzed. The code is as follows:

using Microsoft.Diagnostics.NETCore.Client;
using Microsoft.Diagnostics.Tracing;
using Microsoft.Diagnostics.Tracing.Parsers;
using Microsoft.Diagnostics.Tracing.Parsers.Clr;
using System.Diagnostics;
using System.Diagnostics.Tracing;

string filepath = @"E:\temp\test6\ConsoleApp1.exe"; // Path of the .NET Core application to be analyzed
ProcessStartInfo psInfo = new ProcessStartInfo(filepath);
psInfo.UseShellExecute = true;
using Process? p = Process.Start(psInfo); // Start the application
var providers = new List<EventPipeProvider>() // Events to listen for
{
    new EventPipeProvider("Microsoft-Windows-DotNETRuntime",
        EventLevel.Informational, (long)ClrTraceEventParser.Keywords.All)
};
var client = new DiagnosticsClient(p.Id); // The process of the .NET Core application to be analyzed
using EventPipeSession session = client.StartEventPipeSession(providers, false);
var source = new EventPipeEventSource(session.EventStream);
source.Clr.All += (TraceEvent obj) =>
{
    if (obj is ModuleLoadUnloadTraceData moduleData) // Assembly-loading event
    {
        string path = moduleData.ModuleILPath; // Path of the loaded assembly
        Console.WriteLine($"Assembly Loaded:{path}");
    }
    else if (obj is TypeLoadStopTraceData typeData) // Type-loading event
    {
        string typeName = typeData.TypeName;
        Console.WriteLine($"Type Loaded:{typeName}");
    }
};
source.Process();

Each kind of event has its own type, and all of them inherit from TraceEvent. As shown above, ModuleLoadUnloadTraceData carries information about assembly loading, and TypeLoadStopTraceData carries information about class loading.
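Printing the events is enough for a demo, but a trimmer has to remember what it saw so the second step can use it. Here is a minimal sketch of that bookkeeping. It is not taken from Zack.DotNetTrimmer: the names RecordAssembly, RecordType, loadedAssemblies and loadedTypes are hypothetical, and the sample paths are only illustrations. In the event handler above you would call RecordAssembly with ModuleILPath and RecordType with TypeName instead of writing to the console.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bookkeeping for the trace run (assumed names, not from the project).
// Windows file paths are case-insensitive, so assembly paths are compared that way.
var loadedAssemblies = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
var loadedTypes = new HashSet<string>(StringComparer.Ordinal);

// Remember the path of an assembly reported by an assembly-loading event.
void RecordAssembly(string path)
{
    if (!string.IsNullOrEmpty(path))
    {
        loadedAssemblies.Add(path);
    }
}

// Remember the name of a type reported by a type-loading event.
void RecordType(string typeName)
{
    if (!string.IsNullOrEmpty(typeName))
    {
        loadedTypes.Add(typeName);
    }
}

// Simulated events; the same assembly and type are reported more than once.
RecordAssembly(@"E:\temp\test6\ConsoleApp1.dll");
RecordAssembly(@"e:\TEMP\test6\consoleapp1.dll");
RecordType("ConsoleApp1.Program");
RecordType("ConsoleApp1.Program");

Console.WriteLine($"Assemblies: {loadedAssemblies.Count}, types: {loadedTypes.Count}");
```

Sets are used rather than lists because the same assembly or type can show up in many events, and the trimming step only needs to ask whether a name was seen at all.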

Trick 2. Removing unused classes from assembly

Zack.DotNetTrimmer can remove the IL of classes that are not used in an assembly. It uses the Dnlib library to edit assembly files. Dnlib is a library for reading, writing, and editing .NET assemblies.

In Dnlib, we call ModuleDefMD.Load to load an existing assembly; the return value is of type ModuleDefMD. ModuleDefMD represents the assembly's information; for example, its Types property holds the types defined in the assembly. We can modify the ModuleDefMD and the objects it contains, and then call the Write method to save the modified assembly to disk.

For example, the following code changes all non-public types in an assembly to public and removes all custom attributes from their methods:

using dnlib.DotNet;

string filename = @"E:\temp\net6.0\AppToBeTested1.dll";
ModuleDefMD module = ModuleDefMD.Load(filename);
foreach (var typeDef in module.Types)
{
    if (typeDef.IsPublic == false)
    {
        typeDef.Attributes |= TypeAttributes.Public; // Change the visibility of the type to public
    }
    foreach (var methodDef in typeDef.Methods)
    {
        methodDef.CustomAttributes.Clear(); // Remove all custom attributes from the method
    }
}
module.Write(@"E:\temp\net6.0\1.dll"); // Save the changes

Here is the source code for the assembly to be tested:

internal class Class1
{
    [DisplayName("AAA")]
    public void AA()
    {
        Console.WriteLine("hello");
    }
}

Here is the decompiled code of the modified assembly:

public class Class1
{
    public void AA()
    {
        Console.WriteLine("hello");
    }
}

You can see that the changes to the assembly took effect.

Now that we know how to modify assemblies with Dnlib, we could remove unused types from an assembly by simply removing them from the Types property of ModuleDefMD. In practice, however, this can be problematic, because the class to be removed may still be referenced by other code, even if that code is never actually called. The Write method of ModuleDefMD validates the modified assembly, so when such a dangling reference exists it may throw a ModuleWriterException, such as:

ModuleWriterException: ‘A method was removed that is still referenced by this module.’

Therefore, we would need to write code that checks the assembly and makes sure every reference to the class to be deleted is removed as well. However, a class definition itself takes up very little space in the file; most of the space is taken by method bodies. So I found an alternative workaround: do not delete the class, but empty the method bodies of the class.

In Dnlib, a method is represented by MethodDef, and its Body property, of type CilBody, represents the body of the method. If the method has a body (that is, it is not an abstract method, etc.), then CilBody.Instructions is the collection of IL instructions in that body. So I used the following code to clear the body of a method:

methodDef.Body.Instructions.Clear();

However, when the ModuleDefMD is saved after running the above code, the resulting assembly may be invalid. For example, if a method declares a return value and we simply empty its body, nothing is returned. So I changed the approach and replaced every method body with ‘throw null’, since any method body, whatever its return type, can be replaced by one that throws an exception and still be valid. So I wrote the following code to clean up the method body:

method.Body.ExceptionHandlers.Clear();
method.Body.Instructions.Clear();
method.Body.Variables.Clear();
method.Body.Instructions.Add(new Instruction(OpCodes.Nop) { Offset = 0 });
method.Body.Instructions.Add(new Instruction(OpCodes.Ldnull) { Offset = 1 });
method.Body.Instructions.Add(new Instruction(OpCodes.Throw) { Offset = 2 });

The last three lines add the IL instructions that correspond to the C# code ‘throw null’.

Check out the project’s GitHub repository for all the source code: https://github.com/yangzhongke/Zack.DotNetTrimmer
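Putting the two tricks together, the trimming pass can be sketched roughly as follows. This is a minimal sketch and not the code from Zack.DotNetTrimmer. The helper StubUnusedTypes and the usedTypeNames set are hypothetical names, and the sketch assumes that the type names collected during tracing are in the same format as Dnlib's FullName, which may need extra normalization in practice (for nested or generic types, for example). Instead of clearing the existing body, it assigns a fresh CilBody containing only ldnull and throw. The demo builds a tiny module in memory so that no file on disk is needed.

```csharp
using System;
using System.Collections.Generic;
using dnlib.DotNet;
using dnlib.DotNet.Emit;

// Hypothetical helper: replaces the body of every method in every unused type
// with "throw null" and returns the number of bodies that were replaced.
int StubUnusedTypes(ModuleDef module, ISet<string> usedTypeNames)
{
    int stubbed = 0;
    foreach (TypeDef type in module.GetTypes()) // GetTypes() also walks nested types
    {
        // Keep the special <Module> type and every type seen during tracing.
        if (type.IsGlobalModuleType || usedTypeNames.Contains(type.FullName))
        {
            continue;
        }
        foreach (MethodDef method in type.Methods)
        {
            if (!method.HasBody)
            {
                continue; // abstract, extern or P/Invoke methods have no IL body
            }
            var body = new CilBody(); // fresh body: no locals, no exception handlers
            body.Instructions.Add(Instruction.Create(OpCodes.Ldnull));
            body.Instructions.Add(Instruction.Create(OpCodes.Throw));
            method.Body = body;
            stubbed++;
        }
    }
    return stubbed;
}

// Demo: a module with one type "Demo.Unused" and one method "static int Get() => 1".
var demoModule = new ModuleDefUser("Demo.dll");
var unusedType = new TypeDefUser("Demo", "Unused", demoModule.CorLibTypes.Object.TypeDefOrRef);
demoModule.Types.Add(unusedType);
var getMethod = new MethodDefUser("Get",
    MethodSig.CreateStatic(demoModule.CorLibTypes.Int32),
    MethodImplAttributes.IL | MethodImplAttributes.Managed,
    MethodAttributes.Public | MethodAttributes.Static);
getMethod.Body = new CilBody();
getMethod.Body.Instructions.Add(Instruction.Create(OpCodes.Ldc_I4_1));
getMethod.Body.Instructions.Add(Instruction.Create(OpCodes.Ret));
unusedType.Methods.Add(getMethod);

// An empty set means "no type was observed during tracing".
int stubbedCount = StubUnusedTypes(demoModule, new HashSet<string>());
Console.WriteLine($"Stubbed {stubbedCount} method(s)");
```

Because the types and method signatures stay in place, references from other code remain valid and Write should not complain about removed members; only the IL inside the bodies goes away.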

More tips with Dnlib

There are a few other things I learned from using Dnlib that I would like to share with you.

Tip 1. Dnlib has problems saving assemblies containing native code

When cleaning up assemblies with the methods I mentioned above, most of our own assemblies and assemblies from third-party NuGet packages are fine. But when the same method is used to process PresentationCore.dll, System.Private.CoreLib.dll, and other foundation assemblies of .NET, problems occur: even when we Load the assembly and then Write it without making any changes, the assembly becomes significantly smaller. For example, I used the following code to process PresentationFramework.dll of .NET Core:

using (var mod = ModuleDefMD.Load(@"E:\temp\PresentationFramework.dll"))
{
    mod.Write(@"E:\temp\PresentationFramework.New.dll");
}

The original size of PresentationFramework.dll is 15.9 MB, but the saved version is only 5.7 MB. After asking the Dnlib author, I learned that these assemblies contain native code (for example, code written in C++/CLI, or precompiled code from ReadyToRun/NGEN/CrossGen), which is ignored when the assembly is saved with Write. This is why the saved assembly is significantly smaller. We can use the NativeWrite method instead of Write, because it preserves native code.

However, according to Washi1337, author of AsmResolver (an open-source project similar to Dnlib), the NativeWrite method tries to preserve the structure of the native code, so the assembly size cannot be reduced; it may even increase (see https://github.com/Washi1337/AsmResolver/issues/267). In actual use, I also found that after these assemblies were modified, the program failed to start. Checking the Windows event log, I found that this was caused by a CLR startup failure. According to Washi1337, if the only native code in the assembly is ReadyToRun code, we can simply remove the ILLibrary flag from the assembly, since the optimized assembly still contains the original IL code. However, after I did what Washi1337 suggested, the program still failed to start. It is not clear why. Because assemblies containing native code cannot be trimmed well anyway, I did not investigate further. Friends who are proficient in the CLR are welcome to share their experience.
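To see the difference between the two save methods on your own machine, a small comparison like the following may help. It is only a sketch: the helper SaveBothWays is a hypothetical name, the three paths are supplied on the command line, and the sizes you get depend entirely on the assembly you feed it.

```csharp
using System;
using System.IO;
using dnlib.DotNet;

// Hypothetical helper: saves the same assembly once with Write and once with
// NativeWrite, then returns the three file sizes in bytes.
(long Original, long Managed, long Native) SaveBothWays(
    string inputPath, string managedPath, string nativePath)
{
    using (var module = ModuleDefMD.Load(inputPath))
    {
        module.Write(managedPath); // rebuilds the file from metadata and IL
    }
    using (var module = ModuleDefMD.Load(inputPath))
    {
        module.NativeWrite(nativePath); // keeps the original file structure, including native code
    }
    return (new FileInfo(inputPath).Length,
            new FileInfo(managedPath).Length,
            new FileInfo(nativePath).Length);
}

// Usage: pass the input assembly and two output paths on the command line.
if (args.Length == 3)
{
    var sizes = SaveBothWays(args[0], args[1], args[2]);
    Console.WriteLine($"Original: {sizes.Original} bytes");
    Console.WriteLine($"Write: {sizes.Managed} bytes");
    Console.WriteLine($"NativeWrite: {sizes.Native} bytes");
}
```

The module is loaded twice so that each save starts from an untouched copy of the input.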

Tip 2. Other applications of Dnlib

Since Dnlib can modify assemblies, we can use it to do a lot of things, such as changing the default behavior of a program (you know what I mean). We can also use Dnlib to write our own code obfuscator or to implement static weaving for aspect-oriented programming (AOP).

What other scenarios do you have in mind for Dnlib? Please let me know.
