Monday, June 29, 2009

CLR Optimizations In .NET Framework 3.5 SP1

A surprising amount of work went into the core of the Microsoft .NET Framework for .NET Framework 3.5 SP1, released in August 2008. Here, I'll provide in-depth information about the changes that we on the CLR team made to the common language runtime (CLR) and the improvements you can expect by simply running your existing CLR 2.0-based applications against this latest service pack. Most of our effort centered on improving the performance, security, and deployment of applications targeting the .NET platform.
Before delving into the details, note that you can download the .NET Framework 3.5 SP1 from the Microsoft Download Center. It is also available through Windows Update. Also, take a look at Figure 1. It answers the perennial question: Which CLR version is inside which version of the .NET Framework?

Figure 1 .NET Versions and CLR Versions

.NET Framework Version    Contains CLR Version
2.0                       2.0
3.0                       2.0, 3.0
3.5                       2.0 SP1, 3.0 SP1, 3.5
3.5 SP1                   2.0 SP2, 3.0 SP2, 3.5 SP1

Startup Performance Improvements

Improving the startup performance—in particular, the cold startup time of managed applications—was a primary focus of .NET Framework 3.5 SP1.
Managed assemblies in the .NET Framework are largely precompiled via NGen ("The Performance Benefits of NGen"), and the layout of the code and data in the NGen images has a strong impact on the startup performance of applications that use the framework. In particular, since cold startup time is typically bound by the number of pages of the image that need to be read from disk, any effort to reduce it translates into an effort to better pack the image so that only a small subset of its pages are read during startup.

To that effect, the CLR uses a profile-driven engine to optimize the layout of NGen images for assemblies in the .NET Framework. The profile data is collected by running scenarios that we believe are representative of our framework's usage, and then included as a resource in the corresponding assembly when it is being built. During NGen, if such a resource is found in an assembly, it is used to pack the "hot" code and data (those that were accessed when the training scenarios were run) together, thereby moving the "cold" code and data into their own sections in the image.
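To make the page-count argument concrete, here is a minimal sketch of why grouping hot code matters. It is a toy model, not the CLR's layout engine: the 4 KB page size, the method names, the method sizes, and the helper functions pages_touched and pack_hot_first are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed page size, used only for illustration


def pages_touched(layout, hot):
    """Count the distinct pages read when every method in `hot` is accessed.

    `layout` is an ordered list of (name, size_in_bytes) pairs placed
    back to back starting at offset 0.
    """
    touched = set()
    offset = 0
    for name, size in layout:
        if name in hot:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            touched.update(range(first, last + 1))
        offset += size
    return len(touched)


def pack_hot_first(layout, hot):
    """Return a new layout with hot methods grouped ahead of cold ones."""
    hot_part = [m for m in layout if m[0] in hot]
    cold_part = [m for m in layout if m[0] not in hot]
    return hot_part + cold_part


# Hypothetical image: 16 methods of 1 KB each; every fourth one is hot.
layout = [(f"M{i}", 1024) for i in range(16)]
hot = {"M0", "M4", "M8", "M12"}

before = pages_touched(layout, hot)
after = pages_touched(pack_hot_first(layout, hot), hot)
print(before, after)  # prints "4 1"
```

In this model the four hot methods are spread over four pages in the original order, and fit in a single page once they are packed together, which is the effect the profile-driven layout aims for.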
Furthermore, .NET Framework 3.5 SP1 contains several improvements to our hot/cold splitting infrastructure, which in turn improve the locality of NGen images. In particular, basic blocks in methods that contain various kinds of control flow (switch statements or unconditional branches, for example) are now rearranged so that more cold blocks can be split off and moved to the cold part of the NGen images. We also implemented a better algorithm for merging profile data from multiple training scenarios: we now prioritize packing together the data and code that are accessed by the greatest number of scenarios in a given training set. Finally, to further improve spatial locality for code that is executed together temporally, the executable code in the NGen image is partitioned into RunOnce, RunMany, and RunNever sections. These improvements to our profile data collection and usage were made for both 32-bit and 64-bit platforms. As a result of these changes, the number of cold pages touched fell by more than 50% (sometimes significantly more) when a training scenario was re-run on top of a trained .NET Framework.
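The merging rule described above (favor items touched by the most scenarios) can be sketched as follows. This is only an illustration of the stated priority, not the algorithm the CLR team shipped: the scenario names, the method names, the merge_profiles function, and the alphabetical tie-break are all hypothetical.

```python
from collections import Counter


def merge_profiles(profiles):
    """Order items by how many scenarios touched them, most shared first.

    `profiles` maps a scenario name to the set of items (methods or data)
    that the scenario accessed. Ties are broken by name so the result is
    deterministic; the real tie-breaking rule is an assumption here.
    """
    counts = Counter()
    for touched in profiles.values():
        counts.update(touched)
    return sorted(counts, key=lambda item: (-counts[item], item))


# Hypothetical training set with three scenarios.
profiles = {
    "scenario_a": {"Init", "ParseArgs", "LoadConfig"},
    "scenario_b": {"Init", "LoadConfig", "OpenWindow"},
    "scenario_c": {"Init", "RunQuery"},
}

order = merge_profiles(profiles)
print(order)
# ['Init', 'LoadConfig', 'OpenWindow', 'ParseArgs', 'RunQuery']
```

Laying out the image in this order puts the items shared by the most scenarios closest together, so a page read for one scenario is more likely to be useful to the others.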
Unfortunately, our profile data-gathering framework is still internal-only. So while this work benefits all managed applications that use .NET Framework libraries at startup, applications that NGen their own assemblies could improve startup performance further if they could use our training framework. We'd like to make this framework available at some point, so stay tuned!
Note that a somewhat unrelated change ("Strong Name Bypass") addresses another NGen-related problem that I described in my earlier article, "The Performance Benefits of NGen": the time spent verifying strong name signatures.

JIT Optimizations

Switching gears a little, .NET Framework 3.5 SP1 also contains improvements to the quality of the code generated by the 32-bit and 64-bit JIT compilers. In particular, the 32-bit JIT can now inline method calls that involve passing, returning, or operating on structs. Prior to this release, the 32-bit JIT simply gave up trying to inline such functions because structs aren't first-class citizens in the x86 JIT and are lowered into byref pointers very early in the compilation process. Once that happens, the JIT is no longer able to identify the structs and cannot do the sort of normal optimizations, such as copy propagation, that are performed on primitive types.