
Incremental Caching

What would happen if a source generator reprocessed the entire project every time a developer modified a single file? In large projects with hundreds of Repository classes, the IDE would freeze for several seconds on each keystroke. While ForAttributeWithMetadataName from the previous chapter optimizes target discovery, the incremental caching covered in this chapter is the mechanism that reuses already-processed results and skips unnecessary reprocessing entirely. For caching to work correctly, both the data model and the output must be deterministic, and understanding the mistakes that break this determinism is the key focus of this chapter.

  1. Understand the operating principles of incremental builds
    • Three stages: input change detection, intermediate result caching, output caching
  2. Identify the conditions for caching to work
    • Value equality, immutable collections, deterministic output
  3. Identify common mistakes that invalidate the cache
    • Timestamps, order non-determinism, external state dependencies

Incremental Build is a technique that shortens build time by reprocessing only changed parts.

Full Build
=====================
File A modified
|
All files reprocessed: A, B, C, D, E
|
Build time: 10 seconds

Incremental Build
============================
File A modified
|
Cache check:
- File A: Changed -> Reprocess
- File B: No change -> Use cache
- File C: No change -> Use cache
|
Build time: 2 seconds

IIncrementalGenerator automatically supports incremental caching:

Provider Pipeline Caching
=========================
1. Input change detection
- Source file hash comparison
- Re-execute pipeline only for changed files
2. Intermediate result caching
- Cache results of Select, Where, etc.
- Same input -> Return cached result
3. Output caching
- Skip file update if generated code is identical

First Build
===========
UserRepository.cs -> [Process] -> UserRepositoryObservable.g.cs
OrderRepository.cs -> [Process] -> OrderRepositoryObservable.g.cs
Second Build (only UserRepository.cs modified)
==============================================
UserRepository.cs -> [Process] -> UserRepositoryObservable.g.cs (updated)
OrderRepository.cs -> [Cache] -> (processing skipped)
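The second stage (intermediate result caching) can be modeled with a small runnable sketch. This is a toy simulation, not the real Roslyn pipeline: a "Select"-like step returns its cached result whenever the new input compares equal, by value, to the previous input. All type and member names here are illustrative.

```csharp
using System;

// Toy model of intermediate result caching: same input (by value
// equality) -> cached result is returned and the transform is skipped.
readonly record struct ClassInput(string FileName, string ClassName);

class CachedSelectStep
{
    private ClassInput? _lastInput;
    private string? _lastOutput;
    public int Executions { get; private set; }

    public string Run(ClassInput input)
    {
        // Cache hit: input equals the previous input by value
        if (_lastInput is not null && _lastInput.Value.Equals(input))
            return _lastOutput!;

        Executions++; // the "expensive" transform actually ran
        _lastOutput = $"public class {input.ClassName}Observable {{ }}";
        _lastInput = input;
        return _lastOutput;
    }
}

class Program
{
    static void Main()
    {
        var step = new CachedSelectStep();
        step.Run(new ClassInput("OrderRepository.cs", "OrderRepository"));

        // A new but value-equal instance -> cache hit, no reprocessing
        step.Run(new ClassInput("OrderRepository.cs", "OrderRepository"));

        Console.WriteLine(step.Executions); // transform executed only once
    }
}
```

The key point is that the cache key is value equality of the input, which is exactly why the data model's Equals behavior matters so much in the next section.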

For caching to work correctly, value equality is required.

// readonly record struct: value semantics + automatic Equals/GetHashCode
public readonly record struct ObservableClassInfo(
    string Namespace,
    string ClassName,
    List<MethodInfo> Methods,
    List<ParameterInfo> BaseConstructorParameters);
// Same content -> same Equals/GetHashCode -> cache hit
// (true for the string members; collections need care, see below)

// Using ImmutableArray
public readonly record struct ObservableClassInfo(
    string Namespace,
    string ClassName,
    ImmutableArray<MethodInfo> Methods); // Immutable
// Caution: ImmutableArray<T>'s default Equals compares the underlying
// array by reference, so generators commonly wrap it in a structural
// comparer (the EquatableArray<T> pattern) for reliable cache hits.

// List only does reference comparison
public readonly record struct ObservableClassInfo(
    string Namespace,
    string ClassName,
    List<MethodInfo> Methods); // Different instances compare unequal even with same content

// Non-deterministic: different result every time
var code = $"""
    // Generated at {DateTime.Now}
    public class {className}Pipeline {{ }}
    """;

// Deterministic: same input -> same output
var code = $"""
    // <auto-generated/>
    public class {className}Pipeline {{ }}
    """;
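The equality semantics above can be verified with a small program. The type names are illustrative, not the real Functorium models. Note that both List and ImmutableArray members participate in a record's generated Equals by reference, not by content; this is a well-known incremental-generator pitfall.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Illustrative models: one with only scalar members, one with a List
// member, one with an ImmutableArray member.
readonly record struct ScalarInfo(string Namespace, string ClassName);
readonly record struct ListInfo(string ClassName, List<string> Methods);
readonly record struct ArrayInfo(string ClassName, ImmutableArray<string> Methods);

class Program
{
    static void Main()
    {
        // Scalar members: same content -> equal -> cache hit
        Console.WriteLine(new ScalarInfo("App", "User")
                       == new ScalarInfo("App", "User")); // True

        // List member: generated Equals compares the List by reference
        Console.WriteLine(new ListInfo("User", new List<string> { "Get" })
                       == new ListInfo("User", new List<string> { "Get" })); // False

        // ImmutableArray member: default equality also compares the
        // underlying array by reference, not by content
        Console.WriteLine(new ArrayInfo("User", ImmutableArray.Create("Get"))
                       == new ArrayInfo("User", ImmutableArray.Create("Get"))); // False
    }
}
```

This is why structural-equality wrappers around collections are a common pattern in incremental generators: without them, collection-bearing models never compare equal across builds.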

The cache is also invalidated if the same type is rendered differently across runs:

// Non-deterministic: same type may be rendered differently
string typeName = type.Name;              // "User" (short name only; loses namespace)
string typeName = type.ToDisplayString(); // "User" vs "MyApp.User" (varies by context)

// Deterministic: always the same format
string typeName = type.ToDisplayString(
    SymbolDisplayFormat.FullyQualifiedFormat);
// Always "global::MyApp.Models.User"

Deterministic format definition from the Functorium project:

SymbolDisplayFormats.cs

namespace Functorium.SourceGenerators.Generators.ObservablePortGenerator;

public static class SymbolDisplayFormats
{
    /// <summary>
    /// Global qualified format for deterministic code generation
    /// </summary>
    public static readonly SymbolDisplayFormat GlobalQualifiedFormat = new(
        globalNamespaceStyle: SymbolDisplayGlobalNamespaceStyle.Included,
        typeQualificationStyle: SymbolDisplayTypeQualificationStyle.NameAndContainingTypesAndNamespaces,
        genericsOptions: SymbolDisplayGenericsOptions.IncludeTypeParameters,
        miscellaneousOptions:
            SymbolDisplayMiscellaneousOptions.UseSpecialTypes |
            SymbolDisplayMiscellaneousOptions.EscapeKeywordIdentifiers |
            SymbolDisplayMiscellaneousOptions.IncludeNullableReferenceTypeModifier);
}

// Usage example
string typeName = type.ToDisplayString(SymbolDisplayFormats.GlobalQualifiedFormat);
// "global::System.Collections.Generic.List<global::MyApp.Models.User>"

When caching fails to work, the cause usually falls into one of three categories. These are easy traps to fall into, so each pattern is worth remembering.

// Pitfall 1: including a timestamp
return new ClassInfo(
    Name: symbol.Name,
    GeneratedAt: DateTime.Now); // Different every time!

// Pitfall 2: order not guaranteed
var methods = symbol.GetMembers()
    .OfType<IMethodSymbol>()
    .ToList(); // Order may vary

// Fixed: sorted order
var methods = symbol.GetMembers()
    .OfType<IMethodSymbol>()
    .OrderBy(m => m.Name) // Always the same order
    .ToList();

// Pitfall 3: external state dependencies
var debugMode = Environment.GetEnvironmentVariable("DEBUG"); // Environment variable
var config = File.ReadAllText("config.json");                // File system access
// External resource access is restricted in source generators;
// file access is only possible through AdditionalTexts.
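The ordering pitfall can be demonstrated without Roslyn. The sketch below simulates two builds that enumerate the "same" members in different orders; without sorting, the concatenated output differs and the cache comparison fails, while an ordinal sort makes it deterministic. The member names are made up for illustration.

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Two enumerations of the same members in a different source order,
        // standing in for two builds where GetMembers() order varied
        string[] runA = { "Save", "Delete", "Get" };
        string[] runB = { "Get", "Save", "Delete" };

        // Without sorting, the joined output differs between runs
        Console.WriteLine(string.Join(",", runA)
                       == string.Join(",", runB)); // False

        // An ordinal sort yields the same order every time, on every machine
        var sortedA = runA.OrderBy(m => m, StringComparer.Ordinal);
        var sortedB = runB.OrderBy(m => m, StringComparer.Ordinal);
        Console.WriteLine(string.Join(",", sortedA)
                       == string.Join(",", sortedB)); // True
    }
}
```

Using StringComparer.Ordinal (rather than the default culture-sensitive comparison) is a deliberate choice: it keeps the sort order identical across machines with different locale settings, which is itself a determinism requirement.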

How to verify that caching is working:

# Build with detailed logs
dotnet build -v:diag > build.log

# Search for source generator related logs
grep -i "generator" build.log | grep -i "cache"

When cache is working
=====================
UserRepositoryObservable.g.cs   Modified: 10:00:00
(File A modified)
UserRepositoryObservable.g.cs   Modified: 10:00:05 (updated)
OrderRepositoryObservable.g.cs  Modified: 10:00:00 (unchanged)

When cache is not working
=========================
All file modification times changed -> Suspect non-deterministic output

// Debug logging: use only during development
#if DEBUG
    .Select((info, _) =>
    {
        Console.WriteLine($"Processing: {info.ClassName}");
        return info;
    })
#endif

1. Filter as Much as Possible in predicate

// Filtering in transform (slow)
.ForAttributeWithMetadataName(
    "MyAttribute",
    predicate: (_, _) => true, // All nodes pass
    transform: (ctx, _) =>
    {
        if (ctx.TargetNode is not ClassDeclarationSyntax)
            return null; // Filtering here
        ...
    })

// Filtering in predicate (fast)
.ForAttributeWithMetadataName(
    "MyAttribute",
    predicate: (node, _) => node is ClassDeclarationSyntax, // Fast filter
    transform: (ctx, _) => ...)

// Generating code in transform (caching inefficient)
transform: (ctx, _) => GenerateCode(ctx.TargetSymbol) // string

// Extracting only data in transform; generate code in RegisterSourceOutput
transform: (ctx, _) => ExtractInfo(ctx.TargetSymbol) // record

// Unnecessary Collect
var provider = ...; // IncrementalValuesProvider
context.RegisterSourceOutput(provider.Collect(), (ctx, items) =>
{
    foreach (var item in items)
        GenerateForItem(ctx, item);
});

// Individual processing (better for caching)
var provider = ...;
context.RegisterSourceOutput(provider, (ctx, item) =>
{
    GenerateForItem(ctx, item);
});
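The Collect() point can be made concrete with a toy simulation (not the real Roslyn API): with a collected batch, one changed item forces the callback to process every item again, while a per-item cache only reprocesses the changed one. Item names and the cache dictionary are illustrative.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Collect()-style: the callback sees all items at once, so any single
    // change means the whole batch is processed again
    public static int GenerateBatch(IReadOnlyList<string> items)
    {
        return items.Count; // number of items processed this build
    }

    // Per-item style: unchanged items hit the cache and are skipped
    public static int GeneratePerItem(Dictionary<string, string> cache, string item)
    {
        if (cache.ContainsKey(item)) return 0; // cache hit, no work
        cache[item] = $"// generated for {item}";
        return 1; // one item processed
    }

    static void Main()
    {
        var items = new List<string> { "UserRepository", "OrderRepository", "ProductRepository" };

        // Second build after modifying only UserRepository:
        int batchWork = GenerateBatch(items); // all 3 items re-processed

        var cache = new Dictionary<string, string>
        {
            ["OrderRepository"]   = "// cached",
            ["ProductRepository"] = "// cached",
        };
        int perItemWork = 0;
        foreach (var item in items)
            perItemWork += GeneratePerItem(cache, item); // only 1 item processed

        Console.WriteLine($"Collect: {batchWork}, per-item: {perItemWork}");
    }
}
```

Collect() is still the right tool when the output genuinely depends on all items together (for example, generating a single registry file); the guidance here applies to outputs that are independent per item.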

Incremental caching is the key to source generator performance. For caching to work correctly, value equality must be guaranteed in data models, and non-deterministic elements must be removed from output. In our project, ObservableClassInfo is defined as a readonly record struct, and type names are consistently represented with SymbolDisplayFormats.GlobalQualifiedFormat to satisfy these conditions.

Item         Recommendation
===========  ============================================
Data model   Use records
Collections  ImmutableArray or sorted List
Type names   SymbolDisplayFormat.FullyQualifiedFormat
Filtering    As much as possible in predicate
Output       Deterministic (exclude timestamps, etc.)

Q1: What is the most common cause of incremental cache invalidation?


A: Including non-deterministic values like DateTime.Now or Guid.NewGuid() in the data model, or not guaranteeing collection sort order, are the most common causes. The data model’s Equals comparison returns false on every build, causing code regeneration even without actual changes.

Q2: Why should you filter as much as possible in the predicate?


A: The predicate operates at the Syntax level, making it very low-cost. The transform, on the other hand, requires Semantic Model access, making it expensive. Reducing candidates as much as possible in the predicate decreases the number of transform executions, resulting in improved overall pipeline performance.

Q3: Wouldn’t using ImmutableArray<MethodInfo> instead of List<MethodInfo> in ObservableClassInfo be better?


A: Theoretically, ImmutableArray is safer as it guarantees immutability. Note, however, that a List<MethodInfo> member participates in the record's generated Equals by reference, so a content-based cache hit requires either reusing the same instance or adding structural comparison (for example, a custom Equals or an EquatableArray<T>-style wrapper). Functorium accepts List for implementation simplicity.
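For reference, here is a minimal sketch of the EquatableArray<T> pattern that incremental generators commonly use to give a collection member content-based equality. This is an illustration under my own naming, not Functorium code, and it omits edge cases such as default (uninitialized) arrays.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

// Wrapper that compares an ImmutableArray<T> by content so that a record
// member containing it gets value-equality cache hits.
readonly struct EquatableArray<T> : IEquatable<EquatableArray<T>>
    where T : IEquatable<T>
{
    private readonly ImmutableArray<T> _array;
    public EquatableArray(ImmutableArray<T> array) => _array = array;

    public bool Equals(EquatableArray<T> other) =>
        _array.SequenceEqual(other._array); // contents, not references

    public override bool Equals(object? obj) =>
        obj is EquatableArray<T> other && Equals(other);

    public override int GetHashCode()
    {
        // Content-based hash so equal arrays land in the same bucket
        var hash = new HashCode();
        foreach (var item in _array)
            hash.Add(item);
        return hash.ToHashCode();
    }
}

class Program
{
    static void Main()
    {
        var a = new EquatableArray<string>(ImmutableArray.Create("Get", "Save"));
        var b = new EquatableArray<string>(ImmutableArray.Create("Get", "Save"));
        Console.WriteLine(a.Equals(b)); // True: same content -> cache hit
    }
}
```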


Now that we understand the principles of incremental caching, we move on to the symbol API, the core tool for actually extracting data in the pipeline. In the next chapter, we examine how to analyze class and interface information through INamedTypeSymbol.

-> 05. INamedTypeSymbol