
Incremental Caching

What would happen if a source generator reprocessed the entire project every time a developer modified a single file? In large projects with hundreds of Repository classes, the IDE would freeze for several seconds on each keystroke. While ForAttributeWithMetadataName from the previous chapter optimizes target discovery, the incremental caching covered in this chapter is the mechanism that reuses already-processed results and skips unnecessary reprocessing entirely. For caching to work correctly, both the data model and the output must be deterministic, and understanding the mistakes that break this determinism is the key focus of this chapter.

  1. Understand the operating principles of incremental builds
    • Three stages: input change detection, intermediate result caching, output caching
  2. Identify the conditions for caching to work
    • Value equality, immutable collections, deterministic output
  3. Identify common mistakes that invalidate the cache
    • Timestamps, order non-determinism, external state dependencies

Incremental Build is a technique that shortens build time by reprocessing only changed parts.

Full Build
=====================
File A modified
|
All files reprocessed: A, B, C, D, E
|
Build time: 10 seconds

Incremental Build
============================
File A modified
|
Cache check:
- File A: Changed -> Reprocess
- File B: No change -> Use cache
- File C: No change -> Use cache
|
Build time: 2 seconds

IIncrementalGenerator automatically supports incremental caching:

Provider Pipeline Caching
=========================
1. Input change detection
- Source file hash comparison
- Re-execute pipeline only for changed files
2. Intermediate result caching
- Cache results of Select, Where, etc.
- Same input -> Return cached result
3. Output caching
- Skip file update if generated code is identical

First Build
===========
UserRepository.cs -> [Process] -> UserRepositoryObservable.g.cs
OrderRepository.cs -> [Process] -> OrderRepositoryObservable.g.cs
Second Build (only UserRepository.cs modified)
==============================================
UserRepository.cs -> [Process] -> UserRepositoryObservable.g.cs (updated)
OrderRepository.cs -> [Cache] -> (processing skipped)
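The second stage (intermediate result caching) can be modeled with a small runnable sketch. This is a toy simulation, not the real Roslyn pipeline: a "Select"-like step returns its cached result whenever the new input compares equal, by value, to the previous input. All type and member names here are illustrative.

```csharp
using System;

// Toy model of intermediate result caching: same input (by value
// equality) -> cached result is returned and the transform is skipped.
readonly record struct ClassInput(string FileName, string ClassName);

class CachedSelectStep
{
    private ClassInput? _lastInput;
    private string? _lastOutput;
    public int Executions { get; private set; }

    public string Run(ClassInput input)
    {
        // Cache hit: input equals the previous input by value
        if (_lastInput is not null && _lastInput.Value.Equals(input))
            return _lastOutput!;

        Executions++; // the "expensive" transform actually ran
        _lastOutput = $"public class {input.ClassName}Observable {{ }}";
        _lastInput = input;
        return _lastOutput;
    }
}

class Program
{
    static void Main()
    {
        var step = new CachedSelectStep();
        step.Run(new ClassInput("OrderRepository.cs", "OrderRepository"));

        // A new but value-equal instance -> cache hit, no reprocessing
        step.Run(new ClassInput("OrderRepository.cs", "OrderRepository"));

        Console.WriteLine(step.Executions); // transform executed only once
    }
}
```

The key point is that the cache key is value equality of the input, which is exactly why the data model's Equals behavior matters so much in the next section.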

For caching to work correctly, value equality is required.

// readonly record struct: value semantics + automatic Equals/GetHashCode
public readonly record struct ObservableClassInfo(
    string Namespace,
    string ClassName,
    List<MethodInfo> Methods,
    List<ParameterInfo> BaseConstructorParameters);
// Same content -> same Equals/GetHashCode -> cache hit
// (true for the string members; collections need care, see below)

// Using ImmutableArray
public readonly record struct ObservableClassInfo(
    string Namespace,
    string ClassName,
    ImmutableArray<MethodInfo> Methods); // Immutable
// Caution: ImmutableArray<T>'s default Equals compares the underlying
// array by reference, so generators commonly wrap it in a structural
// comparer (the EquatableArray<T> pattern) for reliable cache hits.

// List only does reference comparison
public readonly record struct ObservableClassInfo(
    string Namespace,
    string ClassName,
    List<MethodInfo> Methods); // Different instances compare unequal even with same content

// Non-deterministic: different result every time
var code = $"""
    // Generated at {DateTime.Now}
    public class {className}Pipeline {{ }}
    """;

// Deterministic: same input -> same output
var code = $"""
    // <auto-generated/>
    public class {className}Pipeline {{ }}
    """;
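The equality semantics above can be verified with a small program. The type names are illustrative, not the real Functorium models. Note that both List and ImmutableArray members participate in a record's generated Equals by reference, not by content; this is a well-known incremental-generator pitfall.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Illustrative models: one with only scalar members, one with a List
// member, one with an ImmutableArray member.
readonly record struct ScalarInfo(string Namespace, string ClassName);
readonly record struct ListInfo(string ClassName, List<string> Methods);
readonly record struct ArrayInfo(string ClassName, ImmutableArray<string> Methods);

class Program
{
    static void Main()
    {
        // Scalar members: same content -> equal -> cache hit
        Console.WriteLine(new ScalarInfo("App", "User")
                       == new ScalarInfo("App", "User")); // True

        // List member: generated Equals compares the List by reference
        Console.WriteLine(new ListInfo("User", new List<string> { "Get" })
                       == new ListInfo("User", new List<string> { "Get" })); // False

        // ImmutableArray member: default equality also compares the
        // underlying array by reference, not by content
        Console.WriteLine(new ArrayInfo("User", ImmutableArray.Create("Get"))
                       == new ArrayInfo("User", ImmutableArray.Create("Get"))); // False
    }
}
```

This is why structural-equality wrappers around collections are a common pattern in incremental generators: without them, collection-bearing models never compare equal across builds.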

The cache is also invalidated if the same type is rendered differently across runs:

// Non-deterministic: same type may be rendered differently
string typeName = type.Name;              // "User" (short name only; loses namespace)
string typeName = type.ToDisplayString(); // "User" vs "MyApp.User" (varies by context)

// Deterministic: always the same format
string typeName = type.ToDisplayString(
    SymbolDisplayFormat.FullyQualifiedFormat);
// Always "global::MyApp.Models.User"

Deterministic format definition from the Functorium project:

SymbolDisplayFormats.cs

namespace Functorium.SourceGenerators.Generators.ObservablePortGenerator;

public static class SymbolDisplayFormats
{
    /// <summary>
    /// Global qualified format for deterministic code generation
    /// </summary>
    public static readonly SymbolDisplayFormat GlobalQualifiedFormat = new(
        globalNamespaceStyle: SymbolDisplayGlobalNamespaceStyle.Included,
        typeQualificationStyle: SymbolDisplayTypeQualificationStyle.NameAndContainingTypesAndNamespaces,
        genericsOptions: SymbolDisplayGenericsOptions.IncludeTypeParameters,
        miscellaneousOptions:
            SymbolDisplayMiscellaneousOptions.UseSpecialTypes |
            SymbolDisplayMiscellaneousOptions.EscapeKeywordIdentifiers |
            SymbolDisplayMiscellaneousOptions.IncludeNullableReferenceTypeModifier);
}

// Usage example
string typeName = type.ToDisplayString(SymbolDisplayFormats.GlobalQualifiedFormat);
// "global::System.Collections.Generic.List<global::MyApp.Models.User>"

When caching fails to work, the cause usually falls into one of three categories. These are easy traps to fall into, so each pattern is worth remembering.

// Pitfall 1: including a timestamp
return new ClassInfo(
    Name: symbol.Name,
    GeneratedAt: DateTime.Now); // Different every time!

// Pitfall 2: order not guaranteed
var methods = symbol.GetMembers()
    .OfType<IMethodSymbol>()
    .ToList(); // Order may vary

// Fixed: sorted order
var methods = symbol.GetMembers()
    .OfType<IMethodSymbol>()
    .OrderBy(m => m.Name) // Always the same order
    .ToList();

// Pitfall 3: external state dependencies
var debugMode = Environment.GetEnvironmentVariable("DEBUG"); // Environment variable
var config = File.ReadAllText("config.json");                // File system access
// External resource access is restricted in source generators;
// file access is only possible through AdditionalTexts.
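The ordering pitfall can be demonstrated without Roslyn. The sketch below simulates two builds that enumerate the "same" members in different orders; without sorting, the concatenated output differs and the cache comparison fails, while an ordinal sort makes it deterministic. The member names are made up for illustration.

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Two enumerations of the same members in a different source order,
        // standing in for two builds where GetMembers() order varied
        string[] runA = { "Save", "Delete", "Get" };
        string[] runB = { "Get", "Save", "Delete" };

        // Without sorting, the joined output differs between runs
        Console.WriteLine(string.Join(",", runA)
                       == string.Join(",", runB)); // False

        // An ordinal sort yields the same order every time, on every machine
        var sortedA = runA.OrderBy(m => m, StringComparer.Ordinal);
        var sortedB = runB.OrderBy(m => m, StringComparer.Ordinal);
        Console.WriteLine(string.Join(",", sortedA)
                       == string.Join(",", sortedB)); // True
    }
}
```

Using StringComparer.Ordinal (rather than the default culture-sensitive comparison) is a deliberate choice: it keeps the sort order identical across machines with different locale settings, which is itself a determinism requirement.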

How to verify that caching is working:

# Build with detailed logs
dotnet build -v:diag > build.log

# Search for source generator related logs
grep -i "generator" build.log | grep -i "cache"

When cache is working
=====================
UserRepositoryObservable.g.cs   Modified: 10:00:00
(File A modified)
UserRepositoryObservable.g.cs   Modified: 10:00:05 (updated)
OrderRepositoryObservable.g.cs  Modified: 10:00:00 (unchanged)

When cache is not working
=========================
All file modification times changed -> Suspect non-deterministic output

// Debug logging: use only during development
#if DEBUG
    .Select((info, _) =>
    {
        Console.WriteLine($"Processing: {info.ClassName}");
        return info;
    })
#endif

1. Filter as Much as Possible in predicate

// Filtering in transform (slow)
.ForAttributeWithMetadataName(
    "MyAttribute",
    predicate: (_, _) => true, // All nodes pass
    transform: (ctx, _) =>
    {
        if (ctx.TargetNode is not ClassDeclarationSyntax)
            return null; // Filtering here
        ...
    })

// Filtering in predicate (fast)
.ForAttributeWithMetadataName(
    "MyAttribute",
    predicate: (node, _) => node is ClassDeclarationSyntax, // Fast filter
    transform: (ctx, _) => ...)

// Generating code in transform (caching inefficient)
transform: (ctx, _) => GenerateCode(ctx.TargetSymbol) // string

// Extracting only data in transform; generate code in RegisterSourceOutput
transform: (ctx, _) => ExtractInfo(ctx.TargetSymbol) // record

// Unnecessary Collect
var provider = ...; // IncrementalValuesProvider
context.RegisterSourceOutput(provider.Collect(), (ctx, items) =>
{
    foreach (var item in items)
        GenerateForItem(ctx, item);
});

// Individual processing (better for caching)
var provider = ...;
context.RegisterSourceOutput(provider, (ctx, item) =>
{
    GenerateForItem(ctx, item);
});
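The Collect() point can be made concrete with a toy simulation (not the real Roslyn API): with a collected batch, one changed item forces the callback to process every item again, while a per-item cache only reprocesses the changed one. Item names and the cache dictionary are illustrative.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Collect()-style: the callback sees all items at once, so any single
    // change means the whole batch is processed again
    public static int GenerateBatch(IReadOnlyList<string> items)
    {
        return items.Count; // number of items processed this build
    }

    // Per-item style: unchanged items hit the cache and are skipped
    public static int GeneratePerItem(Dictionary<string, string> cache, string item)
    {
        if (cache.ContainsKey(item)) return 0; // cache hit, no work
        cache[item] = $"// generated for {item}";
        return 1; // one item processed
    }

    static void Main()
    {
        var items = new List<string> { "UserRepository", "OrderRepository", "ProductRepository" };

        // Second build after modifying only UserRepository:
        int batchWork = GenerateBatch(items); // all 3 items re-processed

        var cache = new Dictionary<string, string>
        {
            ["OrderRepository"]   = "// cached",
            ["ProductRepository"] = "// cached",
        };
        int perItemWork = 0;
        foreach (var item in items)
            perItemWork += GeneratePerItem(cache, item); // only 1 item processed

        Console.WriteLine($"Collect: {batchWork}, per-item: {perItemWork}");
    }
}
```

Collect() is still the right tool when the output genuinely depends on all items together (for example, generating a single registry file); the guidance here applies to outputs that are independent per item.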

Incremental caching is the key to source generator performance. For caching to work correctly, value equality must be guaranteed in data models, and non-deterministic elements must be removed from output. In our project, ObservableClassInfo is defined as a readonly record struct, and type names are consistently represented with SymbolDisplayFormats.GlobalQualifiedFormat to satisfy these conditions.

Item         Recommendation
===========  ============================================
Data model   Use records
Collections  ImmutableArray or sorted List
Type names   SymbolDisplayFormat.FullyQualifiedFormat
Filtering    As much as possible in predicate
Output       Deterministic (exclude timestamps, etc.)

Q1: What is the most common cause of incremental cache invalidation?


A: Including non-deterministic values like DateTime.Now or Guid.NewGuid() in the data model, or not guaranteeing collection sort order, are the most common causes. The data model’s Equals comparison returns false on every build, causing code regeneration even without actual changes.

Q2: Why should you filter as much as possible in the predicate?


A: The predicate operates at the Syntax level, making it very low-cost. The transform, on the other hand, requires Semantic Model access, making it expensive. Reducing candidates as much as possible in the predicate decreases the number of transform executions, resulting in improved overall pipeline performance.

Q3: Wouldn’t using ImmutableArray<MethodInfo> instead of List<MethodInfo> in ObservableClassInfo be better?


A: Theoretically, ImmutableArray is safer as it guarantees immutability. Note, however, that a List<MethodInfo> member participates in the record's generated Equals by reference, so a content-based cache hit requires either reusing the same instance or adding structural comparison (for example, a custom Equals or an EquatableArray<T>-style wrapper). Functorium accepts List for implementation simplicity.
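For reference, here is a minimal sketch of the EquatableArray<T> pattern that incremental generators commonly use to give a collection member content-based equality. This is an illustration under my own naming, not Functorium code, and it omits edge cases such as default (uninitialized) arrays.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

// Wrapper that compares an ImmutableArray<T> by content so that a record
// member containing it gets value-equality cache hits.
readonly struct EquatableArray<T> : IEquatable<EquatableArray<T>>
    where T : IEquatable<T>
{
    private readonly ImmutableArray<T> _array;
    public EquatableArray(ImmutableArray<T> array) => _array = array;

    public bool Equals(EquatableArray<T> other) =>
        _array.SequenceEqual(other._array); // contents, not references

    public override bool Equals(object? obj) =>
        obj is EquatableArray<T> other && Equals(other);

    public override int GetHashCode()
    {
        // Content-based hash so equal arrays land in the same bucket
        var hash = new HashCode();
        foreach (var item in _array)
            hash.Add(item);
        return hash.ToHashCode();
    }
}

class Program
{
    static void Main()
    {
        var a = new EquatableArray<string>(ImmutableArray.Create("Get", "Save"));
        var b = new EquatableArray<string>(ImmutableArray.Create("Get", "Save"));
        Console.WriteLine(a.Equals(b)); // True: same content -> cache hit
    }
}
```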


Now that we understand the principles of incremental caching, we move on to the symbol API, the core tool for actually extracting data in the pipeline. In the next chapter, we examine how to analyze class and interface information through INamedTypeSymbol.

-> 05. INamedTypeSymbol