Incremental Caching
Overview
What would happen if the source generator reprocessed the entire project every time a developer modified a single file? In a large project with hundreds of Repository classes, the IDE would freeze for several seconds on each keystroke. ForAttributeWithMetadataName from the previous chapter optimizes target discovery; the incremental caching covered in this chapter reuses already-processed results so that unnecessary reprocessing is skipped entirely. For caching to work correctly, both the data model and the output must be deterministic, and this chapter focuses on the mistakes that break that guarantee.
Learning Objectives
Core Learning Objectives
- Understand the operating principles of incremental builds
  - Three stages: input change detection, intermediate result caching, output caching
- Identify the conditions for caching to work
  - Value equality, immutable collections, deterministic output
- Identify common mistakes that invalidate the cache
  - Timestamps, order non-determinism, external state dependencies
What is Incremental Build?
An incremental build shortens build time by reprocessing only the parts that changed.

    Full Build
    =====================
    File A modified
      |
    All files reprocessed: A, B, C, D, E
      |
    Build time: 10 seconds

    Incremental Build
    ============================
    File A modified
      |
    Cache check:
    - File A: Changed -> Reprocess
    - File B: No change -> Use cache
    - File C: No change -> Use cache
      |
    Build time: 2 seconds

IIncrementalGenerator’s Caching
IIncrementalGenerator supports incremental caching automatically:

    Provider Pipeline Caching
    =========================

    1. Input change detection
       - Source file hash comparison
       - Re-execute pipeline only for changed files

    2. Intermediate result caching
       - Cache results of Select, Where, etc.
       - Same input -> Return cached result

    3. Output caching
       - Skip file update if generated code is identical

Caching Flow
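As a rough mental model, the three stages can be sketched in plain C#. This is a toy and not how Roslyn implements its pipeline: `ToyPipeline`, `TransformRuns`, and `FilesWritten` are hypothetical names, and a string comparison stands in for the real change detection.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the three caching stages. NOT Roslyn's implementation;
// every name here is hypothetical and exists only for illustration.
public sealed class ToyPipeline
{
    private readonly Dictionary<string, string> _lastInput = new();   // file -> last seen content
    private readonly Dictionary<string, string> _lastOutput = new();  // file -> last generated text

    public int TransformRuns { get; private set; }
    public int FilesWritten { get; private set; }

    public void Build(IReadOnlyDictionary<string, string> sources)
    {
        foreach (var (file, content) in sources)
        {
            // Stage 1: input change detection.
            if (_lastInput.TryGetValue(file, out var previous) && previous == content)
                continue; // Stage 2: unchanged input, so the cached result is reused.

            _lastInput[file] = content;
            TransformRuns++;
            var generated = $"// <auto-generated/>\n// from {file}: {content.Trim().Length} chars";

            // Stage 3: output caching. Skip the write if the text is identical.
            if (_lastOutput.TryGetValue(file, out var old) && old == generated)
                continue;

            _lastOutput[file] = generated;
            FilesWritten++;
        }
    }
}
```

Calling `Build` twice with the same dictionary runs the transform only on the first call. Editing one entry reruns the transform for that entry alone, and an edit that produces the same generated text (trailing whitespace, in this toy) reruns the transform but skips the write.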

    First Build
    ===========
    UserRepository.cs  -> [Process] -> UserRepositoryObservable.g.cs
    OrderRepository.cs -> [Process] -> OrderRepositoryObservable.g.cs

    Second Build (only UserRepository.cs modified)
    ==============================================
    UserRepository.cs  -> [Process] -> UserRepositoryObservable.g.cs (updated)
    OrderRepository.cs -> [Cache]   -> (processing skipped)

For Caching to Work
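For caching to work correctly, the data models flowing through the pipeline must have value equality: two models built from the same input have to compare equal, otherwise the pipeline treats the result as changed.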
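A minimal sketch of what value equality means in practice, using two hypothetical models that are not part of Functorium: a record struct compares by field values, while a plain class falls back to reference comparison.

```csharp
using System;

// Hypothetical models used only for this comparison.
public readonly record struct RecordInfo(string Namespace, string ClassName);

public sealed class ClassInfoByReference
{
    public ClassInfoByReference(string ns, string className)
    {
        Namespace = ns;
        ClassName = className;
    }

    public string Namespace { get; }
    public string ClassName { get; }
}

public static class EqualityDemo
{
    // Two separately constructed records with the same field values are equal.
    public static bool RecordsEqual() =>
        new RecordInfo("MyApp", "UserRepository")
            .Equals(new RecordInfo("MyApp", "UserRepository"));

    // Two separately constructed class instances are not equal,
    // because object.Equals compares references by default.
    public static bool ClassesEqual() =>
        new ClassInfoByReference("MyApp", "UserRepository")
            .Equals(new ClassInfoByReference("MyApp", "UserRepository"));
}
```

A model shaped like the class would look "changed" to the pipeline on every run, even when nothing in the source changed.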
1. Use Records

    // readonly record struct: value semantics + synthesized Equals/GetHashCode
    public readonly record struct ObservableClassInfo(
        string Namespace,
        string ClassName,
        List<MethodInfo> Methods,
        List<ParameterInfo> BaseConstructorParameters);

    // Equal field values -> equal hash -> cache hit.
    // Caveat: the List<T> fields are compared by reference, not by content (see section 2).

2. Use Immutable Collections

    // Using ImmutableArray: the collection cannot be mutated after the model is built.
    // Note: ImmutableArray<T>.Equals compares the underlying array reference,
    // so content-based equality still needs an element-by-element comparison.
    public readonly record struct ObservableClassInfo(
        string Namespace,
        string ClassName,
        ImmutableArray<MethodInfo> Methods);

    // List<T> is mutable and is compared by reference only:
    // different instances are unequal even with the same content.
    public readonly record struct ObservableClassInfo(
        string Namespace,
        string ClassName,
        List<MethodInfo> Methods);

3. Deterministic Output

    // $$ makes {{...}} the interpolation hole, so single braces stay literal.

    // Non-deterministic: different result every time
    var code = $$"""
        // Generated at {{DateTime.Now}}
        public class {{className}}Pipeline
        {
        }
        """;

    // Deterministic: same input -> same output
    var code = $$"""
        // <auto-generated/>
        public class {{className}}Pipeline
        {
        }
        """;

Deterministic Code Generation
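Determinism can be checked mechanically: render the same input twice and compare the text. The sketch below assumes a hypothetical `PipelineTemplates` helper (not Functorium code) and passes the clock value in as a parameter so the non-deterministic variant is easy to observe.

```csharp
using System;

// Hypothetical helper used only to illustrate the check.
public static class PipelineTemplates
{
    // Deterministic: the text depends only on the className argument.
    public static string Render(string className) =>
        $$"""
        // <auto-generated/>
        public class {{className}}Pipeline
        {
        }
        """;

    // Non-deterministic in practice: a clock value leaks into the output.
    public static string RenderWithTimestamp(string className, DateTime now) =>
        $"// Generated at {now:O}\n" + Render(className);
}
```

A unit test that asserts `Render("User") == Render("User")` is a cheap guard against accidentally introducing a timestamp, GUID, or other run-specific value into a template.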
Type Name Determinism
The cache is invalidated if the same type is expressed differently:

    // Ambiguous: the simple name drops the namespace
    string typeName = type.Name;
    // "User" for both MyApp.Models.User and any other type named User

    // Not fully qualified: no global:: prefix, so the generated reference
    // can resolve differently depending on the surrounding namespaces
    string typeName = type.ToDisplayString();
    // "MyApp.Models.User"

    // Deterministic: always the same format
    string typeName = type.ToDisplayString(
        SymbolDisplayFormat.FullyQualifiedFormat);
    // Always "global::MyApp.Models.User"

SymbolDisplayFormats Class
Deterministic format definition from the Functorium project:

    namespace Functorium.SourceGenerators.Generators.ObservablePortGenerator;

    public static class SymbolDisplayFormats
    {
        /// <summary>
        /// Global qualified format for deterministic code generation
        /// </summary>
        public static readonly SymbolDisplayFormat GlobalQualifiedFormat = new(
            globalNamespaceStyle: SymbolDisplayGlobalNamespaceStyle.Included,
            typeQualificationStyle: SymbolDisplayTypeQualificationStyle.NameAndContainingTypesAndNamespaces,
            genericsOptions: SymbolDisplayGenericsOptions.IncludeTypeParameters,
            miscellaneousOptions:
                SymbolDisplayMiscellaneousOptions.UseSpecialTypes |
                SymbolDisplayMiscellaneousOptions.EscapeKeywordIdentifiers |
                SymbolDisplayMiscellaneousOptions.IncludeNullableReferenceTypeModifier);
    }

    // Usage example
    string typeName = type.ToDisplayString(SymbolDisplayFormats.GlobalQualifiedFormat);
    // "global::System.Collections.Generic.List<global::MyApp.Models.User>"

Causes of Cache Invalidation
When caching does not work properly, the cause is usually one of the following three. They are easy traps to fall into, so each pattern is worth remembering.
1. Non-deterministic Data

    // Including a timestamp in the model
    return new ClassInfo(
        Name: symbol.Name,
        GeneratedAt: DateTime.Now); // Different every time!

2. Order Dependency

    // Order not guaranteed
    var methods = symbol.GetMembers()
        .OfType<IMethodSymbol>()
        .ToList(); // Order may vary

    // Sorted order (ordinal comparison, so the result does not depend on the current culture)
    var methods = symbol.GetMembers()
        .OfType<IMethodSymbol>()
        .OrderBy(m => m.Name, StringComparer.Ordinal) // Always the same order
        .ToList();

3. External State Dependency

    // Environment variable dependency
    var debugMode = Environment.GetEnvironmentVariable("DEBUG");

    // File system access
    var config = File.ReadAllText("config.json");

    // External resource access is restricted in source generators.
    // Files should be read only through AdditionalTextsProvider (AdditionalFiles).

Caching Debugging
How to verify that caching is working:
1. Check Build Logs

    # Build with detailed logs
    dotnet build -v:diag > build.log

    # Search for source generator related logs
    grep -i "generator" build.log | grep -i "cache"

2. Generated File Timestamps

    When cache is working
    =====================
    UserRepositoryObservable.g.cs   Modified: 10:00:00
    (File A modified)
    UserRepositoryObservable.g.cs   Modified: 10:00:05 (updated)
    OrderRepositoryObservable.g.cs  Modified: 10:00:00 (unchanged)

    When cache is not working
    =========================
    All file modification times changed -> Suspect non-deterministic output

3. Insert Debugging Code

    // Use only during development
    #if DEBUG
    .Select((info, _) =>
    {
        Console.WriteLine($"Processing: {info.ClassName}");
        return info;
    })
    #endif

Performance Optimization Tips
1. Filter as Much as Possible in predicate

    // Filtering in transform (slow)
    .ForAttributeWithMetadataName(
        "MyAttribute",
        predicate: (_, _) => true, // All nodes pass
        transform: (ctx, _) =>
        {
            if (ctx.TargetNode is not ClassDeclarationSyntax)
                return null; // Filtering here
            ...
        })

    // Filtering in predicate (fast)
    .ForAttributeWithMetadataName(
        "MyAttribute",
        predicate: (node, _) => node is ClassDeclarationSyntax, // Fast filter
        transform: (ctx, _) => ...)

2. Simplify transform Results

    // Generating code in transform (caching inefficient)
    transform: (ctx, _) => GenerateCode(ctx.TargetSymbol) // string

    // Extracting only data in transform
    transform: (ctx, _) => ExtractInfo(ctx.TargetSymbol) // record
    // Generate code in RegisterSourceOutput

3. Minimize Collect Usage

    // Unnecessary Collect
    var provider = ...; // IncrementalValuesProvider
    context.RegisterSourceOutput(provider.Collect(), (ctx, items) =>
    {
        foreach (var item in items)
            GenerateForItem(ctx, item);
    });

    // Individual processing (better for caching)
    var provider = ...;
    context.RegisterSourceOutput(provider, (ctx, item) =>
    {
        GenerateForItem(ctx, item);
    });

Summary at a Glance
Incremental caching is the key to source generator performance. For caching to work correctly, value equality must be guaranteed in data models, and non-deterministic elements must be removed from output. In our project, ObservableClassInfo is defined as a readonly record struct, and type names are consistently represented with SymbolDisplayFormats.GlobalQualifiedFormat to satisfy these conditions.
| Item | Recommendation |
|---|---|
| Data model | Use records |
| Collections | ImmutableArray in a stable order, compared by content |
| Type names | SymbolDisplayFormat.FullyQualifiedFormat |
| Filtering | As much as possible in predicate |
| Output | Deterministic (exclude timestamps, etc.) |
Q1: What is the most common cause of incremental cache invalidation?
A: The most common causes are including non-deterministic values such as DateTime.Now or Guid.NewGuid() in the data model and not guaranteeing collection order. In both cases the data model’s Equals comparison returns false on every build, so code is regenerated even when nothing actually changed.
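Both causes are easy to reproduce outside a generator. The sketch below uses hypothetical models (`StampedInfo`, `StableInfo`) and a hypothetical helper (`JoinSorted`); none of them come from Functorium.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical models: the first carries a timestamp, the second does not.
public readonly record struct StampedInfo(string Name, DateTime GeneratedAt);
public readonly record struct StableInfo(string Name);

public static class InvalidationDemo
{
    // Joins method names in a fixed, culture-independent order so the
    // result does not depend on the order in which they were discovered.
    public static string JoinSorted(IEnumerable<string> methodNames) =>
        string.Join(",", methodNames.OrderBy(n => n, StringComparer.Ordinal));
}
```

Two `StampedInfo` values created a few seconds apart never compare equal, while two `StableInfo` values with the same name always do. `JoinSorted` returns the same string for `{ "Save", "Find" }` and `{ "Find", "Save" }`.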
Q2: Why should you filter as much as possible in the predicate?
A: The predicate operates at the syntax level, so it is very cheap. The transform requires semantic model access, which is expensive. Narrowing the candidates as far as possible in the predicate reduces the number of transform executions and improves overall pipeline performance.
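The effect can be illustrated with a toy model that counts how often the expensive step runs. `Node` and `FilterDemo` are hypothetical stand-ins, not Roslyn types, and the counter merely represents the semantic work a real transform would do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for a syntax node; not a Roslyn type.
public sealed record Node(string Kind, string Name);

public static class FilterDemo
{
    // Returns how many nodes reach the "expensive" transform step
    // after the cheap predicate has filtered the candidates.
    public static int CountTransformRuns(IEnumerable<Node> nodes, Func<Node, bool> predicate)
    {
        var runs = 0;
        foreach (var node in nodes.Where(predicate))
        {
            runs++; // stands in for semantic-model work done by transform
        }
        return runs;
    }
}
```

With a pass-everything predicate, every node reaches the transform; with a predicate that keeps only class nodes, only those do. The real saving depends on how many attributed nodes the predicate can rule out.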
Q3: Wouldn’t using ImmutableArray<MethodInfo> instead of List<MethodInfo> in ObservableClassInfo be better?
A: In principle, yes: ImmutableArray is safer because it guarantees immutability, and Functorium uses List for implementation simplicity. Keep in mind, however, that the Equals synthesized for a readonly record struct compares a List field by reference, so content-based comparison of MethodInfo alone does not make two separately built ObservableClassInfo values equal. ImmutableArray<T>.Equals likewise compares the underlying array, so full content equality requires an element-by-element comparison, for example through a small wrapper type.
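The difference between the collection choices can be checked in isolation. The sketch below is an assumption-laden illustration rather than Functorium code: `SequenceEqualArray<T>`, `MethodModel`, and the three `With...` records are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical wrapper: adds element-by-element equality on top of ImmutableArray<T>.
public readonly struct SequenceEqualArray<T> : IEquatable<SequenceEqualArray<T>>
{
    private readonly ImmutableArray<T> _items;

    public SequenceEqualArray(ImmutableArray<T> items) => _items = items;

    // Treat a default (uninitialized) wrapper as empty.
    private ImmutableArray<T> Items => _items.IsDefault ? ImmutableArray<T>.Empty : _items;

    public bool Equals(SequenceEqualArray<T> other)
    {
        var a = Items;
        var b = other.Items;
        if (a.Length != b.Length) return false;
        for (var i = 0; i < a.Length; i++)
        {
            if (!EqualityComparer<T>.Default.Equals(a[i], b[i])) return false;
        }
        return true;
    }

    public override bool Equals(object? obj) => obj is SequenceEqualArray<T> other && Equals(other);

    public override int GetHashCode()
    {
        unchecked
        {
            var hash = 17;
            foreach (var item in Items)
                hash = hash * 31 + (item is null ? 0 : item.GetHashCode());
            return hash;
        }
    }
}

// Hypothetical models that differ only in the collection type.
public readonly record struct MethodModel(string Name);
public readonly record struct WithList(string ClassName, List<MethodModel> Methods);
public readonly record struct WithImmutableArray(string ClassName, ImmutableArray<MethodModel> Methods);
public readonly record struct WithWrapper(string ClassName, SequenceEqualArray<MethodModel> Methods);
```

Building each model twice from the same data shows the behavior: the `WithList` and `WithImmutableArray` pairs compare unequal because their collections are distinct instances, while the `WithWrapper` pair compares equal and produces matching hash codes.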
Now that we understand the principles of incremental caching, we move on to the symbol API, the core tool for actually extracting data in the pipeline. In the next chapter, we examine how to analyze class and interface information through INamedTypeSymbol.