Context Builder

This document provides a detailed overview of the Context Builder component in DeepLint, which is responsible for gathering and assembling the context used for LLM analysis.

For a high-level explanation of context building and its importance, see the Context Building page in the core concepts documentation.

Architecture

The Context Builder is implemented as a set of cooperating classes, each responsible for a specific aspect of context building:

Core Components

ContextBuilder

The ContextBuilder class is the main entry point for context building. It orchestrates the process of gathering information from various sources and assembling it into a structured context for LLM analysis.

export class ContextBuilder {
  private options: ContextBuilderOptions;
  private gitOperations: GitOperations;
  private diffParser: DiffParser;
  private fileSystemScanner: FileSystemScanner;
  private dependencyAnalyzer: DependencyAnalyzer;
  private codeStructureAnalyzer: CodeStructureAnalyzer;
  private tokenCounter: TokenCounter;
  private configOptions: ContextBuilderConfig;

  constructor(
    options: Partial<ContextBuilderOptions>,
    dependencies?: {
      gitOperations?: GitOperations;
      diffParser?: DiffParser;
      fileSystemScanner?: FileSystemScanner;
      dependencyAnalyzer?: DependencyAnalyzer;
      codeStructureAnalyzer?: CodeStructureAnalyzer;
      tokenCounter?: TokenCounter;
    },
  ) {
    // Initialize options and dependencies
  }

  async buildContext(): Promise<ContextBuildResult> {
    // Build context for LLM analysis
  }

  private async assembleContext(
    parsedDiff: ParsedDiff,
    repoStructure: RepositoryStructure,
    dependencyGraph: DependencyGraph,
    codeStructure: CodeStructure,
  ): Promise<LLMContext> {
    // Assemble context from various sources
  }

  // Other helper methods
}

Key Methods

buildContext(): The main method for building context. It orchestrates the entire process, from getting Git changes to assembling the final context.
assembleContext(): Assembles the context from the parsed diff, repository structure, dependency graph, and code structure.
buildRepositoryStructure(): Builds the repository structure for the context.
buildChangesContext(): Builds the changes context from the parsed diff.
buildRelatedFilesContext(): Builds the related files context from the parsed diff and dependency graph.

GitOperations

The GitOperations class provides methods for interacting with Git to get information about changes in the repository.

export class GitOperations {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async hasStagedChanges(): Promise<boolean> {
    // Check if there are staged changes
  }

  async hasUnstagedChanges(): Promise<boolean> {
    // Check if there are unstaged changes
  }

  async getStagedDiff(): Promise<string> {
    // Get the diff for staged changes
  }

  async getUnstagedDiff(): Promise<string> {
    // Get the diff for unstaged changes
  }
}
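The four methods above are enough to decide which diff to analyze. The sketch below is a hypothetical illustration: GitLike, chooseDiffSource, and getDiffText are not DeepLint names, and the policy it encodes (staged changes by default, unstaged changes only when useUnstagedChanges is set) is an assumption based on the option of that name. A real implementation would typically obtain the staged and unstaged diffs from git diff --cached and git diff respectively.

```typescript
// Hypothetical subset of the GitOperations surface, so the sketch runs without Git.
interface GitLike {
  hasStagedChanges(): Promise<boolean>;
  hasUnstagedChanges(): Promise<boolean>;
  getStagedDiff(): Promise<string>;
  getUnstagedDiff(): Promise<string>;
}

type DiffSource = "staged" | "unstaged" | "none";

// Assumed policy: analyze staged changes unless useUnstagedChanges is enabled.
function chooseDiffSource(
  hasStaged: boolean,
  hasUnstaged: boolean,
  useUnstagedChanges: boolean,
): DiffSource {
  if (useUnstagedChanges) {
    return hasUnstaged ? "unstaged" : "none";
  }
  return hasStaged ? "staged" : "none";
}

// Fetches the diff text for the chosen source; an empty string means "nothing to analyze".
async function getDiffText(git: GitLike, useUnstagedChanges: boolean): Promise<string> {
  const source = chooseDiffSource(
    await git.hasStagedChanges(),
    await git.hasUnstagedChanges(),
    useUnstagedChanges,
  );
  if (source === "staged") return git.getStagedDiff();
  if (source === "unstaged") return git.getUnstagedDiff();
  return "";
}
```

Keeping the decision in a pure function means the policy can be unit-tested without a repository, while the async wrapper only does I/O.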

DiffParser

The DiffParser class parses Git diffs into a structured format that can be used for context building.

export class DiffParser {
  parse(diffText: string): ParsedDiff {
    // Parse the diff text into a structured format
  }

  generateSummary(parsedDiff: ParsedDiff): string {
    // Generate a summary of the changes
  }
}
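To make "structured format" more concrete, the sketch below parses unified diff output, as produced by git diff, into per-file records and summarizes them. FileDiff, parseUnifiedDiff, and summarize are hypothetical names; DeepLint's actual ParsedDiff type is not shown in this document and may carry more detail, such as individual hunks. The change type is inferred from the "new file mode" and "deleted file mode" header lines that git writes for added and deleted files.

```typescript
type ChangeType = "addition" | "modification" | "deletion";

interface FileDiff {
  path: string;
  changeType: ChangeType;
  additions: number;
  deletions: number;
}

// Splits git diff output into per-file records and counts added/removed lines.
function parseUnifiedDiff(diffText: string): FileDiff[] {
  const files: FileDiff[] = [];
  let current: FileDiff | undefined;
  let inHunk = false;

  for (const line of diffText.split("\n")) {
    const header = /^diff --git a\/(.+) b\/(.+)$/.exec(line);
    if (header) {
      current = { path: header[2], changeType: "modification", additions: 0, deletions: 0 };
      files.push(current);
      inHunk = false; // "---"/"+++" file headers come before the first "@@" hunk
      continue;
    }
    if (!current) continue;

    if (line.startsWith("new file mode")) current.changeType = "addition";
    else if (line.startsWith("deleted file mode")) current.changeType = "deletion";
    else if (line.startsWith("@@")) inHunk = true;
    else if (inHunk && line.startsWith("+")) current.additions++;
    else if (inHunk && line.startsWith("-")) current.deletions++;
  }
  return files;
}

// Produces a one-line, git-style summary of the parsed changes.
function summarize(files: FileDiff[]): string {
  const additions = files.reduce((sum, f) => sum + f.additions, 0);
  const deletions = files.reduce((sum, f) => sum + f.deletions, 0);
  return `${files.length} file(s) changed, ${additions} insertion(s), ${deletions} deletion(s)`;
}
```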

FileSystemScanner

The FileSystemScanner class scans the repository to understand its structure, including directories, files, and their metadata.

export class FileSystemScanner {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async scanRepository(options: ScanOptions): Promise<RepositoryStructure> {
    // Scan the repository and return its structure
  }
}

DependencyAnalyzer

The DependencyAnalyzer class analyzes dependencies between files to identify relationships.

export class DependencyAnalyzer {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async buildDependencyGraph(files: FileInfo[]): Promise<DependencyGraph> {
    // Build a dependency graph for the files
  }
}
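One way such a graph could be built for JavaScript/TypeScript sources is to scan each file for relative import specifiers and then invert those edges to find dependents. The sketch below is an assumption-heavy illustration, not DeepLint's analyzer: it uses a regular expression instead of a parser, reads file contents from an in-memory map, and only probes .ts, .tsx, .js, and index files when resolving. The method names mirror the empty dependency graph shown under Project Type Detection; their signatures, and treating "immediate dependencies of changed files" as direct dependencies plus direct dependents, are assumptions.

```typescript
// Hypothetical graph shape; method names follow the empty graph in this document.
interface SketchDependencyGraph {
  files: Map<string, string[]>; // file -> direct dependencies
  getDirectDependencies(file: string): string[];
  getDirectDependents(file: string): string[];
  getImmediateDependenciesOfChangedFiles(changedFiles: string[]): string[];
}

// Relative specifiers in `... from "./x"` and `require("./x")`; package imports are ignored.
function extractRelativeSpecifiers(source: string): string[] {
  const re = /(?:from\s+|require\(\s*)["'](\.{1,2}\/[^"']+)["']/g;
  const specifiers: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(source)) !== null) specifiers.push(match[1]);
  return specifiers;
}

// Resolves "./x" or "../x" against the importing file's directory (POSIX-style paths).
function resolveRelative(fromFile: string, specifier: string): string {
  const parts = fromFile.split("/").slice(0, -1);
  for (const segment of specifier.split("/")) {
    if (segment === "..") parts.pop();
    else if (segment !== ".") parts.push(segment);
  }
  return parts.join("/");
}

function buildDependencyGraph(sources: Map<string, string>): SketchDependencyGraph {
  const files = new Map<string, string[]>();
  const dependents = new Map<string, string[]>();

  sources.forEach((content, file) => {
    const deps: string[] = [];
    for (const specifier of extractRelativeSpecifiers(content)) {
      const base = resolveRelative(file, specifier);
      // Probe a few common extensions, roughly like a module resolver would.
      const candidates = [base, `${base}.ts`, `${base}.tsx`, `${base}.js`, `${base}/index.ts`, `${base}/index.js`];
      const target = candidates.find((candidate) => sources.has(candidate));
      if (target && deps.indexOf(target) === -1) deps.push(target);
    }
    files.set(file, deps);
  });

  // Invert the edges to get dependents.
  files.forEach((deps, file) => {
    for (const dep of deps) {
      const list = dependents.get(dep) ?? [];
      list.push(file);
      dependents.set(dep, list);
    }
  });

  return {
    files,
    getDirectDependencies: (file) => files.get(file) ?? [],
    getDirectDependents: (file) => dependents.get(file) ?? [],
    getImmediateDependenciesOfChangedFiles: (changedFiles) => {
      const related: string[] = [];
      for (const file of changedFiles) {
        const neighbours = (files.get(file) ?? []).concat(dependents.get(file) ?? []);
        for (const n of neighbours) {
          if (changedFiles.indexOf(n) === -1 && related.indexOf(n) === -1) related.push(n);
        }
      }
      return related;
    },
  };
}
```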

CodeStructureAnalyzer

The CodeStructureAnalyzer class extracts information about the structure of code files, such as functions, classes, and interfaces.

export class CodeStructureAnalyzer {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async analyzeCodeStructure(files: FileInfo[]): Promise<CodeStructure> {
    // Analyze the structure of code files
  }
}
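As a rough illustration of what "structure" means here, the sketch below pulls declaration names out of TypeScript source with line-anchored regular expressions. It is a deliberately simplified stand-in with hypothetical names: it misses arrow functions assigned to variables, nested declarations, and it can be fooled by matching text inside strings or comments. A production analyzer would more likely walk a real syntax tree, for example through the TypeScript compiler API.

```typescript
interface FileStructure {
  functions: string[];
  classes: string[];
  interfaces: string[];
  types: string[];
}

// One line-anchored pattern per declaration kind; group 1 captures the declared name.
const DECLARATION_PATTERNS: Record<keyof FileStructure, RegExp> = {
  functions: /^\s*(?:export\s+)?(?:default\s+)?(?:async\s+)?function\s*\*?\s*([A-Za-z_$][\w$]*)/gm,
  classes: /^\s*(?:export\s+)?(?:default\s+)?(?:abstract\s+)?class\s+([A-Za-z_$][\w$]*)/gm,
  interfaces: /^\s*(?:export\s+)?interface\s+([A-Za-z_$][\w$]*)/gm,
  types: /^\s*(?:export\s+)?type\s+([A-Za-z_$][\w$]*)\s*(?:<[^>]*>)?\s*=/gm,
};

function analyzeSource(source: string): FileStructure {
  const result: FileStructure = { functions: [], classes: [], interfaces: [], types: [] };
  const kinds = Object.keys(DECLARATION_PATTERNS) as Array<keyof FileStructure>;
  for (const kind of kinds) {
    // Fresh RegExp per call so the shared global pattern's lastIndex never leaks between calls.
    const re = new RegExp(DECLARATION_PATTERNS[kind].source, "gm");
    let match: RegExpExecArray | null;
    while ((match = re.exec(source)) !== null) {
      result[kind].push(match[1]);
    }
  }
  return result;
}
```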

TokenCounter

The TokenCounter class manages token usage and truncation to ensure the context fits within the LLM's token limits.

export class TokenCounter {
  private model: TiktokenModel;
  private maxTokens: number;
  private usedTokens: number = 0;
  private reservedTokens: number = 0;

  constructor(
    model: string = "gpt-4",
    maxTokens: number = 8192,
    reservedTokens: number = 1000,
  ) {
    // Initialize token counter
  }

  countTokens(text: string): number {
    // Count the number of tokens in a text
  }

  addTokens(count: number): void {
    // Add tokens to the used tokens count
  }

  getUsedTokens(): number {
    // Get the number of tokens used
  }

  getAvailableTokens(): number {
    // Get the number of tokens available
  }

  hasEnoughTokens(count: number): boolean {
    // Check if there are enough tokens available
  }

  reset(): void {
    // Reset the token counter
  }

  truncateToFit(text: string, maxTokens: number): string {
    // Truncate text to fit within a token limit
  }

  truncateFileContent(content: string, maxTokens: number): string {
    // Truncate a file content to fit within a token limit
  }
}
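The budget-tracking methods above can be sketched as follows. Two parts are assumptions rather than documented behavior: the available budget is taken to be maxTokens - reservedTokens - usedTokens, and token counts come from a characters-divided-by-four approximation instead of tiktoken so that the example has no dependencies. TokenBudget and approxTokenCount are hypothetical names.

```typescript
// Rough stand-in for tiktoken: about four characters per token.
function approxTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

class TokenBudget {
  private readonly maxTokens: number;
  private readonly reservedTokens: number;
  private usedTokens = 0;

  constructor(maxTokens = 8192, reservedTokens = 1000) {
    this.maxTokens = maxTokens;
    // Assumed to be held back (e.g. for the prompt template and the response), never handed out.
    this.reservedTokens = reservedTokens;
  }

  addTokens(count: number): void {
    this.usedTokens += count;
  }

  getUsedTokens(): number {
    return this.usedTokens;
  }

  getAvailableTokens(): number {
    return Math.max(0, this.maxTokens - this.reservedTokens - this.usedTokens);
  }

  hasEnoughTokens(count: number): boolean {
    return count <= this.getAvailableTokens();
  }

  reset(): void {
    this.usedTokens = 0;
  }

  // Keeps whole lines from the top of the text until the next line would exceed maxTokens.
  truncateToFit(text: string, maxTokens: number): string {
    if (approxTokenCount(text) <= maxTokens) return text;
    const kept: string[] = [];
    let used = 0;
    for (const line of text.split("\n")) {
      const cost = approxTokenCount(line + "\n");
      if (used + cost > maxTokens) break;
      kept.push(line);
      used += cost;
    }
    return kept.join("\n");
  }
}
```

Truncating at line boundaries keeps the remaining code readable for the model, at the cost of occasionally leaving a few tokens of budget unused.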

Context Structure

The context built by the Context Builder has the following structure:

export interface LLMContext {
  // Repository information
  repository: {
    name: string;
    root: string;
    structure: ContextRepositoryStructure;
  };

  // Changes information
  changes: {
    files: ContextChange[];
    summary: string;
  };

  // Related files
  relatedFiles: ContextFile[];

  // Metadata
  metadata: {
    contextSize: {
      totalTokens: number;
      changesTokens: number;
      relatedFilesTokens: number;
      structureTokens: number;
    };
    generatedAt: string;
    error?: {
      message: string;
      timestamp: string;
      phase?: string;
    };
  };
}

Repository Information

The repository information includes:

name: The name of the repository
root: The root directory of the repository
structure: The structure of the repository, including directories, files, and their metadata

Changes Information

The changes information includes:

files: An array of changed files, each with its content, diff, and type (addition, modification, or deletion)
summary: A summary of the changes

Related Files

Related files are files connected to the changed files, such as their dependencies or dependents. Each related file includes:

path: The path to the file
relativePath: The path relative to the repository root
content: The content of the file
type: The type of the file (source, test, etc.)
size: The size of the file in bytes
lastModified: The last modified date of the file
dependencies: An array of file paths that this file depends on
dependents: An array of file paths that depend on this file
structure: The structure of the file, including functions, classes, interfaces, and types

Metadata

The metadata includes:

contextSize: Information about the token usage
generatedAt: The date and time when the context was generated
error: Error information if context building failed

Configuration Options

The Context Builder can be configured through the ContextBuilderOptions interface:

export interface ContextBuilderOptions {
  // Repository options
  repositoryRoot: string;

  // File options
  maxFileSize?: number; // in KB
  includePatterns?: string[];
  excludePatterns?: string[];

  // Token management
  maxTokens?: number;
  tokensPerFile?: number;

  // Git options
  useUnstagedChanges?: boolean;

  // Dependency options
  includeDependencies?: boolean;

  // Structure options
  includeStructure?: boolean;

  // File filtering options
  useGitignore?: boolean;
}
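A minimal sketch of how the file options might be applied to a single file follows. Because maxFileSize is given in KB, it is multiplied by 1024 before being compared with a size in bytes. The glob matcher is a small hypothetical helper that supports only *, **, and ?, and the precedence rules (an empty include list admits every file, and exclusions win over inclusions) are assumptions, not documented behavior. FilterOptions and shouldIncludeFile are illustrative names.

```typescript
interface FilterOptions {
  maxFileSize?: number; // in KB, as in ContextBuilderOptions
  includePatterns?: string[];
  excludePatterns?: string[];
}

// Hypothetical minimal glob matcher: "**/" = zero or more directories,
// "**" = anything, "*" = anything except "/", "?" = one character except "/".
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*") {
      if (glob[i + 2] === "/") {
        pattern += "(?:.*/)?";
        i += 2;
      } else {
        pattern += ".*";
        i += 1;
      }
    } else if (c === "*") {
      pattern += "[^/]*";
    } else if (c === "?") {
      pattern += "[^/]";
    } else {
      pattern += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Assumed precedence: size limit first, then include list (empty = all), then excludes win.
function shouldIncludeFile(relativePath: string, sizeBytes: number, options: FilterOptions): boolean {
  if (options.maxFileSize !== undefined && sizeBytes > options.maxFileSize * 1024) {
    return false;
  }
  const matchesAny = (patterns: string[]): boolean =>
    patterns.some((p) => globToRegExp(p).test(relativePath));
  const include = options.includePatterns ?? [];
  if (include.length > 0 && !matchesAny(include)) {
    return false;
  }
  return !matchesAny(options.excludePatterns ?? []);
}
```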

Error Handling

The Context Builder catches errors raised during context building so that a failure does not propagate to the caller. If an error occurs, the Context Builder will:

Log the error with detailed information
Create an empty context whose metadata.error field records the error message, a timestamp, and, when known, the phase in which the failure occurred
Return that context to the caller instead of throwing

This allows the rest of the application to continue functioning even if context building fails.
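The failure path described above could look like the following sketch. ErrorAwareContext mirrors the LLMContext interface but replaces the nested types with unknown so the block is self-contained; createErrorContext, buildSafely, the "buildContext" phase label, and deriving the repository name from the last path segment are all hypothetical.

```typescript
interface ErrorAwareContext {
  repository: { name: string; root: string; structure: unknown };
  changes: { files: unknown[]; summary: string };
  relatedFiles: unknown[];
  metadata: {
    contextSize: {
      totalTokens: number;
      changesTokens: number;
      relatedFilesTokens: number;
      structureTokens: number;
    };
    generatedAt: string;
    error?: { message: string; timestamp: string; phase?: string };
  };
}

// Builds an empty context that carries the error details in metadata.error.
function createErrorContext(repositoryRoot: string, error: unknown, phase?: string): ErrorAwareContext {
  const now = new Date().toISOString();
  // Hypothetical: use the last path segment of the root as the repository name.
  const name = repositoryRoot.split(/[\\/]/).filter(Boolean).pop() ?? repositoryRoot;
  return {
    repository: { name, root: repositoryRoot, structure: {} },
    changes: { files: [], summary: "" },
    relatedFiles: [],
    metadata: {
      contextSize: { totalTokens: 0, changesTokens: 0, relatedFilesTokens: 0, structureTokens: 0 },
      generatedAt: now,
      error: {
        message: error instanceof Error ? error.message : String(error),
        timestamp: now,
        phase,
      },
    },
  };
}

// Logs the failure and returns an error context instead of rethrowing.
async function buildSafely(
  repositoryRoot: string,
  build: () => Promise<ErrorAwareContext>,
): Promise<ErrorAwareContext> {
  try {
    return await build();
  } catch (error) {
    console.error("Context building failed:", error);
    return createErrorContext(repositoryRoot, error, "buildContext");
  }
}
```

Callers can then check metadata.error on the returned context rather than wrapping every call in their own try/catch.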

Implementation Notes

Dependency Injection

The Context Builder uses dependency injection to make testing easier. Each dependency can be injected through the constructor:

constructor(
  options: Partial<ContextBuilderOptions>,
  dependencies?: {
    gitOperations?: GitOperations;
    diffParser?: DiffParser;
    fileSystemScanner?: FileSystemScanner;
    dependencyAnalyzer?: DependencyAnalyzer;
    codeStructureAnalyzer?: CodeStructureAnalyzer;
    tokenCounter?: TokenCounter;
  },
) {
  // Initialize options and dependencies
}

This allows for easy mocking of dependencies during testing.
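The sketch below shows the same pattern on a much smaller, hypothetical class: MiniContextBuilder accepts an optional tokenCounter and falls back to a default when none is injected, so a test can substitute a fake that returns a fixed count. The class, its option defaults, and the characters-divided-by-four default counter are illustrative only.

```typescript
interface TokenCounterLike {
  countTokens(text: string): number;
}

// Hypothetical default used when no counter is injected (about 4 characters per token).
class DefaultTokenCounter implements TokenCounterLike {
  countTokens(text: string): number {
    return Math.ceil(text.length / 4);
  }
}

interface MiniOptions {
  repositoryRoot: string;
  maxTokens: number;
}

class MiniContextBuilder {
  private readonly options: MiniOptions;
  private readonly tokenCounter: TokenCounterLike;

  constructor(
    options: Partial<MiniOptions>,
    dependencies?: { tokenCounter?: TokenCounterLike },
  ) {
    // Illustrative defaults, overridden by whatever the caller passes in.
    this.options = { repositoryRoot: ".", maxTokens: 8192, ...options };
    // Use the injected dependency if present, otherwise construct the default.
    this.tokenCounter = dependencies?.tokenCounter ?? new DefaultTokenCounter();
  }

  fitsInBudget(text: string): boolean {
    return this.tokenCounter.countTokens(text) <= this.options.maxTokens;
  }
}
```

Because the fake returns a fixed count, a test can assert on the builder's decision logic without depending on a real tokenizer.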

Token Management

Token management is a critical aspect of context building, as LLMs have token limits. The Context Builder uses the TokenCounter class to manage token usage and truncation.

The token counter:

Counts tokens using the tiktoken library
Tracks used and available tokens
Truncates file content when necessary
Prioritizes important parts of files (imports, exports, function signatures)
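Prioritizing imports, exports, and signatures during truncation could work roughly as below. This is a hypothetical sketch, not DeepLint's documented algorithm: it uses a characters-divided-by-four approximation in place of tiktoken, a line-level regular expression to decide which lines are important, and an assumed two-pass strategy (important lines first, then other lines in file order while they still fit), emitting the kept lines in their original order.

```typescript
// Rough stand-in for tiktoken: about four characters per token.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical definition of "important": imports, exports, function and class declarations.
const PRIORITY_LINE = /^\s*(?:import\s|export\s|(?:async\s+)?function[\s*]|(?:abstract\s+)?class\s)/;

function truncatePrioritized(content: string, maxTokens: number): string {
  if (approxTokens(content) <= maxTokens) return content;

  const lines = content.split("\n");
  const keep = lines.map(() => false);
  let used = 0;

  // Pass 1 admits important lines; pass 2 adds the remaining lines in file order while they fit.
  for (const wantPriority of [true, false]) {
    lines.forEach((line, i) => {
      if (keep[i] || PRIORITY_LINE.test(line) !== wantPriority) return;
      const cost = approxTokens(line + "\n");
      if (used + cost <= maxTokens) {
        keep[i] = true;
        used += cost;
      }
    });
  }

  // Emit the kept lines in their original order.
  return lines.filter((_, i) => keep[i]).join("\n");
}
```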

Project Type Detection

The Context Builder detects the project type to decide whether a repository is a JavaScript/TypeScript project. The result determines whether dependency analysis runs, since dependency analysis is only useful for JavaScript/TypeScript projects.

The project type detection is implemented in the detectProjectType utility function:

import { existsSync } from "fs";
import { join } from "path";

export enum ProjectType {
  JAVASCRIPT = "javascript",
  TYPESCRIPT = "typescript",
  OTHER = "other",
}

export function detectProjectType(rootDir: string): ProjectType {
  // Check for TypeScript configuration
  const hasTsConfig = existsSync(join(rootDir, "tsconfig.json"));
  if (hasTsConfig) {
    return ProjectType.TYPESCRIPT;
  }

  // Check for JavaScript project indicators
  const hasPackageJson = existsSync(join(rootDir, "package.json"));
  const hasJsFiles =
    existsSync(join(rootDir, "index.js")) ||
    existsSync(join(rootDir, "src/index.js")) ||
    existsSync(join(rootDir, "lib/index.js"));
  const hasNodeModules = existsSync(join(rootDir, "node_modules"));

  if (hasPackageJson || hasJsFiles || hasNodeModules) {
    return ProjectType.JAVASCRIPT;
  }

  // Default to other
  return ProjectType.OTHER;
}
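Because detectProjectType calls existsSync directly, its rules are easiest to unit-test when the existence check is passed in. The variant below is a hypothetical refactoring, not part of DeepLint: it applies the same checks in the same order, returns a string union instead of the ProjectType enum to stay self-contained, and joins paths with "/" for simplicity. In production, existsSync from Node's fs module could be passed as the predicate.

```typescript
type ProjectKind = "javascript" | "typescript" | "other";

// Same indicators and order as detectProjectType; the filesystem check is injected.
function detectProjectKind(rootDir: string, exists: (path: string) => boolean): ProjectKind {
  const at = (relativePath: string): string => `${rootDir}/${relativePath}`;
  if (exists(at("tsconfig.json"))) {
    return "typescript";
  }
  const jsIndicators = ["package.json", "index.js", "src/index.js", "lib/index.js", "node_modules"];
  return jsIndicators.some((indicator) => exists(at(indicator))) ? "javascript" : "other";
}

// Mirrors the "is this a JavaScript/TypeScript project?" check used before dependency analysis.
function isJsOrTsProject(rootDir: string, exists: (path: string) => boolean): boolean {
  return detectProjectKind(rootDir, exists) !== "other";
}
```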

The buildContext method in the ContextBuilder class uses this function to determine whether to run dependency analysis:

// Check if this is a JavaScript/TypeScript project
const isJsProject =
  detectProjectType(this.options.repositoryRoot) !== ProjectType.OTHER;

// Build dependency graph only for JavaScript/TypeScript projects
let dependencyGraph: DependencyGraph;
if (isJsProject && this.options.includeDependencies) {
  logger.info("JavaScript/TypeScript project detected. Building dependency graph...");
  dependencyGraph = await this.dependencyAnalyzer.buildDependencyGraph(
    repoStructure.allFiles,
  );
} else {
  if (!isJsProject) {
    logger.info(
      "Non-JavaScript/TypeScript project detected. Skipping dependency analysis.",
    );
  } else if (!this.options.includeDependencies) {
    logger.info("Dependency analysis disabled in configuration. Skipping.");
  }

  // Create empty dependency graph
  dependencyGraph = {
    files: new Map(),
    getDirectDependencies: () => [],
    getDirectDependents: () => [],
    getImmediateDependenciesOfChangedFiles: () => [],
  };
}

Performance Considerations

Context building can be resource-intensive, especially for large repositories. The Context Builder includes several optimizations to improve performance:

File Filtering: Only relevant files are included in the context
Token Management: File content is truncated to fit within token limits
Smart Prioritization: Files are prioritized based on their importance when token limits are reached
Project Type Detection: Dependency analysis is only run for JavaScript/TypeScript projects
Parallel Processing: Some operations are performed in parallel to improve performance

For more detailed information about the individual components of the Context Builder, the following pages are planned:

File System Scanner (Planned)
Dependency Analyzer (Planned)
Code Structure Analyzer (Planned)

