# Context Builder

This page describes the Context Builder component of DeepLint, which gathers and assembles the context used for LLM analysis.

{% hint style="info" %}
For a high-level explanation of context building and its importance, see the [Context Building](https://docs.deeplint.com/core-concepts/context-building) page in the core concepts documentation.
{% endhint %}

## Architecture

The Context Builder is implemented as a set of cooperating classes, each responsible for one aspect of context building. The sections below describe each class in turn.

## Core Components

### ContextBuilder

The `ContextBuilder` class is the main entry point for context building. It orchestrates the process of gathering information from various sources and assembling it into a structured context for LLM analysis.

```typescript
export class ContextBuilder {
  private options: ContextBuilderOptions;
  private gitOperations: GitOperations;
  private diffParser: DiffParser;
  private fileSystemScanner: FileSystemScanner;
  private dependencyAnalyzer: DependencyAnalyzer;
  private codeStructureAnalyzer: CodeStructureAnalyzer;
  private tokenCounter: TokenCounter;
  private configOptions: ContextBuilderConfig;

  constructor(
    options: Partial<ContextBuilderOptions>,
    dependencies?: {
      gitOperations?: GitOperations;
      diffParser?: DiffParser;
      fileSystemScanner?: FileSystemScanner;
      dependencyAnalyzer?: DependencyAnalyzer;
      codeStructureAnalyzer?: CodeStructureAnalyzer;
      tokenCounter?: TokenCounter;
    },
  ) {
    // Initialize options and dependencies
  }

  async buildContext(): Promise<ContextBuildResult> {
    // Build context for LLM analysis
  }

  private async assembleContext(
    parsedDiff: ParsedDiff,
    repoStructure: RepositoryStructure,
    dependencyGraph: DependencyGraph,
    codeStructure: CodeStructure,
  ): Promise<LLMContext> {
    // Assemble context from various sources
  }

  // Other helper methods
}
```

#### Key Methods

* **buildContext()**: The main method for building context. It orchestrates the entire process, from getting Git changes to assembling the final context.
* **assembleContext()**: Assembles the context from the parsed diff, repository structure, dependency graph, and code structure.
* **buildRepositoryStructure()**: Builds the repository structure for the context.
* **buildChangesContext()**: Builds the changes context from the parsed diff.
* **buildRelatedFilesContext()**: Builds the related files context from the parsed diff and dependency graph.

### GitOperations

The `GitOperations` class provides methods for interacting with Git to get information about changes in the repository.

```typescript
export class GitOperations {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async hasStagedChanges(): Promise<boolean> {
    // Check if there are staged changes
  }

  async hasUnstagedChanges(): Promise<boolean> {
    // Check if there are unstaged changes
  }

  async getStagedDiff(): Promise<string> {
    // Get the diff for staged changes
  }

  async getUnstagedDiff(): Promise<string> {
    // Get the diff for unstaged changes
  }
}
```
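
As a rough illustration of how these methods could map onto Git commands, here is a minimal sketch. It assumes the diffs come from `git diff --cached` (staged) and plain `git diff` (unstaged); the `GitRunner` type, the `runGit` helper, and the `GitOperationsSketch` class are hypothetical names introduced for this example and are not part of DeepLint.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

// Hypothetical seam: runs `git` with the given arguments in `cwd` and resolves to stdout.
type GitRunner = (args: string[], cwd: string) => Promise<string>;

const execFileAsync = promisify(execFile);

// Default runner that shells out to the real `git` binary.
const runGit: GitRunner = async (args, cwd) => {
  const { stdout } = await execFileAsync("git", args, { cwd });
  return stdout;
};

class GitOperationsSketch {
  private readonly repositoryRoot: string;
  private readonly run: GitRunner;

  constructor(repositoryRoot: string, run: GitRunner = runGit) {
    this.repositoryRoot = repositoryRoot;
    this.run = run;
  }

  // `git diff --cached` shows changes staged in the index relative to HEAD.
  getStagedDiff(): Promise<string> {
    return this.run(["diff", "--cached"], this.repositoryRoot);
  }

  // Plain `git diff` shows working tree changes that are not yet staged.
  getUnstagedDiff(): Promise<string> {
    return this.run(["diff"], this.repositoryRoot);
  }

  // An empty diff means there is nothing to analyze.
  async hasStagedChanges(): Promise<boolean> {
    return (await this.getStagedDiff()).trim().length > 0;
  }

  async hasUnstagedChanges(): Promise<boolean> {
    return (await this.getUnstagedDiff()).trim().length > 0;
  }
}
```

Passing the runner as a constructor argument is a design choice of this sketch: it keeps the class testable without a real repository.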

### DiffParser

The `DiffParser` class parses Git diffs into a structured format that can be used for context building.

```typescript
export class DiffParser {
  parse(diffText: string): ParsedDiff {
    // Parse the diff text into a structured format
  }

  generateSummary(parsedDiff: ParsedDiff): string {
    // Generate a summary of the changes
  }
}
```
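
The shape of `ParsedDiff` is not shown on this page, so the following is only a sketch of what parsing a unified diff might involve. `ParsedFileSketch`, `parseDiffSketch`, and `summarizeSketch` are hypothetical names, and the per-file fields are assumptions chosen for illustration.

```typescript
// Hypothetical per-file result; the real ParsedDiff type may look different.
interface ParsedFileSketch {
  path: string;
  changeType: "addition" | "modification" | "deletion";
  additions: number;
  deletions: number;
}

function parseDiffSketch(diffText: string): ParsedFileSketch[] {
  const files: ParsedFileSketch[] = [];
  let current: ParsedFileSketch | undefined;
  let inHunk = false;

  for (const line of diffText.split("\n")) {
    const header = /^diff --git a\/(.+) b\/(.+)$/.exec(line);
    if (header) {
      // A new file section starts; assume a modification until told otherwise.
      current = { path: header[2] ?? "", changeType: "modification", additions: 0, deletions: 0 };
      files.push(current);
      inHunk = false;
    } else if (!current) {
      continue; // Ignore anything before the first file header.
    } else if (line.startsWith("@@")) {
      inHunk = true; // Hunk header: content lines follow.
    } else if (!inHunk) {
      // Extended header lines appear before the first hunk.
      if (line.startsWith("new file mode")) current.changeType = "addition";
      if (line.startsWith("deleted file mode")) current.changeType = "deletion";
    } else if (line.startsWith("+")) {
      current.additions++;
    } else if (line.startsWith("-")) {
      current.deletions++;
    }
  }
  return files;
}

function summarizeSketch(files: ParsedFileSketch[]): string {
  const additions = files.reduce((sum, file) => sum + file.additions, 0);
  const deletions = files.reduce((sum, file) => sum + file.deletions, 0);
  return `${files.length} file(s) changed, ${additions} insertion(s), ${deletions} deletion(s)`;
}
```

Tracking whether the parser is inside a hunk keeps the `---` and `+++` header lines from being counted as deletions and additions.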

### FileSystemScanner

The `FileSystemScanner` class scans the repository to understand its structure, including directories, files, and their metadata.

```typescript
export class FileSystemScanner {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async scanRepository(options: ScanOptions): Promise<RepositoryStructure> {
    // Scan the repository and return its structure
  }
}
```

### DependencyAnalyzer

The `DependencyAnalyzer` class analyzes dependencies between files to identify relationships.

```typescript
export class DependencyAnalyzer {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async buildDependencyGraph(files: FileInfo[]): Promise<DependencyGraph> {
    // Build a dependency graph for the files
  }
}
```
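
To make the idea of a dependency graph concrete, here is a minimal sketch that works on in-memory file contents. It assumes relative `import`/`export ... from` specifiers are found with a regular expression and resolved by trying a few extensions; the real analyzer may well use a proper parser. `DependencyGraphSketch` and `buildGraphSketch` are hypothetical names.

```typescript
import { posix } from "node:path";

// Hypothetical graph shape: forward and reverse edges keyed by file path.
interface DependencyGraphSketch {
  dependencies: Map<string, string[]>;
  dependents: Map<string, string[]>;
}

// Matches `import ... from "x"`, `export ... from "x"`, and side-effect `import "x"`.
const IMPORT_PATTERN = /\b(?:import|export)\b(?:[^'";]*?\bfrom)?\s*["']([^"']+)["']/g;

// `files` maps repository-relative POSIX paths to file contents.
function buildGraphSketch(files: Map<string, string>): DependencyGraphSketch {
  const dependencies = new Map<string, string[]>();
  const dependents = new Map<string, string[]>();
  for (const path of files.keys()) {
    dependencies.set(path, []);
    dependents.set(path, []);
  }

  for (const [path, source] of files) {
    for (const match of source.matchAll(IMPORT_PATTERN)) {
      const specifier = match[1];
      // Only relative specifiers can point at files inside the repository.
      if (!specifier || !specifier.startsWith(".")) continue;

      const base = posix.join(posix.dirname(path), specifier);
      const target = [base, `${base}.ts`, `${base}.js`, `${base}/index.ts`].find((candidate) =>
        files.has(candidate),
      );
      if (!target) continue;

      const deps = dependencies.get(path)!;
      if (!deps.includes(target)) {
        deps.push(target);
        dependents.get(target)!.push(path);
      }
    }
  }
  return { dependencies, dependents };
}
```

Storing both directions up front makes lookups such as "which files depend on this changed file" a single map access.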

### CodeStructureAnalyzer

The `CodeStructureAnalyzer` class extracts information about the structure of code files, such as functions, classes, and interfaces.

```typescript
export class CodeStructureAnalyzer {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async analyzeCodeStructure(files: FileInfo[]): Promise<CodeStructure> {
    // Analyze the structure of code files
  }
}
```
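
The sketch below shows one simple way to pull top-level declaration names out of a source file. It uses regular expressions purely for brevity; this is an assumption of the example, and a production analyzer would more likely walk an AST. `FileStructureSketch` and `extractStructureSketch` are hypothetical names.

```typescript
// Hypothetical result shape for a single file.
interface FileStructureSketch {
  functions: string[];
  classes: string[];
  interfaces: string[];
}

function extractStructureSketch(source: string): FileStructureSketch {
  // Collect the first capture group of every match of `pattern`.
  const collect = (pattern: RegExp): string[] =>
    Array.from(source.matchAll(pattern), (match) => match[1] ?? "");

  return {
    functions: collect(
      /^\s*(?:export\s+)?(?:default\s+)?(?:async\s+)?function\s*\*?\s*([A-Za-z_$][\w$]*)/gm,
    ),
    classes: collect(/^\s*(?:export\s+)?(?:default\s+)?(?:abstract\s+)?class\s+([A-Za-z_$][\w$]*)/gm),
    interfaces: collect(/^\s*(?:export\s+)?interface\s+([A-Za-z_$][\w$]*)/gm),
  };
}
```

This approach misses arrow functions and class methods, which is acceptable for a sketch but worth keeping in mind.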

### TokenCounter

The `TokenCounter` class manages token usage and truncation to ensure the context fits within the LLM's token limits.

```typescript
export class TokenCounter {
  private model: TiktokenModel;
  private maxTokens: number;
  private usedTokens: number = 0;
  private reservedTokens: number = 0;

  constructor(
    model: string = "gpt-4",
    maxTokens: number = 8192,
    reservedTokens: number = 1000,
  ) {
    // Initialize token counter
  }

  countTokens(text: string): number {
    // Count the number of tokens in a text
  }

  addTokens(count: number): void {
    // Add tokens to the used tokens count
  }

  getUsedTokens(): number {
    // Get the number of tokens used
  }

  getAvailableTokens(): number {
    // Get the number of tokens available
  }

  hasEnoughTokens(count: number): boolean {
    // Check if there are enough tokens available
  }

  reset(): void {
    // Reset the token counter
  }

  truncateToFit(text: string, maxTokens: number): string {
    // Truncate text to fit within a token limit
  }

  truncateFileContent(content: string, maxTokens: number): string {
    // Truncate file content to fit within a token limit
  }
}
```
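
The bookkeeping behind these methods can be sketched as follows. Two assumptions are made here: tokens are estimated at roughly four characters each as a stand-in for tiktoken, and available tokens are computed as `maxTokens - reservedTokens - usedTokens`. Neither is confirmed by the class outline above, and `TokenBudgetSketch` is a hypothetical name.

```typescript
class TokenBudgetSketch {
  private readonly maxTokens: number;
  private readonly reservedTokens: number;
  private usedTokens = 0;

  constructor(maxTokens = 8192, reservedTokens = 1000) {
    this.maxTokens = maxTokens;
    this.reservedTokens = reservedTokens;
  }

  // Assumption: ~4 characters per token, a crude substitute for a real tokenizer.
  countTokens(text: string): number {
    return Math.ceil(text.length / 4);
  }

  addTokens(count: number): void {
    this.usedTokens += count;
  }

  getUsedTokens(): number {
    return this.usedTokens;
  }

  // Reserved tokens are held back (for example for the prompt and the response).
  getAvailableTokens(): number {
    return Math.max(0, this.maxTokens - this.reservedTokens - this.usedTokens);
  }

  hasEnoughTokens(count: number): boolean {
    return count <= this.getAvailableTokens();
  }

  reset(): void {
    this.usedTokens = 0;
  }

  // Cut the text at the character position that corresponds to the token limit.
  truncateToFit(text: string, maxTokens: number): string {
    return this.countTokens(text) <= maxTokens ? text : text.slice(0, maxTokens * 4);
  }
}
```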

## Context Structure

The context built by the Context Builder has the following structure:

```typescript
export interface LLMContext {
  // Repository information
  repository: {
    name: string;
    root: string;
    structure: ContextRepositoryStructure;
  };

  // Changes information
  changes: {
    files: ContextChange[];
    summary: string;
  };

  // Related files
  relatedFiles: ContextFile[];

  // Metadata
  metadata: {
    contextSize: {
      totalTokens: number;
      changesTokens: number;
      relatedFilesTokens: number;
      structureTokens: number;
    };
    generatedAt: string;
    error?: {
      message: string;
      timestamp: string;
      phase?: string;
    };
  };
}
```

### Repository Information

The repository information includes:

* **name**: The name of the repository
* **root**: The root directory of the repository
* **structure**: The structure of the repository, including directories, files, and their metadata

### Changes Information

The changes information includes:

* **files**: An array of changed files, each with its content, diff, and type (addition, modification, or deletion)
* **summary**: A summary of the changes

### Related Files

Related files are files connected to the changed files, such as their dependencies or dependents. Each related file includes:

* **path**: The path to the file
* **relativePath**: The path relative to the repository root
* **content**: The content of the file
* **type**: The type of the file (source, test, etc.)
* **size**: The size of the file in bytes
* **lastModified**: The last modified date of the file
* **dependencies**: An array of file paths that this file depends on
* **dependents**: An array of file paths that depend on this file
* **structure**: The structure of the file, including functions, classes, interfaces, and types

### Metadata

The metadata includes:

* **contextSize**: Information about the token usage
* **generatedAt**: The date and time when the context was generated
* **error**: Error information if context building failed

## Configuration Options

The Context Builder can be configured through the `ContextBuilderOptions` interface:

```typescript
export interface ContextBuilderOptions {
  // Repository options
  repositoryRoot: string;

  // File options
  maxFileSize?: number; // in KB
  includePatterns?: string[];
  excludePatterns?: string[];

  // Token management
  maxTokens?: number;
  tokensPerFile?: number;

  // Git options
  useUnstagedChanges?: boolean;

  // Dependency options
  includeDependencies?: boolean;

  // Structure options
  includeStructure?: boolean;

  // File filtering options
  useGitignore?: boolean;
}
```
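
Because the constructor accepts `Partial<ContextBuilderOptions>`, missing fields presumably fall back to defaults. The sketch below shows one way such merging could work. Every default value here is illustrative rather than DeepLint's actual default, and `OptionsSketch`, `ILLUSTRATIVE_DEFAULTS`, and `resolveOptionsSketch` are hypothetical names.

```typescript
// Local copy of the options shape so the example is self-contained.
interface OptionsSketch {
  repositoryRoot: string;
  maxFileSize?: number; // in KB
  includePatterns?: string[];
  excludePatterns?: string[];
  maxTokens?: number;
  tokensPerFile?: number;
  useUnstagedChanges?: boolean;
  includeDependencies?: boolean;
  includeStructure?: boolean;
  useGitignore?: boolean;
}

// Illustrative values only; the real defaults are not documented on this page.
const ILLUSTRATIVE_DEFAULTS: Required<OptionsSketch> = {
  repositoryRoot: process.cwd(),
  maxFileSize: 500,
  includePatterns: ["**/*"],
  excludePatterns: ["node_modules/**"],
  maxTokens: 8192,
  tokensPerFile: 1000,
  useUnstagedChanges: false,
  includeDependencies: true,
  includeStructure: true,
  useGitignore: true,
};

function resolveOptionsSketch(options: Partial<OptionsSketch>): Required<OptionsSketch> {
  // Drop keys explicitly set to undefined so they cannot overwrite a default.
  const provided = Object.fromEntries(
    Object.entries(options).filter(([, value]) => value !== undefined),
  );
  return { ...ILLUSTRATIVE_DEFAULTS, ...provided };
}
```

For example, `resolveOptionsSketch({ maxTokens: 4000 })` overrides only the token limit and keeps every other illustrative default.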

## Error Handling

The Context Builder handles errors internally so that a failure during context building does not crash the rest of the application. If an error occurs during context building, the Context Builder will:

1. Log the error with detailed information
2. Create an empty context whose `metadata.error` field records the error message, a timestamp, and optionally the phase in which it occurred
3. Return that context to the caller

This allows the rest of the application to continue functioning even if context building fails.
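
The fallback described above could look roughly like the following sketch. The `LLMContextSketch` type is a trimmed copy of `LLMContext` with a simplified `structure` field, and `createErrorContext` and `buildContextSafely` are hypothetical helper names; the actual implementation may differ.

```typescript
import { basename } from "node:path";

// Trimmed copy of the LLMContext shape; `structure` is simplified because
// ContextRepositoryStructure is not defined in this document.
interface LLMContextSketch {
  repository: { name: string; root: string; structure: { directories: string[]; files: string[] } };
  changes: { files: unknown[]; summary: string };
  relatedFiles: unknown[];
  metadata: {
    contextSize: {
      totalTokens: number;
      changesTokens: number;
      relatedFilesTokens: number;
      structureTokens: number;
    };
    generatedAt: string;
    error?: { message: string; timestamp: string; phase?: string };
  };
}

// Build an empty context that carries the failure in `metadata.error`.
function createErrorContext(repositoryRoot: string, error: unknown, phase?: string): LLMContextSketch {
  const now = new Date().toISOString();
  const message = error instanceof Error ? error.message : String(error);
  return {
    repository: {
      name: basename(repositoryRoot),
      root: repositoryRoot,
      structure: { directories: [], files: [] },
    },
    changes: { files: [], summary: "" },
    relatedFiles: [],
    metadata: {
      contextSize: { totalTokens: 0, changesTokens: 0, relatedFilesTokens: 0, structureTokens: 0 },
      generatedAt: now,
      error: { message, timestamp: now, ...(phase !== undefined ? { phase } : {}) },
    },
  };
}

// Wrap a build step: log the failure, then return the error context instead of throwing.
async function buildContextSafely(
  repositoryRoot: string,
  build: () => Promise<LLMContextSketch>,
): Promise<LLMContextSketch> {
  try {
    return await build();
  } catch (error) {
    console.error("Context building failed:", error);
    return createErrorContext(repositoryRoot, error, "build");
  }
}
```

Callers can then check `metadata.error` to decide whether the context is usable.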

## Implementation Notes

### Dependency Injection

The Context Builder uses dependency injection to make testing easier. Each dependency can be injected through the constructor:

```typescript
constructor(
  options: Partial<ContextBuilderOptions>,
  dependencies?: {
    gitOperations?: GitOperations;
    diffParser?: DiffParser;
    fileSystemScanner?: FileSystemScanner;
    dependencyAnalyzer?: DependencyAnalyzer;
    codeStructureAnalyzer?: CodeStructureAnalyzer;
    tokenCounter?: TokenCounter;
  },
) {
  // Initialize options and dependencies
}
```

This allows for easy mocking of dependencies during testing.
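
A test might use this as in the sketch below, which is deliberately cut down to a single dependency. `GitLike`, `MiniContextBuilder`, and `describeStagedChanges` are hypothetical names used only to show the injection pattern; they are not part of DeepLint.

```typescript
// Minimal stand-in for the part of GitOperations this sketch needs.
interface GitLike {
  hasStagedChanges(): Promise<boolean>;
  getStagedDiff(): Promise<string>;
}

// Placeholder default; a real default would talk to Git.
const unavailableGit: GitLike = {
  hasStagedChanges: async () => {
    throw new Error("no Git implementation configured");
  },
  getStagedDiff: async () => {
    throw new Error("no Git implementation configured");
  },
};

class MiniContextBuilder {
  private readonly gitOperations: GitLike;

  constructor(dependencies?: { gitOperations?: GitLike }) {
    // Fall back to the default only when no replacement is injected.
    this.gitOperations = dependencies?.gitOperations ?? unavailableGit;
  }

  async describeStagedChanges(): Promise<string> {
    if (!(await this.gitOperations.hasStagedChanges())) return "no staged changes";
    const diff = await this.gitOperations.getStagedDiff();
    return `${diff.split("\n").length} diff line(s)`;
  }
}

// In a test, inject a stub instead of touching a real repository.
const stubGit: GitLike = {
  hasStagedChanges: async () => true,
  getStagedDiff: async () => "+added line\n-removed line",
};
const builderUnderTest = new MiniContextBuilder({ gitOperations: stubGit });
```

The stub makes the builder's behavior deterministic, so assertions do not depend on the state of any working tree.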

### Token Management

Token management is a critical aspect of context building, as LLMs have token limits. The Context Builder uses the `TokenCounter` class to manage token usage and truncation.

The token counter:

1. Counts tokens using the tiktoken library
2. Tracks used and available tokens
3. Truncates file content when necessary
4. Prioritizes important parts of files (imports, exports, function signatures)
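
Point 4 can be illustrated with a small sketch of priority-aware truncation. It assumes a rough estimate of four characters per token instead of tiktoken, and a simple line pattern to spot imports, exports, and declarations; `truncateFileContentSketch` and its helpers are hypothetical and not DeepLint's actual algorithm.

```typescript
// Lines that look like imports, exports, or declarations are kept first.
const PRIORITY_LINE = /^\s*(?:import\b|export\b|(?:async\s+)?function\b|class\b|interface\b)/;

// Assumption: ~4 characters per token, standing in for a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function truncateFileContentSketch(content: string, maxTokens: number): string {
  if (estimateTokens(content) <= maxTokens) return content;

  const lines = content.split("\n");
  const keep = new Set<number>();
  let used = 0;

  // Keep a line if its estimated cost (including the newline) still fits.
  const tryKeep = (index: number): boolean => {
    const cost = estimateTokens(`${lines[index] ?? ""}\n`);
    if (used + cost > maxTokens) return false;
    keep.add(index);
    used += cost;
    return true;
  };

  // Pass 1: keep high-priority lines wherever they appear in the file.
  lines.forEach((line, index) => {
    if (PRIORITY_LINE.test(line)) tryKeep(index);
  });

  // Pass 2: fill the remaining budget from the top, stopping at the first line that does not fit.
  for (let index = 0; index < lines.length; index++) {
    if (keep.has(index)) continue;
    if (!tryKeep(index)) break;
  }

  // Emit kept lines in their original order.
  return lines.filter((_, index) => keep.has(index)).join("\n");
}
```

Because each line is charged at least its own share of the estimate, the result never exceeds `maxTokens` under the same estimator.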

### Project Type Detection

The Context Builder includes project type detection to determine if a project is JavaScript/TypeScript or another type. This is used to decide whether to run dependency analysis, as dependency analysis is only useful for JavaScript/TypeScript projects.

The project type detection is implemented in the `detectProjectType` utility function:

```typescript
import { existsSync } from "fs";
import { join } from "path";

export enum ProjectType {
  JAVASCRIPT = "javascript",
  TYPESCRIPT = "typescript",
  OTHER = "other",
}

export function detectProjectType(rootDir: string): ProjectType {
  // Check for TypeScript configuration
  const hasTsConfig = existsSync(join(rootDir, "tsconfig.json"));
  if (hasTsConfig) {
    return ProjectType.TYPESCRIPT;
  }

  // Check for JavaScript project indicators
  const hasPackageJson = existsSync(join(rootDir, "package.json"));
  const hasJsFiles =
    existsSync(join(rootDir, "index.js")) ||
    existsSync(join(rootDir, "src/index.js")) ||
    existsSync(join(rootDir, "lib/index.js"));
  const hasNodeModules = existsSync(join(rootDir, "node_modules"));

  if (hasPackageJson || hasJsFiles || hasNodeModules) {
    return ProjectType.JAVASCRIPT;
  }

  // Default to other
  return ProjectType.OTHER;
}
```

The `buildContext` method in the `ContextBuilder` class uses project type detection, through the `isJavaScriptOrTypeScriptProject` helper (not shown here), to determine whether to run dependency analysis:

```typescript
// Check if this is a JavaScript/TypeScript project
const isJsProject = isJavaScriptOrTypeScriptProject(this.options.repositoryRoot);

// Build dependency graph only for JavaScript/TypeScript projects
let dependencyGraph: DependencyGraph;
if (isJsProject && this.options.includeDependencies) {
  logger.info("JavaScript/TypeScript project detected. Building dependency graph...");
  dependencyGraph = await this.dependencyAnalyzer.buildDependencyGraph(
    repoStructure.allFiles,
  );
} else {
  if (!isJsProject) {
    logger.info(
      "Non-JavaScript/TypeScript project detected. Skipping dependency analysis.",
    );
  } else if (!this.options.includeDependencies) {
    logger.info("Dependency analysis disabled in configuration. Skipping.");
  }

  // Create empty dependency graph
  dependencyGraph = {
    files: new Map(),
    getDirectDependencies: () => [],
    getDirectDependents: () => [],
    getImmediateDependenciesOfChangedFiles: () => [],
  };
}
```
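
The `isJavaScriptOrTypeScriptProject` helper is not defined on this page. One plausible reading is that it is a thin wrapper around project type detection, as in the sketch below. This is an assumption, not the confirmed implementation; the `Sketch` names and the injectable `exists` parameter are hypothetical, and a string union replaces the enum only to keep the example compact.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

type ProjectTypeSketch = "javascript" | "typescript" | "other";

// Injectable existence check so the logic can be exercised without touching the disk.
type ExistsFn = (path: string) => boolean;

function detectProjectTypeSketch(rootDir: string, exists: ExistsFn = existsSync): ProjectTypeSketch {
  // A tsconfig.json takes precedence over any JavaScript indicator.
  if (exists(join(rootDir, "tsconfig.json"))) return "typescript";

  const jsMarkers = ["package.json", "index.js", "src/index.js", "lib/index.js", "node_modules"];
  return jsMarkers.some((marker) => exists(join(rootDir, marker))) ? "javascript" : "other";
}

// Assumed wrapper: true for both JavaScript and TypeScript projects.
function isJavaScriptOrTypeScriptProjectSketch(
  rootDir: string,
  exists: ExistsFn = existsSync,
): boolean {
  return detectProjectTypeSketch(rootDir, exists) !== "other";
}
```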

### Performance Considerations

Context building can be resource-intensive, especially for large repositories. The Context Builder includes several optimizations to improve performance:

1. **File Filtering**: Only relevant files are included in the context
2. **Token Management**: File content is truncated to fit within token limits
3. **Smart Prioritization**: Files are prioritized based on their importance when token limits are reached
4. **Project Type Detection**: Dependency analysis is only run for JavaScript/TypeScript projects
5. **Parallel Processing**: Some operations are performed in parallel to improve performance
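
Smart prioritization (item 3) can be pictured as a greedy selection under a token budget. The sketch below assumes each candidate file already has a token count and a numeric priority; how DeepLint actually scores files is not described here, and `CandidateFileSketch` and `selectFilesWithinBudget` are hypothetical names.

```typescript
// Hypothetical candidate: a file with a precomputed token cost and priority score.
interface CandidateFileSketch {
  path: string;
  tokens: number;
  priority: number; // higher means more important
}

function selectFilesWithinBudget(
  candidates: CandidateFileSketch[],
  budget: number,
): CandidateFileSketch[] {
  // Highest priority first; among equals, cheaper files first.
  const sorted = [...candidates].sort(
    (a, b) => b.priority - a.priority || a.tokens - b.tokens,
  );

  const selected: CandidateFileSketch[] = [];
  let remaining = budget;
  for (const file of sorted) {
    // Skip files that no longer fit, but keep looking for smaller ones.
    if (file.tokens <= remaining) {
      selected.push(file);
      remaining -= file.tokens;
    }
  }
  return selected;
}
```

A greedy pass is not optimal in the knapsack sense, but it is predictable and cheap, which suits a step that runs on every analysis.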

{% hint style="info" %}
For more detailed information about the individual components of the Context Builder, the following pages are planned:

* File System Scanner (Planned)
* Dependency Analyzer (Planned)
* Code Structure Analyzer (Planned)
{% endhint %}
