This document provides a detailed overview of the Context Builder component in DeepLint, which is responsible for gathering and assembling the context used for LLM analysis.
For a high-level explanation of context building and its importance, see the corresponding page in the core concepts documentation.
Architecture
The Context Builder is implemented as a set of cooperating classes, each responsible for a specific aspect of context building.
Core Components
ContextBuilder
The ContextBuilder class is the main entry point for context building. It orchestrates the process of gathering information from various sources and assembling it into a structured context for LLM analysis. Its key methods are:
buildContext(): The main method for building context. It orchestrates the entire process, from getting Git changes to assembling the final context.
assembleContext(): Assembles the context from the parsed diff, repository structure, dependency graph, and code structure.
buildRepositoryStructure(): Builds the repository structure for the context.
buildChangesContext(): Builds the changes context from the parsed diff.
buildRelatedFilesContext(): Builds the related files context from the parsed diff and dependency graph.
GitOperations
The GitOperations class provides methods for interacting with Git to get information about changes in the repository.
export class GitOperations {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async hasStagedChanges(): Promise<boolean> {
    // Check if there are staged changes
  }

  async hasUnstagedChanges(): Promise<boolean> {
    // Check if there are unstaged changes
  }

  async getStagedDiff(): Promise<string> {
    // Get the diff for staged changes
  }

  async getUnstagedDiff(): Promise<string> {
    // Get the diff for unstaged changes
  }
}
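One plausible way to back these methods is to shell out to the Git CLI. The sketch below is an assumption, not DeepLint's actual implementation: it relies on `git diff --cached` for staged changes and on the fact that `--quiet` implies `--exit-code`, so the command exits with status 1 when differences exist. The `GitRunner` type, `runGitCli`, and `GitOperationsSketch` are hypothetical names, and the methods are synchronous here for brevity, whereas the class above is async.

```typescript
import { spawnSync } from "child_process";

// Hypothetical abstraction: runs `git` with the given arguments in `cwd`.
type GitResult = { status: number; stdout: string };
type GitRunner = (args: string[], cwd: string) => GitResult;

// Default runner that invokes the real git CLI synchronously.
const runGitCli: GitRunner = (args, cwd) => {
  const result = spawnSync("git", args, { cwd, encoding: "utf8" });
  return { status: result.status ?? -1, stdout: result.stdout ?? "" };
};

class GitOperationsSketch {
  private repositoryRoot: string;
  private run: GitRunner;

  constructor(repositoryRoot: string, run: GitRunner = runGitCli) {
    this.repositoryRoot = repositoryRoot;
    this.run = run;
  }

  // `git diff --cached --quiet` exits with 1 when the index differs from HEAD.
  hasStagedChanges(): boolean {
    return this.run(["diff", "--cached", "--quiet"], this.repositoryRoot).status === 1;
  }

  // `git diff --quiet` exits with 1 when the working tree differs from the index.
  hasUnstagedChanges(): boolean {
    return this.run(["diff", "--quiet"], this.repositoryRoot).status === 1;
  }

  getStagedDiff(): string {
    return this.run(["diff", "--cached"], this.repositoryRoot).stdout;
  }

  getUnstagedDiff(): string {
    return this.run(["diff"], this.repositoryRoot).stdout;
  }
}
```

Injecting the runner keeps the class testable without a real repository. Note that `git diff` does not report untracked files, so a fuller implementation might need an additional check for those.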
DiffParser
The DiffParser class parses Git diffs into a structured format that can be used for context building.
export class DiffParser {
  parse(diffText: string): ParsedDiff {
    // Parse the diff text into a structured format
  }

  generateSummary(parsedDiff: ParsedDiff): string {
    // Generate a summary of the changes
  }
}
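To make the idea of a structured diff concrete, here is a minimal sketch that walks a unified diff and counts added and removed lines per file. The `DiffFileSketch` shape and both function names are hypothetical; the real `ParsedDiff` type is not shown in this document and is likely richer (hunks, line numbers, renames).

```typescript
// Hypothetical per-file summary; not the real ParsedDiff type.
interface DiffFileSketch {
  path: string;
  additions: number;
  deletions: number;
}

function parseDiffSketch(diffText: string): DiffFileSketch[] {
  const files: DiffFileSketch[] = [];
  let current: DiffFileSketch | undefined;
  let inHunk = false;

  for (const line of diffText.split("\n")) {
    const header = /^diff --git a\/(.+) b\/(.+)$/.exec(line);
    if (header) {
      // A new file section starts; use the "b/" (post-change) path.
      current = { path: header[2] ?? "", additions: 0, deletions: 0 };
      files.push(current);
      inHunk = false;
    } else if (line.startsWith("@@")) {
      // Hunk header: only lines after this are real content changes,
      // which keeps the "---" and "+++" file headers out of the counts.
      inHunk = true;
    } else if (current && inHunk) {
      if (line.startsWith("+")) current.additions += 1;
      else if (line.startsWith("-")) current.deletions += 1;
    }
  }
  return files;
}

function generateSummarySketch(files: DiffFileSketch[]): string {
  const added = files.reduce((sum, f) => sum + f.additions, 0);
  const removed = files.reduce((sum, f) => sum + f.deletions, 0);
  return `${files.length} file(s) changed, ${added} insertion(s), ${removed} deletion(s)`;
}
```

The header regex is deliberately simple and would mis-split paths that themselves contain " b/"; a production parser would need to handle that along with renames and binary files.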
FileSystemScanner
The FileSystemScanner class scans the repository to understand its structure, including directories, files, and their metadata.
export class FileSystemScanner {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async scanRepository(options: ScanOptions): Promise<RepositoryStructure> {
    // Scan the repository and return its structure
  }
}
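A repository scan of this kind usually comes down to a recursive directory walk with an exclusion list. The sketch below illustrates that under stated assumptions: `ScanOptionsSketch` and its `excludeDirs` field are hypothetical (the real `ScanOptions` is not shown here), the result is a flat list of paths rather than a full `RepositoryStructure`, and the walk is synchronous for brevity.

```typescript
import { readdirSync } from "fs";
import { join } from "path";

// Hypothetical minimal directory reader, injectable so the walk can run in memory.
type DirEntry = { name: string; isDirectory: boolean };
type ReadDir = (dir: string) => DirEntry[];

// Default reader backed by the real file system.
const readDirFromDisk: ReadDir = (dir) =>
  readdirSync(dir, { withFileTypes: true }).map((entry) => ({
    name: entry.name,
    isDirectory: entry.isDirectory(),
  }));

// Assumed option name; the real ScanOptions type is not shown in this document.
interface ScanOptionsSketch {
  excludeDirs: string[];
}

function scanRepositorySketch(
  root: string,
  options: ScanOptionsSketch,
  readDir: ReadDir = readDirFromDisk,
): string[] {
  const files: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of readDir(dir)) {
      const fullPath = join(dir, entry.name);
      if (entry.isDirectory) {
        // Skip excluded directories such as node_modules entirely.
        if (!options.excludeDirs.includes(entry.name)) walk(fullPath);
      } else {
        files.push(fullPath);
      }
    }
  };
  walk(root);
  return files.sort();
}
```

Pruning excluded directories before descending into them, rather than filtering afterwards, is what keeps the scan cheap on repositories with large dependency folders.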
DependencyAnalyzer
The DependencyAnalyzer class analyzes dependencies between files to identify relationships.
export class DependencyAnalyzer {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async buildDependencyGraph(files: FileInfo[]): Promise<DependencyGraph> {
    // Build a dependency graph for the files
  }
}
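As an illustration of what building a dependency graph can involve, the sketch below extracts relative import specifiers with a regular expression and records both directions of each edge. Everything here is an assumption for illustration: the function names, the `GraphNodeSketch` shape, the in-memory `sources` map (path to file content), and the simple extension-guessing resolution. It ignores `require()` calls, dynamic imports, and path aliases.

```typescript
import { posix } from "path";

interface GraphNodeSketch {
  dependencies: string[];
  dependents: string[];
}

// Extract relative specifiers ("./x", "../y") from static import/export statements.
function extractRelativeImports(source: string): string[] {
  const pattern = /\b(?:from|import)\s*['"](\.{1,2}\/[^'"]+)['"]/g;
  const specifiers: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1] !== undefined) specifiers.push(match[1]);
  }
  return specifiers;
}

// `sources` maps POSIX-style file paths to their text content.
function buildDependencyGraphSketch(sources: Map<string, string>): Map<string, GraphNodeSketch> {
  const graph = new Map<string, GraphNodeSketch>();
  sources.forEach((_source, file) => {
    graph.set(file, { dependencies: [], dependents: [] });
  });

  sources.forEach((source, file) => {
    for (const spec of extractRelativeImports(source)) {
      const base = posix.join(posix.dirname(file), spec);
      // Try the specifier as written, then a few common extensions.
      const candidates = [base, `${base}.ts`, `${base}.js`, `${base}/index.ts`];
      const target = candidates.find((candidate) => sources.has(candidate));
      if (target === undefined) continue; // unresolved or external import
      graph.get(file)?.dependencies.push(target);
      graph.get(target)?.dependents.push(file);
    }
  });
  return graph;
}
```

Storing dependents alongside dependencies makes it cheap to answer "which files are affected by this change", which is the question the related files context needs answered.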
CodeStructureAnalyzer
The CodeStructureAnalyzer class extracts information about the structure of code files, such as functions, classes, and interfaces.
export class CodeStructureAnalyzer {
  private repositoryRoot: string;

  constructor(repositoryRoot: string) {
    this.repositoryRoot = repositoryRoot;
  }

  async analyzeCodeStructure(files: FileInfo[]): Promise<CodeStructure> {
    // Analyze the structure of code files
  }
}
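For a rough sense of what extracting code structure means, the sketch below pulls declared function, class, and interface names out of source text with regular expressions. This is a simplification under stated assumptions: `CodeStructureSketch` and `analyzeSourceSketch` are hypothetical names, and a real analyzer would more likely rely on a proper parser, since regexes miss arrow functions, methods, and declarations inside strings or comments.

```typescript
interface CodeStructureSketch {
  functions: string[];
  classes: string[];
  interfaces: string[];
}

function analyzeSourceSketch(source: string): CodeStructureSketch {
  // Collect the first capture group of every match of a global, multiline pattern.
  const collect = (pattern: RegExp): string[] => {
    const names: string[] = [];
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      if (match[1] !== undefined) names.push(match[1]);
    }
    return names;
  };

  return {
    functions: collect(/^\s*(?:export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/gm),
    classes: collect(/^\s*(?:export\s+)?(?:abstract\s+)?class\s+([A-Za-z_$][\w$]*)/gm),
    interfaces: collect(/^\s*(?:export\s+)?interface\s+([A-Za-z_$][\w$]*)/gm),
  };
}
```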
TokenCounter
The TokenCounter class manages token usage and truncation to ensure the context fits within the LLM's token limits.
export class TokenCounter {
  private model: TiktokenModel;
  private maxTokens: number;
  private usedTokens: number = 0;
  private reservedTokens: number = 0;

  constructor(
    model: string = "gpt-4",
    maxTokens: number = 8192,
    reservedTokens: number = 1000,
  ) {
    // Initialize token counter
  }

  countTokens(text: string): number {
    // Count the number of tokens in a text
  }

  addTokens(count: number): void {
    // Add tokens to the used tokens count
  }

  getUsedTokens(): number {
    // Get the number of tokens used
  }

  getAvailableTokens(): number {
    // Get the number of tokens available
  }

  hasEnoughTokens(count: number): boolean {
    // Check if there are enough tokens available
  }

  reset(): void {
    // Reset the token counter
  }

  truncateToFit(text: string, maxTokens: number): string {
    // Truncate text to fit within a token limit
  }

  truncateFileContent(content: string, maxTokens: number): string {
    // Truncate a file content to fit within a token limit
  }
}
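The bookkeeping behind these methods can be sketched without a tokenizer dependency. The example below makes two assumptions that should be read as such: it approximates token counts as roughly four characters per token instead of calling tiktoken, and it treats the available budget as `maxTokens - reservedTokens - usedTokens`. The class name `TokenBudgetSketch` is hypothetical.

```typescript
class TokenBudgetSketch {
  private usedTokens = 0;
  private maxTokens: number;
  private reservedTokens: number;

  constructor(maxTokens = 8192, reservedTokens = 1000) {
    this.maxTokens = maxTokens;
    this.reservedTokens = reservedTokens;
  }

  // Rough stand-in for a real tokenizer: about four characters per token.
  countTokens(text: string): number {
    return Math.ceil(text.length / 4);
  }

  addTokens(count: number): void {
    this.usedTokens += count;
  }

  getUsedTokens(): number {
    return this.usedTokens;
  }

  // Assumption: reserved tokens are set aside (for example, for the model's reply).
  getAvailableTokens(): number {
    return Math.max(0, this.maxTokens - this.reservedTokens - this.usedTokens);
  }

  hasEnoughTokens(count: number): boolean {
    return count <= this.getAvailableTokens();
  }

  reset(): void {
    this.usedTokens = 0;
  }

  // Cut the text to the approximate character length of `maxTokens` tokens.
  truncateToFit(text: string, maxTokens: number): string {
    if (this.countTokens(text) <= maxTokens) return text;
    return text.slice(0, Math.max(0, maxTokens) * 4);
  }
}
```

A caller would typically check `hasEnoughTokens(countTokens(text))` before adding a piece of context, then call `addTokens` with the same count.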
Context Structure
The context built by the Context Builder has the following structure:
Error Handling
The Context Builder handles errors so that a failure during context building does not crash the application. If an error occurs during context building, the Context Builder will:
Log the error with detailed information
Create an empty context that carries the error information
Return that context to the caller
This allows the rest of the application to continue functioning even if context building fails.
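The log, create, return sequence described above can be sketched as a small wrapper. The `ContextSketch` shape, the `buildContextSafely` name, and the injected `log` function are all hypothetical; the real context structure is not reproduced in this document, and the real method is async, where the same pattern applies with `await` inside the `try` block.

```typescript
// Hypothetical context shape for illustration only.
interface ContextSketch {
  changes: string[];
  error?: { message: string };
}

function buildContextSafely(
  build: () => ContextSketch,
  log: (message: string) => void = console.error,
): ContextSketch {
  try {
    return build();
  } catch (err) {
    // 1. Log the error, 2. create an empty context carrying it, 3. return it.
    const message = err instanceof Error ? err.message : String(err);
    log(`Context building failed: ${message}`);
    return { changes: [], error: { message } };
  }
}
```

Callers can then check for the `error` field instead of wrapping every call in their own `try`/`catch`.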
Implementation Notes
Dependency Injection
The Context Builder uses dependency injection to make testing easier: each dependency can be passed in through the constructor, which makes it straightforward to substitute mocks during testing.
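A minimal sketch of constructor injection follows. The trimmed-down `GitOperationsLike` and `DiffParserLike` interfaces, the `ContextBuilderSketch` class, and its `buildChangedFiles` method are hypothetical stand-ins, not the actual constructor signature of ContextBuilder.

```typescript
// Hypothetical, trimmed-down collaborator interfaces for illustration.
interface GitOperationsLike {
  getStagedDiff(): string;
}
interface DiffParserLike {
  parse(diffText: string): string[];
}

class ContextBuilderSketch {
  private git: GitOperationsLike;
  private parser: DiffParserLike;

  // Collaborators arrive through the constructor instead of being created inside.
  constructor(git: GitOperationsLike, parser: DiffParserLike) {
    this.git = git;
    this.parser = parser;
  }

  buildChangedFiles(): string[] {
    return this.parser.parse(this.git.getStagedDiff());
  }
}

// In a test, lightweight fakes replace the real Git access and parsing logic.
const fakeGit: GitOperationsLike = { getStagedDiff: () => "a.ts\nb.ts" };
const fakeParser: DiffParserLike = { parse: (text) => text.split("\n") };
const builder = new ContextBuilderSketch(fakeGit, fakeParser);
const changedFiles = builder.buildChangedFiles();
```

Because the builder depends only on interfaces, the test never touches a real repository.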
Token Management
Token management is a critical aspect of context building, as LLMs have token limits. The Context Builder uses the TokenCounter class to manage token usage and truncation.
The token counter:
Counts tokens using the tiktoken library
Tracks used and available tokens
Truncates file content when necessary
Prioritizes important parts of files (imports, exports, function signatures)
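The prioritization in the last point can be illustrated with a two-pass truncation: keep import, export, and function-signature lines first, then fill any remaining budget with other lines in their original order. This is a sketch under assumptions, not the actual `truncateFileContent` logic. It uses a four-characters-per-token approximation rather than tiktoken, and the helper names are hypothetical.

```typescript
// Rough token estimate; a real implementation would use a tokenizer.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Lines considered high priority: imports, exports, and function declarations.
function isImportantLine(line: string): boolean {
  return /^\s*(import|export)\b/.test(line) || /^\s*(async\s+)?function\s+\w+/.test(line);
}

function truncateFileContentSketch(content: string, maxTokens: number): string {
  if (approxTokens(content) <= maxTokens) return content;

  const lines = content.split("\n");
  const keep: boolean[] = lines.map(() => false);
  let used = 0;

  // Pass 1 keeps important lines; pass 2 fills remaining budget with the rest.
  for (const important of [true, false]) {
    lines.forEach((line, index) => {
      const cost = approxTokens(line) + 1; // +1 roughly accounts for the newline
      if (isImportantLine(line) === important && used + cost <= maxTokens) {
        keep[index] = true;
        used += cost;
      }
    });
  }

  // Preserve the original line order of whatever was kept.
  return lines.filter((_line, index) => keep[index]).join("\n");
}
```

Keeping signatures and imports tells the model what a file exposes and depends on even when the function bodies no longer fit.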
Project Type Detection
The Context Builder includes project type detection to determine whether a project is JavaScript/TypeScript or another type. The result determines whether dependency analysis runs, since that analysis is only useful for JavaScript/TypeScript projects.
The project type detection is implemented in the detectProjectType utility function:
import { existsSync } from "fs";
import { join } from "path";

export enum ProjectType {
  JAVASCRIPT = "javascript",
  TYPESCRIPT = "typescript",
  OTHER = "other",
}

export function detectProjectType(rootDir: string): ProjectType {
  // Check for TypeScript configuration
  const hasTsConfig = existsSync(join(rootDir, "tsconfig.json"));
  if (hasTsConfig) {
    return ProjectType.TYPESCRIPT;
  }

  // Check for JavaScript project indicators
  const hasPackageJson = existsSync(join(rootDir, "package.json"));
  const hasJsFiles =
    existsSync(join(rootDir, "index.js")) ||
    existsSync(join(rootDir, "src/index.js")) ||
    existsSync(join(rootDir, "lib/index.js"));
  const hasNodeModules = existsSync(join(rootDir, "node_modules"));
  if (hasPackageJson || hasJsFiles || hasNodeModules) {
    return ProjectType.JAVASCRIPT;
  }

  // Default to other
  return ProjectType.OTHER;
}
The buildContext method in the ContextBuilder class relies on project type detection, through the isJavaScriptOrTypeScriptProject helper that appears in the excerpt below, to decide whether to run dependency analysis:
// Check if this is a JavaScript/TypeScript project
const isJsProject = isJavaScriptOrTypeScriptProject(this.options.repositoryRoot);

// Build dependency graph only for JavaScript/TypeScript projects
let dependencyGraph: DependencyGraph;
if (isJsProject && this.options.includeDependencies) {
  logger.info("JavaScript/TypeScript project detected. Building dependency graph...");
  dependencyGraph = await this.dependencyAnalyzer.buildDependencyGraph(
    repoStructure.allFiles,
  );
} else {
  if (!isJsProject) {
    logger.info(
      "Non-JavaScript/TypeScript project detected. Skipping dependency analysis.",
    );
  } else if (!this.options.includeDependencies) {
    logger.info("Dependency analysis disabled in configuration. Skipping.");
  }

  // Create empty dependency graph
  dependencyGraph = {
    files: new Map(),
    getDirectDependencies: () => [],
    getDirectDependents: () => [],
    getImmediateDependenciesOfChangedFiles: () => [],
  };
}
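The excerpt above calls `isJavaScriptOrTypeScriptProject`, whose definition is not shown in this document. One plausible reading, offered here only as an assumption, is that it is a thin wrapper that treats anything other than the "other" project type as a JavaScript/TypeScript project. The sketch below uses a condensed, hypothetical `detectProjectKind` with an injectable existence check so it can run without touching the disk.

```typescript
import { existsSync } from "fs";
import { join } from "path";

type ProjectKind = "javascript" | "typescript" | "other";
type ExistsFn = (path: string) => boolean;

// Condensed variant of the marker-file detection, with an injectable check.
function detectProjectKind(rootDir: string, exists: ExistsFn = existsSync): ProjectKind {
  if (exists(join(rootDir, "tsconfig.json"))) return "typescript";
  const markers = ["package.json", "index.js", "src/index.js", "lib/index.js", "node_modules"];
  return markers.some((marker) => exists(join(rootDir, marker))) ? "javascript" : "other";
}

// Assumed behavior: true for both JavaScript and TypeScript projects.
function isJavaScriptOrTypeScriptProjectSketch(
  rootDir: string,
  exists: ExistsFn = existsSync,
): boolean {
  return detectProjectKind(rootDir, exists) !== "other";
}
```

Because detection only checks marker files at the repository root, a monorepo whose packages live in subdirectories could be classified as "other" under this scheme.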
Performance Considerations
Context building can be resource-intensive, especially for large repositories. The Context Builder applies several optimizations:
File Filtering: Only relevant files are included in the context
Token Management: File content is truncated to fit within token limits
Smart Prioritization: Files are prioritized based on their importance when token limits are reached
Project Type Detection: Dependency analysis is only run for JavaScript/TypeScript projects
Parallel Processing: Some operations are performed in parallel to improve performance
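The smart prioritization item can be sketched as a greedy selection: order candidate files by importance, then take each one that still fits in the remaining token budget. The ranking used here (changed files first, then direct dependencies, then everything else) is an assumption for illustration, as are the `CandidateFileSketch` shape and the function names.

```typescript
// Hypothetical description of a file competing for space in the context.
interface CandidateFileSketch {
  path: string;
  tokens: number;
  changed: boolean;
  isDependency: boolean;
}

// Lower number means higher priority (assumed ranking).
function priorityOf(file: CandidateFileSketch): number {
  if (file.changed) return 0;
  return file.isDependency ? 1 : 2;
}

function selectFilesWithinBudget(files: CandidateFileSketch[], budget: number): string[] {
  // Sort a copy by priority, breaking ties in favor of smaller files.
  const ordered = [...files].sort(
    (a, b) => priorityOf(a) - priorityOf(b) || a.tokens - b.tokens,
  );

  const selected: string[] = [];
  let remaining = budget;
  for (const file of ordered) {
    if (file.tokens <= remaining) {
      selected.push(file.path);
      remaining -= file.tokens;
    }
  }
  return selected;
}
```

In this sketch, a file that does not fit is skipped rather than truncated; combining the selection with per-file truncation is a natural extension.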
More detailed pages covering the individual components of the Context Builder are planned.