📊 Usage Tracking

Purpose: This guide provides step-by-step instructions for implementing AI usage limits, quotas, and cost tracking infrastructure.

Dependencies:

Requires Text Generation and Text Rewriting implementations to be completed
References Text Intelligence Solution Architecture for design context

Related Documents:

Text Intelligence Epic - Capability definition

Note: This guide is specific to Text Intelligence implementation. Once Text Intelligence is fully implemented and validated, this entire text-intelligence/ folder will be archived.

Overview

What This Guide Covers

AI Usage Limits, Quotas & Cost Tracking provides operational controls for:

Tracking AI API usage per user
Enforcing usage limits and quotas
Tracking token counts and estimated costs
Recording usage automatically through a middleware/filter

What's Included:

Usage tracking entity
Usage tracking service
Middleware/filter for automatic tracking
Quota enforcement

What's NOT Included:

❌ Billing integration (use existing Stripe integration)
❌ Pricing tiers (fork products define these)
❌ Usage analytics dashboard (fork products add this if needed)

Prerequisites

Text Generation and Text Rewriting implementations completed
Database migration system configured
Understanding of Spring filters/interceptors
Familiarity with token counting (if available from LLM)

Token Extraction

Token Availability

Ollama (Local):

May not return token counts in response
Policy: If tokens unavailable, estimate based on character count
Label estimates as "estimated" in tracking data

OpenAI/Cloud Providers:

Usually return token counts
Extract from the response's tokenUsage() (may be null if the provider omits it)
Record actual counts

Extraction Pattern

public record TokenUsage(
    int promptTokens,
    int completionTokens,
    boolean estimated  // true if estimated, false if from provider
) {}

private TokenUsage extractTokenUsage(Response<String> response, String prompt, String result) {
    if (response.tokenUsage() != null) {
        return new TokenUsage(
            response.tokenUsage().inputTokenCount(),
            response.tokenUsage().outputTokenCount(),
            false  // Actual counts from provider
        );
    } else {
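        // Estimate: roughly 4 characters per token (approximate).
        // Round up so short text is not counted as 0 tokens (matches estimateTokens in Step 6).
        int estimatedPromptTokens = (int) Math.ceil(prompt.length() / 4.0);
        int estimatedCompletionTokens = (int) Math.ceil(result.length() / 4.0);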
        return new TokenUsage(
            estimatedPromptTokens,
            estimatedCompletionTokens,
            true  // Estimated
        );
    }
}

Recording Policy

Record token usage for every AI operation
Label estimates clearly (set estimated flag to true)
Store in database for cost calculation
Use estimates when provider doesn't return tokens (common with local Ollama)

Implementation Steps

Step 1: Create Usage Tracking Entity

File: server/src/main/java/com/saas/springular/common/ai/entity/AIUsageRecord.java

Create entity:

package com.saas.springular.common.ai.entity;

import jakarta.persistence.*;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.CreatedDate;
import org.springframework.data.jpa.domain.support.AuditingEntityListener;

import java.time.LocalDateTime;

@Entity
@Table(name = "ai_usage_records")
@EntityListeners(AuditingEntityListener.class)
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class AIUsageRecord {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Column(nullable = false)
    private Long userId;
    
    @Column(nullable = false)
    private String operation;  // "text_generation", "text_rewrite", etc.
    
    @Column(nullable = false)
    private String model;  // "ollama", "gpt-4", etc.
    
    @Column
    private Integer inputTokens;
    
    @Column
    private Integer outputTokens;
    
    @Column
    private Integer totalTokens;
    
    @Column
    private Double estimatedCost;  // In dollars, matching COST_PER_1000_TOKENS in the service (Step 4)
    
    @CreatedDate
    @Column(nullable = false, updatable = false)
    private LocalDateTime createdAt;
}

Step 2: Create Repository

File: server/src/main/java/com/saas/springular/common/ai/repository/AIUsageRecordRepository.java

Create repository:

package com.saas.springular.common.ai.repository;

import com.saas.springular.common.ai.entity.AIUsageRecord;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

import java.time.LocalDateTime;

@Repository
public interface AIUsageRecordRepository extends JpaRepository<AIUsageRecord, Long> {
    
    @Query("SELECT COUNT(u) FROM AIUsageRecord u WHERE u.userId = :userId AND u.operation = :operation AND u.createdAt >= :since")
    long countByUserIdAndOperationSince(
        @Param("userId") Long userId,
        @Param("operation") String operation,
        @Param("since") LocalDateTime since
    );
    
    @Query("SELECT SUM(u.totalTokens) FROM AIUsageRecord u WHERE u.userId = :userId AND u.createdAt >= :since")
    Long sumTotalTokensByUserIdSince(
        @Param("userId") Long userId,
        @Param("since") LocalDateTime since
    );
}

Step 3: Create Usage Tracking Service

File: server/src/main/java/com/saas/springular/common/ai/service/AIUsageTrackingService.java

Create service interface:

package com.saas.springular.common.ai.service;

public interface AIUsageTrackingService {
    
    /**
     * Record an AI usage event.
     * 
     * @param userId User ID
     * @param operation Operation type (e.g., "text_generation")
     * @param model Model identifier
     * @param inputTokens Input token count
     * @param outputTokens Output token count
     */
    void recordUsage(Long userId, String operation, String model, Integer inputTokens, Integer outputTokens);
    
    /**
     * Check if user has exceeded quota for an operation.
     * 
     * @param userId User ID
     * @param operation Operation type
     * @param quotaLimit Maximum allowed operations per period
     * @param periodDays Period in days (e.g., 30 for monthly)
     * @return true if quota is exceeded
     */
    boolean isQuotaExceeded(Long userId, String operation, long quotaLimit, int periodDays);
    
    /**
     * Get total tokens used by user in a period.
     * 
     * @param userId User ID
     * @param periodDays Period in days
     * @return Total tokens used
     */
    long getTotalTokensUsed(Long userId, int periodDays);
}

Step 4: Implement Usage Tracking Service

File: server/src/main/java/com/saas/springular/common/ai/service/impl/AIUsageTrackingServiceImpl.java

Create implementation:

package com.saas.springular.common.ai.service.impl;

import com.saas.springular.common.ai.entity.AIUsageRecord;
import com.saas.springular.common.ai.repository.AIUsageRecordRepository;
import com.saas.springular.common.ai.service.AIUsageTrackingService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.time.LocalDateTime;

@Service
@RequiredArgsConstructor
@Slf4j
public class AIUsageTrackingServiceImpl implements AIUsageTrackingService {
    
    private final AIUsageRecordRepository repository;
    
    // Simple cost estimation (adjust based on your model pricing)
    private static final double COST_PER_1000_TOKENS = 0.01; // Example: $0.01 per 1000 tokens
    
    @Override
    @Transactional
    public void recordUsage(Long userId, String operation, String model, Integer inputTokens, Integer outputTokens) {
        Integer totalTokens = (inputTokens != null ? inputTokens : 0) + (outputTokens != null ? outputTokens : 0);
        Double estimatedCost = totalTokens > 0 ? (totalTokens / 1000.0) * COST_PER_1000_TOKENS : 0.0;
        
        AIUsageRecord record = AIUsageRecord.builder()
                .userId(userId)
                .operation(operation)
                .model(model)
                .inputTokens(inputTokens)
                .outputTokens(outputTokens)
                .totalTokens(totalTokens)
                .estimatedCost(estimatedCost)
                .build();
        
        repository.save(record);
        log.debug("Recorded AI usage: userId={}, operation={}, tokens={}", userId, operation, totalTokens);
    }
    
    @Override
    public boolean isQuotaExceeded(Long userId, String operation, long quotaLimit, int periodDays) {
        LocalDateTime since = LocalDateTime.now().minusDays(periodDays);
        long count = repository.countByUserIdAndOperationSince(userId, operation, since);
        return count >= quotaLimit;
    }
    
    @Override
    public long getTotalTokensUsed(Long userId, int periodDays) {
        LocalDateTime since = LocalDateTime.now().minusDays(periodDays);
        Long total = repository.sumTotalTokensByUserIdSince(userId, since);
        return total != null ? total : 0L;
    }
}
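
The quota check above counts records created since "now minus periodDays" and treats count >= quotaLimit as exceeded. Because it calls LocalDateTime.now() directly, the window boundary is hard to test. The sketch below reproduces the same rolling-window logic in plain Java with an in-memory list and an injected java.time.Clock. RollingQuota and its methods are hypothetical names used for illustration; they are not part of this guide's codebase.

```java
import java.time.Clock;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the repository-backed quota check. */
final class RollingQuota {
    private final Clock clock;
    private final List<LocalDateTime> events = new ArrayList<>();

    RollingQuota(Clock clock) {
        this.clock = clock;
    }

    /** Records one operation at the clock's current time. */
    void record() {
        recordAt(LocalDateTime.now(clock));
    }

    /** Records one operation at an explicit timestamp (useful in tests). */
    void recordAt(LocalDateTime when) {
        events.add(when);
    }

    /** Mirrors isQuotaExceeded: counts events with createdAt >= since. */
    boolean isExceeded(long quotaLimit, int periodDays) {
        LocalDateTime since = LocalDateTime.now(clock).minusDays(periodDays);
        long count = events.stream().filter(t -> !t.isBefore(since)).count();
        return count >= quotaLimit;
    }
}
```

With a fixed clock, a test can place events exactly on either side of the window boundary. The same idea would apply to the real service by injecting a Clock bean and calling LocalDateTime.now(clock), assuming that fits your configuration.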

Step 5: Create Database Migration

File: server/src/main/resources/db/migration/V{version}__create_ai_usage_records_table.sql

Create migration (adjust version number):

CREATE TABLE ai_usage_records (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    operation VARCHAR(50) NOT NULL,
    model VARCHAR(50) NOT NULL,
    input_tokens INTEGER,
    output_tokens INTEGER,
    total_tokens INTEGER,
    estimated_cost DECIMAL(10, 4),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_ai_usage_user_operation_created ON ai_usage_records(user_id, operation, created_at);
CREATE INDEX idx_ai_usage_user_created ON ai_usage_records(user_id, created_at);

Step 6: Integrate with Text Intelligence Service

File: server/src/main/java/com/saas/springular/common/ai/service/impl/TextIntelligenceServiceImpl.java

Add usage tracking (example for generateText):

@Service
@RequiredArgsConstructor
@Slf4j
public class TextIntelligenceServiceImpl implements TextIntelligenceService {
    
    // ... existing fields ...
    private final AIUsageTrackingService usageTrackingService;
    
    // Get current user ID from security context (adjust based on your auth setup)
    private Long getCurrentUserId() {
        // Example: return SecurityContextHolder.getContext().getAuthentication()...
        // Adjust based on your authentication setup
        return 1L; // Placeholder
    }
    
    @Override
    public String generateText(String prompt, String tone) {
        try {
            ChatLanguageModel model = selectModel(tone);
            String result = model.chat(prompt);
            
            // Track usage with estimated counts (Ollama may not provide token counts;
            // see Token Extraction for using provider counts when they are available)
            usageTrackingService.recordUsage(
                getCurrentUserId(),
                "text_generation",
                "ollama",
                estimateTokens(prompt.length()),
                estimateTokens(result.length())
            );
            
            return result;
        } catch (Exception e) {
            log.error("Text generation failed", e);
            throw new RuntimeException("Failed to generate text: " + e.getMessage(), e);
        }
    }
    
    // Simple token estimation (1 token ≈ 4 characters for English)
    private Integer estimateTokens(int characterCount) {
        return (int) Math.ceil(characterCount / 4.0);
    }
    
    // ... existing methods ...
}

Step 7: Add Quota Check Middleware/Interceptor (Optional)

File: server/src/main/java/com/saas/springular/common/ai/interceptor/AIQuotaInterceptor.java

Create interceptor:

package com.saas.springular.common.ai.interceptor;

import com.saas.springular.common.ai.service.AIUsageTrackingService;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

@Component
@RequiredArgsConstructor
public class AIQuotaInterceptor implements HandlerInterceptor {
    
    private final AIUsageTrackingService usageTrackingService;
    
    // Default quota limits (should come from user subscription/plan)
    private static final long DEFAULT_MONTHLY_QUOTA = 1000; // operations per month
    private static final int MONTHLY_PERIOD_DAYS = 30;
    
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        if (request.getRequestURI().startsWith("/api/ai/text")) {
            Long userId = getCurrentUserId(); // Adjust based on your auth setup
            
            String operation = determineOperation(request.getRequestURI());
            if (operation != null && usageTrackingService.isQuotaExceeded(
                userId, operation, DEFAULT_MONTHLY_QUOTA, MONTHLY_PERIOD_DAYS)) {
                response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
                response.setContentType("application/json");
                try {
                    response.getWriter().write("{\"error\":\"Quota exceeded\"}");
                } catch (Exception e) {
                    // Ignore
                }
                return false;
            }
        }
        return true;
    }
    
    private String determineOperation(String uri) {
        if (uri.contains("/generate")) {
            return "text_generation";
        } else if (uri.contains("/rewrite")) {
            return "text_rewrite";
        }
        return null;
    }
    
    private Long getCurrentUserId() {
        // Adjust based on your authentication setup
        return 1L; // Placeholder
    }
}

Register interceptor (in WebMvcConfig):

@Configuration
@RequiredArgsConstructor
public class WebMvcConfig implements WebMvcConfigurer {
    
    private final AIQuotaInterceptor quotaInterceptor;
    
    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(quotaInterceptor)
                .addPathPatterns("/api/ai/text/**");
    }
}
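
The interceptor hard-codes DEFAULT_MONTHLY_QUOTA and notes that limits should come from the user's subscription/plan. A minimal sketch of such a lookup follows, assuming plans are identified by string keys. The class name, plan names, and limits are all invented for illustration; real values would come from your subscription data.

```java
import java.util.Map;

/** Hypothetical lookup of monthly operation quotas by plan name. */
final class PlanQuotas {
    static final long DEFAULT_MONTHLY_QUOTA = 1000L; // same default as the interceptor

    // Illustrative values only; real limits would come from subscription data.
    private static final Map<String, Long> MONTHLY_QUOTA_BY_PLAN = Map.of(
        "free", 100L,
        "pro", 5_000L
    );

    private PlanQuotas() {}

    /** Returns the plan's quota, falling back to the default for unknown or missing plans. */
    static long monthlyQuotaFor(String plan) {
        if (plan == null) {
            return DEFAULT_MONTHLY_QUOTA; // Map.of rejects null keys, so guard first
        }
        return MONTHLY_QUOTA_BY_PLAN.getOrDefault(plan, DEFAULT_MONTHLY_QUOTA);
    }
}
```

The interceptor could then pass PlanQuotas.monthlyQuotaFor(plan) to isQuotaExceeded in place of the constant, once the user's plan can be resolved from the security context.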

Metrics Integration

Micrometer Counters and Timers

Counters:

ai.operation.count - Total operations by type
ai.operation.error.count - Errors by category
ai.token.usage - Total tokens used

Timers:

ai.operation.duration - Operation latency by type

Integration Pattern

@Service
@RequiredArgsConstructor
public class TextIntelligenceServiceImpl {
    
    private final MeterRegistry meterRegistry;
    private final ChatLanguageModel chatModel;  // used by generateText below
    private final TokenUsageTracker tokenUsageTracker;
    
    public String generateText(String prompt, String tone) {
        Timer.Sample sample = Timer.start(meterRegistry);
        String modelName = "ollama-llama2:7b";
        
        try {
            String result = chatModel.chat(prompt);
            sample.stop(Timer.builder("ai.operation.duration")
                .tag("operation", "text-generation")
                .tag("model", modelName)
                .register(meterRegistry));
            
            meterRegistry.counter("ai.operation.count",
                "operation", "text-generation",
                "model", modelName,
                "success", "true").increment();
            
            // Record token usage (if available)
            if (tokenUsageTracker != null) {
                // Extract tokens from response (see Token Extraction section)
                // tokenUsageTracker.recordUsage(...);
            }
            
            return result;
        } catch (Exception e) {
            sample.stop(Timer.builder("ai.operation.duration")
                .tag("operation", "text-generation")
                .tag("model", modelName)
                .register(meterRegistry));
            
            meterRegistry.counter("ai.operation.count",
                "operation", "text-generation",
                "model", modelName,
                "success", "false").increment();
            
            meterRegistry.counter("ai.operation.error.count",
                "operation", "text-generation",
                "errorCategory", categorizeError(e)).increment();
            
            throw new RuntimeException("Text generation failed", e);
        }
    }
}
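
The pattern above calls categorizeError(e) without defining it. One possible shape is sketched below, assuming the goal is a small, fixed set of tag values so the errorCategory tag stays low-cardinality. The class name, category names, and the choice of exception types are assumptions, not part of the original service.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that maps exceptions to a small, fixed set of tag values. */
final class ErrorCategories {
    private ErrorCategories() {}

    static String categorizeError(Throwable error) {
        // Walk the cause chain so wrapped exceptions are classified by the
        // first recognized cause rather than by the wrapper type.
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SocketTimeoutException || t instanceof TimeoutException) {
                return "timeout";
            }
            if (t instanceof ConnectException) {
                return "connection";
            }
            if (t instanceof IllegalArgumentException) {
                return "invalid_input";
            }
        }
        return "unknown";
    }
}
```

Returning category strings instead of exception messages avoids creating a new time series for every distinct error text.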

Dependencies

Add Micrometer dependency to build.gradle:

dependencies {
    implementation 'io.micrometer:micrometer-core'
    implementation 'io.micrometer:micrometer-registry-prometheus'  // Optional: Prometheus
}

Future Considerations

Async Recording: Use queue or async executor to avoid latency regression
Aggregation Patterns: Daily rollups, retention policies (Phase 2+)
Cost Calculation: Token-based cost estimation (requires pricing tables)
Dashboard Integration: Connect metrics to monitoring dashboards (Phase 2+)
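
For the cost calculation item above, a pricing table could replace the single COST_PER_1000_TOKENS constant. The sketch below assumes prices are kept per model in dollars per 1,000 tokens and uses BigDecimal with four decimal places, which lines up with the DECIMAL(10, 4) column in the migration. CostEstimator is a hypothetical class, and every price shown is a placeholder rather than a real provider rate.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;

/** Hypothetical per-model pricing table; all prices are placeholders, not real rates. */
final class CostEstimator {
    // Price in dollars per 1,000 tokens, keyed by model identifier.
    private static final Map<String, BigDecimal> PRICE_PER_1000_TOKENS = Map.of(
        "ollama", BigDecimal.ZERO,
        "gpt-4", new BigDecimal("0.03")
    );
    private static final BigDecimal FALLBACK_PRICE = new BigDecimal("0.01");
    private static final BigDecimal ONE_THOUSAND = BigDecimal.valueOf(1000);

    private CostEstimator() {}

    /** Returns the estimated cost in dollars, rounded to 4 decimal places. */
    static BigDecimal estimateCost(String model, int totalTokens) {
        BigDecimal price = (model == null)
                ? FALLBACK_PRICE
                : PRICE_PER_1000_TOKENS.getOrDefault(model, FALLBACK_PRICE);
        return price.multiply(BigDecimal.valueOf(totalTokens))
                .divide(ONE_THOUSAND, 4, RoundingMode.HALF_UP);
    }
}
```

Using BigDecimal avoids the rounding drift that accumulates when many small double values are summed for a billing period.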

Testing

Unit Tests

File: server/src/test/java/com/saas/springular/common/ai/service/impl/AIUsageTrackingServiceImplTest.java

Create test:

package com.saas.springular.common.ai.service.impl;

import com.saas.springular.common.ai.entity.AIUsageRecord;
import com.saas.springular.common.ai.repository.AIUsageRecordRepository;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;

import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

@ExtendWith(MockitoExtension.class)
class AIUsageTrackingServiceImplTest {
    
    @Mock
    private AIUsageRecordRepository repository;
    
    @InjectMocks
    private AIUsageTrackingServiceImpl service;
    
    @Test
    void recordsUsage() {
        // act
        service.recordUsage(1L, "text_generation", "ollama", 100, 200);
        
        // assert
        verify(repository).save(any(AIUsageRecord.class));
    }
    
    @Test
    void detectsQuotaExceeded() {
        // arrange: once any() is used, Mockito requires every argument to be a matcher
        when(repository.countByUserIdAndOperationSince(eq(1L), eq("text_generation"), any()))
                .thenReturn(1001L);
        
        // act
        boolean exceeded = service.isQuotaExceeded(1L, "text_generation", 1000, 30);
        
        // assert
        assertThat(exceeded).isTrue();
    }
}

Time Estimate

Total Time: 3-4 hours

Breakdown:

Entity and repository: 30 minutes
Service implementation: 1 hour
Database migration: 15 minutes
Integration with text services: 30 minutes
Interceptor/middleware: 30 minutes
Testing: 1 hour

Next Steps

After Usage Tracking is complete:

Integration with Billing: Connect usage data to Stripe billing
Quota Configuration: Add user plan/subscription-based quotas
Dashboard: Create usage analytics dashboard (fork products)

Troubleshooting

Issue: Token counting inaccurate

Solution: Use provider-reported token counts whenever the LLM returns them. Otherwise, refine the estimation algorithm; the 4-characters-per-token heuristic is only a rough approximation for English text.

Issue: Performance impact of tracking

Solution: Consider async tracking or batch inserts for high-volume scenarios.
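
As a rough illustration of async tracking, the sketch below hands each usage event to a single background thread so the request thread does not wait on the database write. AsyncUsageRecorder is a hypothetical class; the sink stands in for something like a call to recordUsage or repository.save. Spring's @Async is another option if async execution is already enabled in your application.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical wrapper that hands usage events to a background thread. */
final class AsyncUsageRecorder<E> implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<E> sink;

    AsyncUsageRecorder(Consumer<E> sink) {
        this.sink = sink;
    }

    /** Returns immediately; the sink runs later on the background thread. */
    void record(E event) {
        executor.submit(() -> {
            try {
                sink.accept(event);
            } catch (RuntimeException e) {
                // A tracking failure should not break the AI request; report and continue.
                System.err.println("Usage recording failed: " + e.getMessage());
            }
        });
    }

    /** Stops accepting events and waits up to 5 seconds for queued ones to finish. */
    @Override
    public void close() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        }
    }
}
```

The single-thread executor keeps events in submission order but uses an unbounded queue, so a high-volume deployment would likely want a bounded queue or batching. Note also that quota checks may briefly lag behind actual usage when recording is asynchronous.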

Issue: Quota checks blocking requests

Solution: Verify interceptor configuration and user ID extraction from security context.
