
📊 Usage Tracking

Purpose: This guide provides step-by-step instructions for implementing AI usage limits, quotas, and cost tracking infrastructure.

Dependencies:

Related Documents:

Note: This guide is specific to Text Intelligence implementation. Once Text Intelligence is fully implemented and validated, this entire text-intelligence/ folder will be archived.


Overview

What This Guide Covers

AI Usage Limits, Quotas & Cost Tracking provides operational controls for:

  • Tracking AI API usage per user
  • Enforcing usage limits and quotas
  • Tracking token counts and estimated costs
  • Middleware/filter for automatic tracking

What's Included:

  • Usage tracking entity
  • Usage tracking service
  • Middleware/filter for automatic tracking
  • Quota enforcement

What's NOT Included:

  • ❌ Billing integration (use existing Stripe integration)
  • ❌ Pricing tiers (fork products define these)
  • ❌ Usage analytics dashboard (fork products add this if needed)

Prerequisites

  • Text Generation and Text Rewriting implementations completed
  • Database migration system configured
  • Understanding of Spring filters/interceptors
  • Familiarity with token counting (and whether your LLM provider reports token counts)

Token Extraction

Token Availability

Ollama (Local):

  • May not return token counts in response
  • Policy: If tokens unavailable, estimate based on character count
  • Label estimates as "estimated" in tracking data

OpenAI/Cloud Providers:

  • Usually return token counts
  • Read counts from response.tokenUsage() on LangChain4j Response objects
  • Record actual counts

Extraction Pattern

public record TokenUsage(
    int promptTokens,
    int completionTokens,
    boolean estimated // true if estimated, false if reported by the provider
) {}

private TokenUsage extractTokenUsage(Response<String> response, String prompt, String result) {
    // LangChain4j's TokenUsage exposes nullable Integer counts; check them before unboxing
    var usage = response.tokenUsage();
    if (usage != null && usage.inputTokenCount() != null && usage.outputTokenCount() != null) {
        return new TokenUsage(
            usage.inputTokenCount(),
            usage.outputTokenCount(),
            false // Actual counts from provider
        );
    }
    // Estimate: roughly 4 characters per token for English text, rounded up
    int estimatedPromptTokens = (int) Math.ceil(prompt.length() / 4.0);
    int estimatedCompletionTokens = (int) Math.ceil(result.length() / 4.0);
    return new TokenUsage(
        estimatedPromptTokens,
        estimatedCompletionTokens,
        true // Estimated
    );
}

Recording Policy

  • Record token usage for every AI operation
  • Label estimates clearly (set estimated flag to true)
  • Store in database for cost calculation
  • Use estimates when provider doesn't return tokens (common with local Ollama)
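The policy above reduces to one decision: use the provider's counts when both are present, otherwise fall back to the 4-characters-per-token estimate and flag it. The sketch below keeps that decision free of LangChain4j types so it can be unit-tested on its own. The names `ResolvedTokens`, `TokenCountResolver`, and `resolve` are hypothetical, and the nullable `Integer` parameters stand in for whatever the provider returns.

```java
/** Token counts plus a flag recording whether they were estimated. */
record ResolvedTokens(int promptTokens, int completionTokens, boolean estimated) {
    int total() {
        return promptTokens + completionTokens;
    }
}

/** Hypothetical helper: prefer provider-reported counts, else estimate from text length. */
final class TokenCountResolver {
    // Rough heuristic for English text; real ratios vary by model and language.
    private static final double CHARS_PER_TOKEN = 4.0;

    private TokenCountResolver() {}

    static ResolvedTokens resolve(Integer providerInput, Integer providerOutput,
                                  String prompt, String result) {
        if (providerInput != null && providerOutput != null) {
            return new ResolvedTokens(providerInput, providerOutput, false);
        }
        // Either count missing: estimate both so the pair stays consistent
        return new ResolvedTokens(estimate(prompt), estimate(result), true);
    }

    static int estimate(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        // Round up so that short non-empty strings count as at least one token.
        return (int) Math.ceil(text.length() / CHARS_PER_TOKEN);
    }
}
```

Estimating both sides when either count is missing is a design choice: it avoids storing a record that mixes real and estimated numbers under a single `estimated` flag.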

Implementation Steps

Step 1: Create Usage Tracking Entity

File: server/src/main/java/com/saas/springular/common/ai/entity/AIUsageRecord.java

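Create the entity. The @CreatedDate field is only populated when JPA auditing is enabled (@EnableJpaAuditing on a configuration class); without it, inserts fail on the NOT NULL created_at column: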

package com.saas.springular.common.ai.entity;

import jakarta.persistence.*;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.CreatedDate;
import org.springframework.data.jpa.domain.support.AuditingEntityListener;

import java.time.LocalDateTime;

@Entity
@Table(name = "ai_usage_records")
@EntityListeners(AuditingEntityListener.class)
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class AIUsageRecord {

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;

@Column(nullable = false)
private Long userId;

@Column(nullable = false)
private String operation; // "text_generation", "text_rewrite", etc.

@Column(nullable = false)
private String model; // "ollama", "gpt-4", etc.

@Column
private Integer inputTokens;

@Column
private Integer outputTokens;

@Column
private Integer totalTokens;

@Column
private Double estimatedCost; // Same currency unit as COST_PER_1000_TOKENS in Step 4 (e.g. USD)

@CreatedDate
@Column(nullable = false, updatable = false)
private LocalDateTime createdAt;
}

Step 2: Create Repository

File: server/src/main/java/com/saas/springular/common/ai/repository/AIUsageRecordRepository.java

Create repository:

package com.saas.springular.common.ai.repository;

import com.saas.springular.common.ai.entity.AIUsageRecord;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

import java.time.LocalDateTime;

@Repository
public interface AIUsageRecordRepository extends JpaRepository<AIUsageRecord, Long> {

@Query("SELECT COUNT(u) FROM AIUsageRecord u WHERE u.userId = :userId AND u.operation = :operation AND u.createdAt >= :since")
long countByUserIdAndOperationSince(
@Param("userId") Long userId,
@Param("operation") String operation,
@Param("since") LocalDateTime since
);

@Query("SELECT SUM(u.totalTokens) FROM AIUsageRecord u WHERE u.userId = :userId AND u.createdAt >= :since")
Long sumTotalTokensByUserIdSince(
@Param("userId") Long userId,
@Param("since") LocalDateTime since
);
}

Step 3: Create Usage Tracking Service

File: server/src/main/java/com/saas/springular/common/ai/service/AIUsageTrackingService.java

Create service interface:

package com.saas.springular.common.ai.service;

public interface AIUsageTrackingService {

/**
* Record an AI usage event.
*
* @param userId User ID
* @param operation Operation type (e.g., "text_generation")
* @param model Model identifier
* @param inputTokens Input token count
* @param outputTokens Output token count
*/
void recordUsage(Long userId, String operation, String model, Integer inputTokens, Integer outputTokens);

/**
* Check if user has exceeded quota for an operation.
*
* @param userId User ID
* @param operation Operation type
* @param quotaLimit Maximum allowed operations per period
* @param periodDays Period in days (e.g., 30 for monthly)
* @return true if quota is exceeded
*/
boolean isQuotaExceeded(Long userId, String operation, long quotaLimit, int periodDays);

/**
* Get total tokens used by user in a period.
*
* @param userId User ID
* @param periodDays Period in days
* @return Total tokens used
*/
long getTotalTokensUsed(Long userId, int periodDays);
}
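For controller or service tests that should not touch the database, a hand-written in-memory implementation of this interface can stand in for the JPA-backed one. The sketch below is hypothetical: the class name `InMemoryAIUsageTrackingService` and the injected `Clock` are not part of this guide, and the interface is repeated so the block compiles on its own. It mirrors the repository queries' rolling window, where events at or after `now - periodDays` count.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Same contract as AIUsageTrackingService in Step 3, repeated so this file compiles alone.
interface AIUsageTrackingService {
    void recordUsage(Long userId, String operation, String model, Integer inputTokens, Integer outputTokens);
    boolean isQuotaExceeded(Long userId, String operation, long quotaLimit, int periodDays);
    long getTotalTokensUsed(Long userId, int periodDays);
}

/** Hypothetical in-memory test double; not thread-safe, intended for tests only. */
class InMemoryAIUsageTrackingService implements AIUsageTrackingService {

    private record Event(Long userId, String operation, int totalTokens, Instant at) {}

    private final List<Event> events = new ArrayList<>();
    private final Clock clock; // injectable so tests control "now"

    InMemoryAIUsageTrackingService(Clock clock) {
        this.clock = clock;
    }

    @Override
    public void recordUsage(Long userId, String operation, String model, Integer inputTokens, Integer outputTokens) {
        int total = (inputTokens != null ? inputTokens : 0) + (outputTokens != null ? outputTokens : 0);
        events.add(new Event(userId, operation, total, clock.instant()));
    }

    @Override
    public boolean isQuotaExceeded(Long userId, String operation, long quotaLimit, int periodDays) {
        Instant since = clock.instant().minus(Duration.ofDays(periodDays));
        long count = events.stream()
                .filter(e -> e.userId().equals(userId) && e.operation().equals(operation) && !e.at().isBefore(since))
                .count();
        return count >= quotaLimit; // same boundary as the JPA implementation
    }

    @Override
    public long getTotalTokensUsed(Long userId, int periodDays) {
        Instant since = clock.instant().minus(Duration.ofDays(periodDays));
        return events.stream()
                .filter(e -> e.userId().equals(userId) && !e.at().isBefore(since))
                .mapToLong(Event::totalTokens)
                .sum();
    }
}
```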

Step 4: Implement Usage Tracking Service

File: server/src/main/java/com/saas/springular/common/ai/service/impl/AIUsageTrackingServiceImpl.java

Create implementation:

package com.saas.springular.common.ai.service.impl;

import com.saas.springular.common.ai.entity.AIUsageRecord;
import com.saas.springular.common.ai.repository.AIUsageRecordRepository;
import com.saas.springular.common.ai.service.AIUsageTrackingService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.time.LocalDateTime;

@Service
@RequiredArgsConstructor
@Slf4j
public class AIUsageTrackingServiceImpl implements AIUsageTrackingService {

private final AIUsageRecordRepository repository;

// Simple cost estimation (adjust based on your model pricing)
private static final double COST_PER_1000_TOKENS = 0.01; // Example: $0.01 per 1000 tokens

@Override
@Transactional
public void recordUsage(Long userId, String operation, String model, Integer inputTokens, Integer outputTokens) {
Integer totalTokens = (inputTokens != null ? inputTokens : 0) + (outputTokens != null ? outputTokens : 0);
Double estimatedCost = totalTokens > 0 ? (totalTokens / 1000.0) * COST_PER_1000_TOKENS : 0.0;

AIUsageRecord record = AIUsageRecord.builder()
.userId(userId)
.operation(operation)
.model(model)
.inputTokens(inputTokens)
.outputTokens(outputTokens)
.totalTokens(totalTokens)
.estimatedCost(estimatedCost)
.build();

repository.save(record);
log.debug("Recorded AI usage: userId={}, operation={}, tokens={}", userId, operation, totalTokens);
}

@Override
public boolean isQuotaExceeded(Long userId, String operation, long quotaLimit, int periodDays) {
LocalDateTime since = LocalDateTime.now().minusDays(periodDays);
long count = repository.countByUserIdAndOperationSince(userId, operation, since);
return count >= quotaLimit;
}

@Override
public long getTotalTokensUsed(Long userId, int periodDays) {
LocalDateTime since = LocalDateTime.now().minusDays(periodDays);
Long total = repository.sumTotalTokensByUserIdSince(userId, since);
return total != null ? total : 0L;
}
}

Step 5: Create Database Migration

File: server/src/main/resources/db/migration/V{version}__create_ai_usage_records_table.sql

Create migration (adjust version number):

CREATE TABLE ai_usage_records (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
operation VARCHAR(50) NOT NULL,
model VARCHAR(50) NOT NULL,
input_tokens INTEGER,
output_tokens INTEGER,
total_tokens INTEGER,
estimated_cost DOUBLE PRECISION, -- matches the Double estimatedCost field in AIUsageRecord
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_ai_usage_user_operation_created ON ai_usage_records(user_id, operation, created_at);
CREATE INDEX idx_ai_usage_user_created ON ai_usage_records(user_id, created_at);

Step 6: Integrate with Text Intelligence Service

File: server/src/main/java/com/saas/springular/common/ai/service/impl/TextIntelligenceServiceImpl.java

Add usage tracking (example for generateText):

@Service
@RequiredArgsConstructor
@Slf4j
public class TextIntelligenceServiceImpl implements TextIntelligenceService {

    // ... existing fields ...
    private final AIUsageTrackingService usageTrackingService;

    // Placeholder: replace with a lookup from the security context before production,
    // e.g. SecurityContextHolder.getContext().getAuthentication(), adapted to your principal type.
    // While this returns 1L, all usage (and every quota check) is attributed to user 1.
    private Long getCurrentUserId() {
        return 1L;
    }

    @Override
    public String generateText(String prompt, String tone) {
        try {
            ChatLanguageModel model = selectModel(tone);
            String result = model.chat(prompt);

            // model.chat(prompt) returns only the generated text, so token counts are
            // estimated here (Ollama may not report them anyway; see Token Extraction)
            usageTrackingService.recordUsage(
                getCurrentUserId(),
                "text_generation",
                "ollama",
                estimateTokens(prompt.length()),
                estimateTokens(result.length())
            );

            return result;
        } catch (Exception e) {
            log.error("Text generation failed", e);
            throw new RuntimeException("Failed to generate text: " + e.getMessage(), e);
        }
    }

    // Simple token estimation (1 token ≈ 4 characters for English), rounded up
    private Integer estimateTokens(int characterCount) {
        return (int) Math.ceil(characterCount / 4.0);
    }

    // ... existing methods ...
}

Step 7: Add Quota Check Middleware/Interceptor (Optional)

File: server/src/main/java/com/saas/springular/common/ai/interceptor/AIQuotaInterceptor.java

Create interceptor:

package com.saas.springular.common.ai.interceptor;

import com.saas.springular.common.ai.service.AIUsageTrackingService;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

import java.io.IOException;

@Component
@RequiredArgsConstructor
public class AIQuotaInterceptor implements HandlerInterceptor {

    private final AIUsageTrackingService usageTrackingService;

    // Default quota limits (should come from the user's subscription/plan)
    private static final long DEFAULT_MONTHLY_QUOTA = 1000; // operations per period
    private static final int MONTHLY_PERIOD_DAYS = 30;      // rolling 30-day window, not a calendar month

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
            throws IOException {
        // Path filtering is handled by the registration in WebMvcConfig ("/api/ai/text/**"),
        // which, unlike getRequestURI(), ignores the servlet context path
        String operation = determineOperation(request.getRequestURI());
        if (operation == null) {
            return true; // Not a tracked AI operation
        }

        Long userId = getCurrentUserId(); // Adjust based on your auth setup
        if (usageTrackingService.isQuotaExceeded(userId, operation, DEFAULT_MONTHLY_QUOTA, MONTHLY_PERIOD_DAYS)) {
            response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
            response.setContentType(MediaType.APPLICATION_JSON_VALUE);
            response.getWriter().write("{\"error\":\"Quota exceeded\"}");
            return false; // Stop the handler chain
        }
        return true;
    }

    private String determineOperation(String uri) {
        if (uri.contains("/generate")) {
            return "text_generation";
        } else if (uri.contains("/rewrite")) {
            return "text_rewrite";
        }
        return null;
    }

    private Long getCurrentUserId() {
        // Placeholder: read the user from the security context; with 1L every user shares one quota
        return 1L;
    }
}

Register interceptor (in WebMvcConfig):

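import com.saas.springular.common.ai.interceptor.AIQuotaInterceptor;
import lombok.RequiredArgsConstructor;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
@RequiredArgsConstructor
public class WebMvcConfig implements WebMvcConfigurer {

    private final AIQuotaInterceptor quotaInterceptor;

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        // Patterns are matched against the path within the application (context path excluded)
        registry.addInterceptor(quotaInterceptor)
                .addPathPatterns("/api/ai/text/**");
    }
}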

Metrics Integration

Micrometer Counters and Timers

Counters:

  • ai.operation.count - Total operations by type
  • ai.operation.error.count - Errors by category
  • ai.token.usage - Total tokens used

Timers:

  • ai.operation.duration - Operation latency by type

Integration Pattern

import com.saas.springular.common.ai.service.AIUsageTrackingService;
import dev.langchain4j.model.chat.ChatLanguageModel;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class TextIntelligenceServiceImpl {

    // Operation name matches the one used for usage records
    private static final String OPERATION = "text_generation";
    private static final String MODEL_NAME = "ollama-llama2:7b";

    private final MeterRegistry meterRegistry;
    private final ChatLanguageModel chatModel;
    private final AIUsageTrackingService usageTrackingService;

    public String generateText(String prompt, String tone) {
        Timer.Sample sample = Timer.start(meterRegistry);
        boolean success = false;

        try {
            String result = chatModel.chat(prompt);
            success = true;

            // Record token usage here (see Token Extraction section), e.g.
            // usageTrackingService.recordUsage(userId, OPERATION, MODEL_NAME, inputTokens, outputTokens);
            // and increment the ai.token.usage counter by the total

            return result;
        } catch (Exception e) {
            meterRegistry.counter("ai.operation.error.count",
                    "operation", OPERATION,
                    "errorCategory", categorizeError(e)).increment();
            throw new RuntimeException("Text generation failed", e);
        } finally {
            // Runs on success and failure, so latency and counts cover both paths
            sample.stop(Timer.builder("ai.operation.duration")
                    .tag("operation", OPERATION)
                    .tag("model", MODEL_NAME)
                    .register(meterRegistry));
            meterRegistry.counter("ai.operation.count",
                    "operation", OPERATION,
                    "model", MODEL_NAME,
                    "success", String.valueOf(success)).increment();
        }
    }

    // Hypothetical helper: map exceptions to a small, fixed set of tag values to keep
    // metric cardinality low (the exception class name is used here for brevity)
    private String categorizeError(Exception e) {
        return e.getClass().getSimpleName();
    }
}

Dependencies

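Add the Micrometer dependencies to build.gradle. Versions are managed by the Spring Boot dependency-management plugin, and spring-boot-starter-actuator brings in micrometer-core and auto-configures the MeterRegistry bean injected above:

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'io.micrometer:micrometer-core'
    implementation 'io.micrometer:micrometer-registry-prometheus' // Optional: Prometheus
}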

Future Considerations

  • Async Recording: Use a queue or async executor so that writing usage records does not add latency to AI requests
  • Aggregation Patterns: Daily rollups, retention policies (Phase 2+)
  • Cost Calculation: Per-model pricing tables (the current service applies one flat COST_PER_1000_TOKENS rate)
  • Dashboard Integration: Connect metrics to monitoring dashboards (Phase 2+)
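The service in Step 4 applies a single flat rate to all tokens. A per-model table with separate input and output rates is one way to approach the pricing-table consideration above; the sketch below uses BigDecimal to avoid binary floating-point rounding in money values. The class name `ModelPricing`, the model keys, and every rate are placeholders, not real provider prices.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;

/** Hypothetical per-model price table: cost per 1,000 input and output tokens. */
final class ModelPricing {

    record Rate(BigDecimal inputPer1k, BigDecimal outputPer1k) {}

    private static final BigDecimal ONE_THOUSAND = BigDecimal.valueOf(1000);

    private final Map<String, Rate> rates;

    ModelPricing(Map<String, Rate> rates) {
        this.rates = Map.copyOf(rates);
    }

    /** Returns the estimated cost, or zero for models without a rate (e.g. free local models). */
    BigDecimal cost(String model, int inputTokens, int outputTokens) {
        Rate rate = rates.get(model);
        if (rate == null) {
            return BigDecimal.ZERO;
        }
        BigDecimal input = rate.inputPer1k().multiply(BigDecimal.valueOf(inputTokens));
        BigDecimal output = rate.outputPer1k().multiply(BigDecimal.valueOf(outputTokens));
        // Rounded to 4 decimal places
        return input.add(output).divide(ONE_THOUSAND, 4, RoundingMode.HALF_UP);
    }
}
```

For example, with placeholder rates of 0.01 (input) and 0.03 (output) per 1,000 tokens, 1,000 input and 500 output tokens cost 0.0250.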

Testing

Unit Tests

File: server/src/test/java/com/saas/springular/common/ai/service/impl/AIUsageTrackingServiceImplTest.java

Create test:

package com.saas.springular.common.ai.service.impl;

import com.saas.springular.common.ai.entity.AIUsageRecord;
import com.saas.springular.common.ai.repository.AIUsageRecordRepository;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.ArgumentCaptor;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;

import java.time.LocalDateTime;

import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

@ExtendWith(MockitoExtension.class)
class AIUsageTrackingServiceImplTest {

    @Mock
    private AIUsageRecordRepository repository;

    @InjectMocks
    private AIUsageTrackingServiceImpl service;

    @Test
    void recordsUsage() {
        // act
        service.recordUsage(1L, "text_generation", "ollama", 100, 200);

        // assert: the saved record carries the user and the summed token count
        ArgumentCaptor<AIUsageRecord> captor = ArgumentCaptor.forClass(AIUsageRecord.class);
        verify(repository).save(captor.capture());
        assertThat(captor.getValue().getUserId()).isEqualTo(1L);
        assertThat(captor.getValue().getTotalTokens()).isEqualTo(300);
    }

    @Test
    void detectsQuotaExceeded() {
        // arrange: once one argument uses a matcher, Mockito requires matchers for all of them
        when(repository.countByUserIdAndOperationSince(eq(1L), eq("text_generation"), any(LocalDateTime.class)))
                .thenReturn(1001L);

        // act
        boolean exceeded = service.isQuotaExceeded(1L, "text_generation", 1000, 30);

        // assert
        assertThat(exceeded).isTrue();
    }
}

Time Estimate

Total Time: 3-4 hours

Breakdown:

  • Entity and repository: 30 minutes
  • Service implementation: 1 hour
  • Database migration: 15 minutes
  • Integration with text services: 30 minutes
  • Interceptor/middleware: 30 minutes
  • Testing: 1 hour

Next Steps

After Usage Tracking is complete:

  1. Integration with Billing: Connect usage data to Stripe billing
  2. Quota Configuration: Add user plan/subscription-based quotas
  3. Dashboard: Create usage analytics dashboard (fork products)
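Replacing DEFAULT_MONTHLY_QUOTA with plan-based limits mostly comes down to a lookup from the user's plan to per-operation limits, which the interceptor would then pass to isQuotaExceeded. The sketch below is hypothetical: the `Plan` enum, its names, and its limits are placeholders (pricing tiers are defined by fork products), and resolving a user's plan is left to your subscription model.

```java
import java.util.Map;

/** Hypothetical plans with per-operation limits for the 30-day window; values are placeholders. */
enum Plan {
    FREE(Map.of("text_generation", 50L, "text_rewrite", 50L)),
    PRO(Map.of("text_generation", 1000L, "text_rewrite", 1000L));

    private final Map<String, Long> monthlyLimits;

    Plan(Map<String, Long> monthlyLimits) {
        this.monthlyLimits = monthlyLimits;
    }

    /** Limit for an operation; a missing entry yields 0, which isQuotaExceeded treats as always exceeded. */
    long limitFor(String operation) {
        return monthlyLimits.getOrDefault(operation, 0L);
    }
}
```

In the interceptor, the call would become something like `isQuotaExceeded(userId, operation, plan.limitFor(operation), MONTHLY_PERIOD_DAYS)`.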

Troubleshooting

Issue: Token counting inaccurate

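Solution: Use the provider-reported counts (response.tokenUsage()) whenever the provider returns them. Otherwise, refine the estimation, for example with a tokenizer that matches the model instead of the 4-characters-per-token heuristic.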

Issue: Performance impact of tracking

Solution: Consider async tracking or batch inserts for high-volume scenarios.
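One way to take tracking off the request path is to hand each record to a background executor and log, rather than propagate, failures. The sketch below shows the idea in plain Java; `UsageEvent`, `AsyncUsageRecorder`, and its `Consumer` sink are hypothetical stand-ins for a wrapper around AIUsageTrackingService.recordUsage. Records still queued are lost if the process stops before they are written; in a Spring application, @Async with a configured TaskExecutor is an alternative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical usage event passed to the background writer. */
record UsageEvent(long userId, String operation, String model, int inputTokens, int outputTokens) {}

/** Hypothetical wrapper that writes usage records off the request thread. */
final class AsyncUsageRecorder implements AutoCloseable {

    private static final Logger LOG = Logger.getLogger(AsyncUsageRecorder.class.getName());

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<UsageEvent> sink; // e.g. a call to recordUsage(...) on the tracking service

    AsyncUsageRecorder(Consumer<UsageEvent> sink) {
        this.sink = sink;
    }

    /** Returns immediately; failures in the sink are logged, not thrown to the caller. */
    void record(UsageEvent event) {
        executor.execute(() -> {
            try {
                sink.accept(event);
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, "Failed to record AI usage for user " + event.userId(), e);
            }
        });
    }

    /** Stops accepting work and waits briefly for queued records to be written. */
    @Override
    public void close() throws InterruptedException {
        executor.shutdown();
        if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
            executor.shutdownNow();
        }
    }
}
```

A single-thread executor keeps writes in submission order; batch inserts would replace the per-event sink call with periodic flushes of a buffer.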

Issue: Quota checks blocking requests

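Solution: Verify the interceptor's path patterns in WebMvcConfig and that getCurrentUserId() reads the user from the security context. While it returns the placeholder 1L, every request counts against the same user's quota.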