📊 Usage Tracking
Purpose: This guide provides step-by-step instructions for implementing AI usage limits, quotas, and cost tracking infrastructure.
Dependencies:
- Requires Text Generation and Text Rewriting implementations to be completed
- References Text Intelligence Solution Architecture for design context
Related Documents:
- Text Intelligence Epic - Capability definition
Note: This guide is specific to Text Intelligence implementation. Once Text Intelligence is fully implemented and validated, this entire text-intelligence/ folder will be archived.
Overview
What This Guide Covers
AI Usage Limits, Quotas & Cost Tracking provides operational controls for:
- Tracking AI API usage per user
- Enforcing usage limits and quotas
- Tracking token counts and estimated costs
- Middleware/filter for automatic tracking
What's Included:
- Usage tracking entity
- Usage tracking service
- Middleware/filter for automatic tracking
- Quota enforcement
What's NOT Included:
- ❌ Billing integration (use existing Stripe integration)
- ❌ Pricing tiers (fork products define these)
- ❌ Usage analytics dashboard (fork products add this if needed)
Prerequisites
- Text Generation and Text Rewriting implementations completed
- Database migration system configured
- Understanding of Spring filters/interceptors
- Familiarity with token counting (if available from LLM)
Token Extraction
Token Availability
Ollama (Local):
- May not return token counts in response
- Policy: If tokens unavailable, estimate based on character count
- Label estimates as "estimated" in tracking data
OpenAI/Cloud Providers:
- Usually return token counts
- Extract counts from the Response object via response.tokenUsage()
- Record actual counts
Extraction Pattern
public record TokenUsage(
int promptTokens,
int completionTokens,
boolean estimated // true if estimated, false if from provider
) {}
private TokenUsage extractTokenUsage(Response<String> response, String prompt, String result) {
if (response.tokenUsage() != null) {
return new TokenUsage(
response.tokenUsage().inputTokenCount(),
response.tokenUsage().outputTokenCount(),
false // Actual counts from provider
);
} else {
// Estimate: roughly 4 characters per token (approximate), rounded up
int estimatedPromptTokens = (int) Math.ceil(prompt.length() / 4.0);
int estimatedCompletionTokens = (int) Math.ceil(result.length() / 4.0);
return new TokenUsage(
estimatedPromptTokens,
estimatedCompletionTokens,
true // Estimated
);
}
}
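The pattern above treats provider counts as all-or-nothing, and unboxing a null count into an int would throw. Providers may report one side (for example, input tokens) but not the other. The following is a minimal pure-Java sketch that falls back to the same 4-characters-per-token estimate per side and sets the estimated flag whenever any side was estimated. The record name ResolvedTokenUsage and its methods are hypothetical; pass in the raw, possibly null, counts from the provider response.

```java
// Hypothetical helper: uses provider counts where present, estimates the rest,
// and marks the result as estimated if either side had to be estimated.
record ResolvedTokenUsage(int promptTokens, int completionTokens, boolean estimated) {

    // Rough heuristic: about 4 characters per token for English text, rounded up
    static int estimate(String text) {
        return text == null ? 0 : (int) Math.ceil(text.length() / 4.0);
    }

    static ResolvedTokenUsage resolve(Integer providerPrompt, Integer providerCompletion,
                                      String prompt, String result) {
        boolean estimated = providerPrompt == null || providerCompletion == null;
        int promptTokens = providerPrompt != null ? providerPrompt : estimate(prompt);
        int completionTokens = providerCompletion != null ? providerCompletion : estimate(result);
        return new ResolvedTokenUsage(promptTokens, completionTokens, estimated);
    }
}
```

For example, resolve(12, null, prompt, "abcdefghi") keeps the provider's 12 input tokens, estimates 3 output tokens, and returns estimated = true.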
Recording Policy
- Record token usage for every AI operation
- Label estimates clearly (set the estimated flag to true)
- Store in database for cost calculation
- Use estimates when provider doesn't return tokens (common with local Ollama)
Implementation Steps
Step 1: Create Usage Tracking Entity
File: server/src/main/java/com/saas/springular/common/ai/entity/AIUsageRecord.java
Create entity:
package com.saas.springular.common.ai.entity;
import jakarta.persistence.*;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.CreatedDate;
import org.springframework.data.jpa.domain.support.AuditingEntityListener;
import java.time.LocalDateTime;
@Entity
@Table(name = "ai_usage_records")
@EntityListeners(AuditingEntityListener.class)
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class AIUsageRecord {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(nullable = false)
private Long userId;
@Column(nullable = false)
private String operation; // "text_generation", "text_rewrite", etc.
@Column(nullable = false)
private String model; // "ollama", "gpt-4", etc.
@Column
private Integer inputTokens;
@Column
private Integer outputTokens;
@Column
private Integer totalTokens;
@Column
private Double estimatedCost; // In the base currency unit (e.g., USD), matching COST_PER_1000_TOKENS
@CreatedDate
@Column(nullable = false, updatable = false)
private LocalDateTime createdAt;
}
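@CreatedDate is only filled in when Spring Data JPA auditing is enabled. Without it, createdAt stays null and the insert fails on the NOT NULL column. If the application does not already enable auditing, a minimal configuration class looks like the following (the class name JpaAuditingConfig is illustrative; place it in your existing configuration package):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.data.jpa.repository.config.EnableJpaAuditing;

// Enables Spring Data JPA auditing so AuditingEntityListener populates @CreatedDate fields
@Configuration
@EnableJpaAuditing
public class JpaAuditingConfig {
}
```

Skip this if @EnableJpaAuditing is already declared elsewhere; declaring it twice registers the auditing infrastructure twice.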
Step 2: Create Repository
File: server/src/main/java/com/saas/springular/common/ai/repository/AIUsageRecordRepository.java
Create repository:
package com.saas.springular.common.ai.repository;
import com.saas.springular.common.ai.entity.AIUsageRecord;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;
import java.time.LocalDateTime;
@Repository
public interface AIUsageRecordRepository extends JpaRepository<AIUsageRecord, Long> {
@Query("SELECT COUNT(u) FROM AIUsageRecord u WHERE u.userId = :userId AND u.operation = :operation AND u.createdAt >= :since")
long countByUserIdAndOperationSince(
@Param("userId") Long userId,
@Param("operation") String operation,
@Param("since") LocalDateTime since
);
@Query("SELECT SUM(u.totalTokens) FROM AIUsageRecord u WHERE u.userId = :userId AND u.createdAt >= :since")
Long sumTotalTokensByUserIdSince(
@Param("userId") Long userId,
@Param("since") LocalDateTime since
);
}
Step 3: Create Usage Tracking Service
File: server/src/main/java/com/saas/springular/common/ai/service/AIUsageTrackingService.java
Create service interface:
package com.saas.springular.common.ai.service;
public interface AIUsageTrackingService {
/**
* Record an AI usage event.
*
* @param userId User ID
* @param operation Operation type (e.g., "text_generation")
* @param model Model identifier
* @param inputTokens Input token count
* @param outputTokens Output token count
*/
void recordUsage(Long userId, String operation, String model, Integer inputTokens, Integer outputTokens);
/**
* Check if user has exceeded quota for an operation.
*
* @param userId User ID
* @param operation Operation type
* @param quotaLimit Maximum allowed operations per period
* @param periodDays Length of the rolling window in days (e.g., 30 to approximate a month)
* @return true if the user has reached or exceeded the limit
*/
boolean isQuotaExceeded(Long userId, String operation, long quotaLimit, int periodDays);
/**
* Get total tokens used by user in a period.
*
* @param userId User ID
* @param periodDays Period in days
* @return Total tokens used
*/
long getTotalTokensUsed(Long userId, int periodDays);
}
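Both isQuotaExceeded and getTotalTokensUsed take periodDays, which gives a rolling window (the last N days) rather than a calendar month. If plans should reset on the first of each month, the since timestamp can be computed differently. Below is a pure java.time sketch with an injected Clock so it can be tested; the class and method names are hypothetical.

```java
import java.time.Clock;
import java.time.LocalDateTime;
import java.time.YearMonth;

// Hypothetical helper for computing the start of a quota period
final class QuotaPeriods {
    private QuotaPeriods() {}

    // Rolling window: same semantics as LocalDateTime.now().minusDays(periodDays)
    static LocalDateTime rollingWindowStart(Clock clock, int periodDays) {
        return LocalDateTime.now(clock).minusDays(periodDays);
    }

    // Calendar month: midnight on the first day of the current month, in the clock's zone
    static LocalDateTime calendarMonthStart(Clock clock) {
        return YearMonth.now(clock).atDay(1).atStartOfDay();
    }
}
```

Either value can be passed as since to the repository queries unchanged; only the service's computation of since would differ.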
Step 4: Implement Usage Tracking Service
File: server/src/main/java/com/saas/springular/common/ai/service/impl/AIUsageTrackingServiceImpl.java
Create implementation:
package com.saas.springular.common.ai.service.impl;
import com.saas.springular.common.ai.entity.AIUsageRecord;
import com.saas.springular.common.ai.repository.AIUsageRecordRepository;
import com.saas.springular.common.ai.service.AIUsageTrackingService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.time.LocalDateTime;
@Service
@RequiredArgsConstructor
@Slf4j
public class AIUsageTrackingServiceImpl implements AIUsageTrackingService {
private final AIUsageRecordRepository repository;
// Simple cost estimation (adjust based on your model pricing)
private static final double COST_PER_1000_TOKENS = 0.01; // Example: $0.01 per 1000 tokens
@Override
@Transactional
public void recordUsage(Long userId, String operation, String model, Integer inputTokens, Integer outputTokens) {
Integer totalTokens = (inputTokens != null ? inputTokens : 0) + (outputTokens != null ? outputTokens : 0);
Double estimatedCost = totalTokens > 0 ? (totalTokens / 1000.0) * COST_PER_1000_TOKENS : 0.0;
AIUsageRecord record = AIUsageRecord.builder()
.userId(userId)
.operation(operation)
.model(model)
.inputTokens(inputTokens)
.outputTokens(outputTokens)
.totalTokens(totalTokens)
.estimatedCost(estimatedCost)
.build();
repository.save(record);
log.debug("Recorded AI usage: userId={}, operation={}, tokens={}", userId, operation, totalTokens);
}
@Override
public boolean isQuotaExceeded(Long userId, String operation, long quotaLimit, int periodDays) {
LocalDateTime since = LocalDateTime.now().minusDays(periodDays);
long count = repository.countByUserIdAndOperationSince(userId, operation, since);
return count >= quotaLimit;
}
@Override
public long getTotalTokensUsed(Long userId, int periodDays) {
LocalDateTime since = LocalDateTime.now().minusDays(periodDays);
Long total = repository.sumTotalTokensByUserIdSince(userId, since);
return total != null ? total : 0L;
}
}
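COST_PER_1000_TOKENS applies one flat rate to every model and to input and output tokens alike. Providers often price input and output tokens differently per model, so a per-model pricing table is a natural next step. This is a sketch under stated assumptions: ModelPricing and CostCalculator are hypothetical names, and any rates you plug in must come from your provider's price list. BigDecimal avoids floating-point drift when costs are summed, and scale 4 matches the estimated_cost DECIMAL(10, 4) column.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;

// Hypothetical per-model pricing, per 1000 tokens, in the base currency unit
record ModelPricing(BigDecimal inputPer1k, BigDecimal outputPer1k) {}

final class CostCalculator {
    private static final BigDecimal THOUSAND = BigDecimal.valueOf(1000);

    private final Map<String, ModelPricing> pricingByModel;
    private final ModelPricing fallback; // used for models missing from the table (e.g., free local models)

    CostCalculator(Map<String, ModelPricing> pricingByModel, ModelPricing fallback) {
        this.pricingByModel = Map.copyOf(pricingByModel);
        this.fallback = fallback;
    }

    // Null token counts are treated as zero, as recordUsage does
    BigDecimal estimateCost(String model, Integer inputTokens, Integer outputTokens) {
        ModelPricing pricing = pricingByModel.getOrDefault(model, fallback);
        BigDecimal input = BigDecimal.valueOf(inputTokens != null ? inputTokens : 0);
        BigDecimal output = BigDecimal.valueOf(outputTokens != null ? outputTokens : 0);
        return input.multiply(pricing.inputPer1k())
                .add(output.multiply(pricing.outputPer1k()))
                .divide(THOUSAND, 4, RoundingMode.HALF_UP);
    }
}
```

Since the entity's estimatedCost field is a Double, either store estimateCost(...).doubleValue() or switch the field to BigDecimal.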
Step 5: Create Database Migration
File: server/src/main/resources/db/migration/V{version}__create_ai_usage_records_table.sql
Create migration (adjust version number):
CREATE TABLE ai_usage_records (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
operation VARCHAR(50) NOT NULL,
model VARCHAR(50) NOT NULL,
input_tokens INTEGER,
output_tokens INTEGER,
total_tokens INTEGER,
estimated_cost DECIMAL(10, 4),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_ai_usage_user_operation_created ON ai_usage_records(user_id, operation, created_at);
CREATE INDEX idx_ai_usage_user_created ON ai_usage_records(user_id, created_at);
Step 6: Integrate with Text Intelligence Service
File: server/src/main/java/com/saas/springular/common/ai/service/impl/TextIntelligenceServiceImpl.java
Add usage tracking (example for generateText):
@Service
@RequiredArgsConstructor
@Slf4j
public class TextIntelligenceServiceImpl implements TextIntelligenceService {
// ... existing fields ...
private final AIUsageTrackingService usageTrackingService;
// Get current user ID from security context (adjust based on your auth setup)
private Long getCurrentUserId() {
// Example: return SecurityContextHolder.getContext().getAuthentication()...
// Adjust based on your authentication setup
return 1L; // Placeholder
}
@Override
public String generateText(String prompt, String tone) {
try {
ChatLanguageModel model = selectModel(tone);
String result = model.chat(prompt);
// Track usage. Ollama may not return token counts, so estimate them
// from character counts (see Token Extraction)
usageTrackingService.recordUsage(
getCurrentUserId(),
"text_generation",
"ollama",
estimateTokens(prompt.length()),
estimateTokens(result.length())
);
return result;
} catch (Exception e) {
log.error("Text generation failed", e);
throw new RuntimeException("Failed to generate text: " + e.getMessage(), e);
}
}
// Simple token estimation (1 token ≈ 4 characters for English)
private Integer estimateTokens(int characterCount) {
return (int) Math.ceil(characterCount / 4.0);
}
// ... existing methods ...
}
Step 7: Add Quota Check Middleware/Interceptor (Optional)
File: server/src/main/java/com/saas/springular/common/ai/interceptor/AIQuotaInterceptor.java
Create interceptor:
package com.saas.springular.common.ai.interceptor;
import com.saas.springular.common.ai.service.AIUsageTrackingService;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
@Component
@RequiredArgsConstructor
public class AIQuotaInterceptor implements HandlerInterceptor {
private final AIUsageTrackingService usageTrackingService;
// Default quota limits (should come from user subscription/plan)
private static final long DEFAULT_MONTHLY_QUOTA = 1000; // operations per month
private static final int MONTHLY_PERIOD_DAYS = 30;
@Override
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
if (request.getRequestURI().startsWith("/api/ai/text")) {
Long userId = getCurrentUserId(); // Adjust based on your auth setup
String operation = determineOperation(request.getRequestURI());
if (operation != null && usageTrackingService.isQuotaExceeded(
userId, operation, DEFAULT_MONTHLY_QUOTA, MONTHLY_PERIOD_DAYS)) {
response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
response.setContentType("application/json");
// Let write failures propagate instead of silently swallowing them
response.getWriter().write("{\"error\":\"Quota exceeded\"}");
return false;
}
}
return true;
}
private String determineOperation(String uri) {
if (uri.contains("/generate")) {
return "text_generation";
} else if (uri.contains("/rewrite")) {
return "text_rewrite";
}
return null;
}
private Long getCurrentUserId() {
// Adjust based on your authentication setup
return 1L; // Placeholder
}
}
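Register the interceptor in WebMvcConfig (merge into the existing class if there is one):
import com.saas.springular.common.ai.interceptor.AIQuotaInterceptor;
import lombok.RequiredArgsConstructor;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
@Configuration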
@RequiredArgsConstructor
public class WebMvcConfig implements WebMvcConfigurer {
private final AIQuotaInterceptor quotaInterceptor;
@Override
public void addInterceptors(InterceptorRegistry registry) {
registry.addInterceptor(quotaInterceptor)
.addPathPatterns("/api/ai/text/**");
}
}
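DEFAULT_MONTHLY_QUOTA is a single constant, while the interceptor's comment says limits should come from the user's subscription plan. Below is a sketch of a plan-to-quota lookup the interceptor could consult instead. The Plan values, the QuotaPolicy name, and the limits are all hypothetical placeholders, since fork products define the real tiers.

```java
import java.util.Map;

// Hypothetical subscription plans; real tiers are defined by fork products
enum Plan { FREE, PRO }

final class QuotaPolicy {
    // Operations allowed per period, keyed by plan and then by operation name
    private static final Map<Plan, Map<String, Long>> LIMITS = Map.of(
            Plan.FREE, Map.of("text_generation", 100L, "text_rewrite", 100L),
            Plan.PRO, Map.of("text_generation", 1000L, "text_rewrite", 1000L));

    private QuotaPolicy() {}

    // Returns the limit for the plan/operation; 0 for unknown operations,
    // which makes isQuotaExceeded (count >= limit) deny them
    static long limitFor(Plan plan, String operation) {
        return LIMITS.getOrDefault(plan, Map.of()).getOrDefault(operation, 0L);
    }
}
```

The interceptor would then call isQuotaExceeded(userId, operation, QuotaPolicy.limitFor(plan, operation), MONTHLY_PERIOD_DAYS), with plan resolved from the user's subscription by whatever lookup your billing setup provides.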
Metrics Integration
Micrometer Counters and Timers
Counters:
- ai.operation.count - Total operations by type
- ai.operation.error.count - Errors by category
- ai.token.usage - Total tokens used
Timers:
- ai.operation.duration - Operation latency by type
Integration Pattern
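import dev.langchain4j.model.chat.ChatLanguageModel;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
@Service
@RequiredArgsConstructor
public class TextIntelligenceServiceImpl implements TextIntelligenceService {
private final ChatLanguageModel chatModel;
private final MeterRegistry meterRegistry;
private final AIUsageTrackingService usageTrackingService;
@Override
public String generateText(String prompt, String tone) {
Timer.Sample sample = Timer.start(meterRegistry);
String modelName = "ollama-llama2:7b"; // Example; use your configured model name
boolean success = false;
try {
String result = chatModel.chat(prompt);
// Token counts: provider values if available, otherwise estimates (see Token Extraction)
int inputTokens = estimateTokens(prompt.length());
int outputTokens = estimateTokens(result.length());
meterRegistry.counter("ai.token.usage",
"operation", "text-generation",
"model", modelName).increment(inputTokens + outputTokens);
usageTrackingService.recordUsage(getCurrentUserId(), "text_generation", modelName, inputTokens, outputTokens);
success = true;
return result;
} catch (Exception e) {
meterRegistry.counter("ai.operation.error.count",
"operation", "text-generation",
"errorCategory", categorizeError(e)).increment();
throw new RuntimeException("Text generation failed", e);
} finally {
// Stop the timer and count the operation on both success and failure paths
sample.stop(Timer.builder("ai.operation.duration")
.tag("operation", "text-generation")
.tag("model", modelName)
.register(meterRegistry));
meterRegistry.counter("ai.operation.count",
"operation", "text-generation",
"model", modelName,
"success", String.valueOf(success)).increment();
}
}
// Coarse error bucket for the errorCategory tag; keep the set of values small
private String categorizeError(Exception e) {
return e.getClass().getSimpleName();
}
// estimateTokens(...) and getCurrentUserId() as defined in Step 6
}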
Dependencies
Add the Micrometer dependencies to build.gradle (versions are supplied by Spring Boot's dependency management):
dependencies {
implementation 'io.micrometer:micrometer-core'
implementation 'io.micrometer:micrometer-registry-prometheus' // Optional: Prometheus
}
Future Considerations
- Async Recording: Use a queue or async executor so database writes do not add latency to AI requests
- Aggregation Patterns: Daily rollups, retention policies (Phase 2+)
- Cost Calculation: Token-based cost estimation (requires pricing tables)
- Dashboard Integration: Connect metrics to monitoring dashboards (Phase 2+)
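For the async-recording item, one option is to hand the database write to an executor so the request thread returns as soon as the AI call completes. Below is a plain-JDK sketch that assumes a single-threaded executor and a hypothetical UsageSink callback standing in for the repository save. In a Spring application, @Async on a separate bean (with @EnableAsync) is the more idiomatic route. Either way, recording becomes eventually consistent, so quota checks may briefly lag behind actual usage.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical callback standing in for AIUsageTrackingService.recordUsage / repository.save
interface UsageSink {
    void record(long userId, String operation, String model, int inputTokens, int outputTokens);
}

final class AsyncUsageRecorder implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final UsageSink sink;

    AsyncUsageRecorder(UsageSink sink) {
        this.sink = sink;
    }

    // Resolve userId on the request thread before calling: a thread-local
    // security context is not visible inside the executor thread.
    void recordAsync(long userId, String operation, String model, int inputTokens, int outputTokens) {
        executor.execute(() -> {
            try {
                sink.record(userId, operation, model, inputTokens, outputTokens);
            } catch (RuntimeException e) {
                // A failed write must not affect the already-completed AI call
                System.err.println("Usage recording failed: " + e.getMessage());
            }
        });
    }

    // Drain queued writes on shutdown so recorded usage is not silently dropped
    @Override
    public void close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```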
Testing
Unit Tests
File: server/src/test/java/com/saas/springular/common/ai/service/impl/AIUsageTrackingServiceImplTest.java
Create test:
package com.saas.springular.common.ai.service.impl;
import com.saas.springular.common.ai.entity.AIUsageRecord;
import com.saas.springular.common.ai.repository.AIUsageRecordRepository;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import java.time.LocalDateTime;
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
class AIUsageTrackingServiceImplTest {
@Mock
private AIUsageRecordRepository repository;
@InjectMocks
private AIUsageTrackingServiceImpl service;
@Test
void recordsUsage() {
// act
service.recordUsage(1L, "text_generation", "ollama", 100, 200);
// assert
verify(repository).save(any(AIUsageRecord.class));
}
@Test
void detectsQuotaExceeded() {
// arrange: once one argument uses a matcher, Mockito requires matchers for all of them
when(repository.countByUserIdAndOperationSince(eq(1L), eq("text_generation"), any(LocalDateTime.class)))
.thenReturn(1001L);
// act
boolean exceeded = service.isQuotaExceeded(1L, "text_generation", 1000, 30);
// assert
assertThat(exceeded).isTrue();
}
}
Time Estimate
Total Time: 3-4 hours
Breakdown:
- Entity and repository: 30 minutes
- Service implementation: 1 hour
- Database migration: 15 minutes
- Integration with text services: 30 minutes
- Interceptor/middleware: 30 minutes
- Testing: 1 hour
Next Steps
After Usage Tracking is complete:
- Integration with Billing: Connect usage data to Stripe billing
- Quota Configuration: Add user plan/subscription-based quotas
- Dashboard: Create usage analytics dashboard (fork products)
Troubleshooting
Issue: Token counting inaccurate
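Solution: Use provider-reported counts whenever the response includes them (see Token Extraction). For providers that do not report them, such as local Ollama, refine the character-based estimate, for example with a tokenizer that matches the model.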
Issue: Performance impact of tracking
Solution: Consider async tracking or batch inserts for high-volume scenarios.
Issue: Quota checks blocking requests
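Solution: Verify the interceptor's path patterns and that the user ID is extracted from the security context. The placeholder getCurrentUserId() returns 1L, so until it is replaced every request counts against a single shared quota.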