The Wrong Starting Question
Most founders ask "what AI features should we add?" The right question is "what is taking our users too long right now?" AI is only valuable when it reduces genuine friction. Adding AI to a problem that doesn't exist produces a feature nobody uses.
We've retrofitted AI capabilities into 8 production apps over the past year. Here's the decision framework and implementation playbook we use.
Step 1: Identify the Right Feature Category
Not all AI features are equal in implementation complexity or user value.
High value, low complexity (start here)
- Smart defaults: Pre-fill form fields based on user history or context
- Content summarization: Summarize long text (articles, documents, support tickets)
- Smart search / semantic search: Replace keyword search with intent-based search
- Auto-tagging and categorization: Automatically classify user-submitted content
High value, medium complexity
- Text generation: Draft emails, descriptions, reports from structured data
- Image understanding: Extract text, classify images, describe content
- Conversational help: AI assistant within your app context (not general-purpose chat)
High complexity, high risk (only when core to the product)
- Autonomous agents: Multi-step task execution without user confirmation
- Real-time voice AI: Latency constraints are brutal
- Fine-tuned models: Custom training on your proprietary data
Step 2: Choose Your Integration Point Carefully
Adding AI to a production app means deciding where the LLM call sits:
Option A: Server-side, synchronous
User takes action → your server calls LLM → waits for response → returns to user.
Simple, but adds 1–3 seconds of latency to every affected action. Acceptable for drafting a description. Not acceptable for search.
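With Option A, it helps to cap how long a request can wait on the model. Below is a minimal sketch of that idea, not a prescribed implementation. `withTimeout` races the model call against a timer and returns a fallback if the timer wins. `callModel` is a hypothetical stand-in for your provider call, and the 3-second budget is an assumed value. Note that the losing request keeps running in the background unless you also abort it.

```typescript
// Minimal sketch: bound the latency of a synchronous (request/response) LLM call.
// `callModel` is a hypothetical stand-in for your provider SDK call.
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    // Whichever settles first wins: the model's answer or the fallback.
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer pending once we have an answer
  }
}

// Hypothetical model call that takes `delayMs` to respond.
function callModel(prompt: string, delayMs: number): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`Draft for: ${prompt}`), delayMs)
  );
}

// Draft a description, but never make the user wait more than 3 seconds.
// An empty string tells the UI to show a blank, manually editable field.
async function draftDescription(title: string, modelDelayMs: number): Promise<string> {
  return withTimeout(callModel(title, modelDelayMs), 3000, "");
}
```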
Option B: Server-side, asynchronous
User takes action → your server queues LLM job → user gets a loading state → result delivered via WebSocket or polling.
Better for long-running AI tasks (document analysis, large batch operations). Requires more infrastructure (queue, worker).
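The Option B flow can be sketched with an in-memory queue, assuming a single process: the request handler enqueues and immediately returns a job id, a worker processes jobs, and the client polls `status(id)`. `AIJobQueue` and `runModel` are hypothetical names; a production system would replace the `Map` and array with a durable queue and job store, and could push results over a WebSocket instead of relying on polling.

```typescript
// In-memory sketch of Option B: enqueue, process in a worker, poll for status.
// The Map/array storage and the `runModel` callback are illustrative assumptions.
type JobStatus = "queued" | "running" | "done" | "failed";

interface Job {
  id: string;
  input: string;
  status: JobStatus;
  result?: string;
  error?: string;
}

class AIJobQueue {
  private jobs = new Map<string, Job>();
  private pending: string[] = [];
  private nextId = 1;
  private readonly runModel: (input: string) => Promise<string>;

  constructor(runModel: (input: string) => Promise<string>) {
    this.runModel = runModel;
  }

  // Request handler: returns a job id right away; the user sees a loading state.
  enqueue(input: string): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { id, input, status: "queued" });
    this.pending.push(id);
    return id;
  }

  // Polling endpoint: the client asks for the job's current state.
  status(id: string): Job | undefined {
    return this.jobs.get(id);
  }

  // Worker loop body: processes one queued job; returns false if none are waiting.
  async processNext(): Promise<boolean> {
    const id = this.pending.shift();
    if (id === undefined) return false;
    const job = this.jobs.get(id)!;
    job.status = "running";
    try {
      job.result = await this.runModel(job.input);
      job.status = "done";
    } catch (err) {
      job.status = "failed";
      job.error = err instanceof Error ? err.message : String(err);
    }
    return true;
  }
}
```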
Option C: Edge/client-side (for lightweight models)
For classification or embedding tasks, small models running in the browser via ONNX Runtime Web (the successor to ONNX.js) or TensorFlow.js can give sub-100ms responses without API costs. Only practical for specific narrow tasks.
Step 3: Abstract the AI Layer from Day One
The biggest technical mistake in AI integration is writing OpenAI-specific code throughout your codebase. Models change fast. Costs change. Alternatives improve. If you're calling `openai.chat.completions.create()` directly in your business logic, you'll regret it.
Instead, create a thin AI service abstraction:
```typescript
// ai-service.ts
export interface AIService {
  summarize(text: string, maxWords: number): Promise<string>;
  classify(text: string, categories: string[]): Promise<string>; // resolves to one of `categories`
  generate(prompt: string, context: Record<string, unknown>): Promise<string>;
}
// openai-service.ts - implements AIService
// anthropic-service.ts - implements AIService
// mock-service.ts - implements AIService (for tests)
```
This lets you swap providers without touching business logic, A/B test models, and write meaningful unit tests with the mock implementation.
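To make the testing benefit concrete, here is a sketch of a deterministic mock plus a piece of business logic that depends only on the interface. The interface repeats the one above, with `Promise<string>` return types assumed; `MockAIService` and `triageTicket` are hypothetical names for illustration.

```typescript
// Sketch: a mock AIService plus business logic that depends only on the interface.
interface AIService {
  summarize(text: string, maxWords: number): Promise<string>;
  classify(text: string, categories: string[]): Promise<string>;
  generate(prompt: string, context: Record<string, unknown>): Promise<string>;
}

// Deterministic mock for unit tests: no network, no API key, predictable output.
class MockAIService implements AIService {
  async summarize(text: string, maxWords: number): Promise<string> {
    // "Summary" = the first `maxWords` words.
    return text.split(/\s+/).filter(Boolean).slice(0, maxWords).join(" ");
  }
  async classify(text: string, categories: string[]): Promise<string> {
    // First category whose name appears in the text, else the last category.
    const lower = text.toLowerCase();
    return categories.find((c) => lower.includes(c.toLowerCase())) ?? categories[categories.length - 1];
  }
  async generate(prompt: string, context: Record<string, unknown>): Promise<string> {
    return `${prompt} ${JSON.stringify(context)}`;
  }
}

// Hypothetical business logic: receives the service, never imports a provider SDK.
async function triageTicket(ai: AIService, body: string): Promise<{ category: string; summary: string }> {
  const [category, summary] = await Promise.all([
    ai.classify(body, ["billing", "bug", "other"]),
    ai.summarize(body, 12),
  ]);
  return { category, summary };
}
```

In production you would pass an OpenAI- or Anthropic-backed implementation to `triageTicket`; in tests you pass `new MockAIService()`.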
Step 4: Prompt Engineering as Code
System prompts should be versioned artifacts, not hardcoded strings scattered through your business logic. Treat prompts like configuration:
- Store in your database or a config file
- Version them with semantic versioning (prompt v1.3.2)
- A/B test prompt variations with your real user traffic
- Log every prompt input and output for debugging and improvement
A prompt that works perfectly in your playground environment will behave differently with real user data. You need the observability to understand why.
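Here is a minimal sketch of what "prompts as configuration" can look like in code, assuming an in-memory registry. Each prompt is registered under a name and semantic version, one version is marked active (so you can roll forward or back, or route some traffic to another version for an A/B test), and every rendered prompt and output is recorded. The `{{placeholder}}` template syntax and the `PromptRegistry` class are assumptions; in practice templates would be loaded from your database or config file and the log sent to your observability stack.

```typescript
// Sketch of prompts-as-configuration: versioned templates plus an input/output log.
interface PromptVersion {
  name: string;
  version: string;  // semantic version, e.g. "1.3.2"
  template: string; // uses {{placeholder}} slots (assumed syntax)
}

interface PromptLogEntry {
  name: string;
  version: string;
  input: string;
  output: string;
  timestamp: string;
}

class PromptRegistry {
  private versions = new Map<string, PromptVersion>();
  private active = new Map<string, string>(); // prompt name -> active version
  readonly log: PromptLogEntry[] = [];

  // The first version registered for a name becomes active unless another is promoted.
  register(p: PromptVersion, makeActive = false): void {
    this.versions.set(`${p.name}@${p.version}`, p);
    if (makeActive || !this.active.has(p.name)) this.active.set(p.name, p.version);
  }

  // Render the active version (or an explicitly requested one, e.g. for an A/B arm).
  render(name: string, values: Record<string, string>, version?: string): { text: string; version: string } {
    const v = version ?? this.active.get(name);
    const p = v === undefined ? undefined : this.versions.get(`${name}@${v}`);
    if (!p) throw new Error(`Unknown prompt ${name}@${v ?? "?"}`);
    const text = p.template.replace(/\{\{(\w+)\}\}/g, (_m: string, key: string) => values[key] ?? "");
    return { text, version: p.version };
  }

  // Record every prompt/output pair so production behavior can be debugged later.
  record(name: string, version: string, input: string, output: string): void {
    this.log.push({ name, version, input, output, timestamp: new Date().toISOString() });
  }
}
```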
Step 5: Cost Architecture
LLM API costs scale with tokens per request (input plus output, which are usually priced differently) × number of requests. For apps with significant traffic, this can quickly become a meaningful line item.
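More precisely, each request costs its input tokens times the input price plus its output tokens times the output price, and monthly spend multiplies that by request volume. The sketch below does that arithmetic; the per-million-token prices are placeholders, not any provider's actual rates, and a 30-day month is assumed.

```typescript
// Back-of-the-envelope LLM cost model. Prices are USD per 1M tokens and are
// placeholder values, not any provider's actual rates.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

function costPerRequest(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}

// Assumes a 30-day month.
function monthlyCost(requestsPerDay: number, inputTokens: number, outputTokens: number, p: Pricing): number {
  return costPerRequest(inputTokens, outputTokens, p) * requestsPerDay * 30;
}

// Example: 1,500 input and 300 output tokens per request at hypothetical
// rates of $2 (input) and $8 (output) per 1M tokens.
const pricing: Pricing = { inputPerMillion: 2, outputPerMillion: 8 };
console.log(costPerRequest(1500, 300, pricing).toFixed(4)); // "0.0054"
console.log(monthlyCost(10_000, 1500, 300, pricing).toFixed(2)); // "1620.00"
```

At those assumed rates, 10,000 requests a day comes to about $1,620 a month, which is why the strategies below target both tokens per request and the number of calls.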
Cost control strategies:
- Cache semantically similar responses: If 40% of your users ask the same question with slightly different wording, caching at the embedding level (vector similarity) can serve those responses without LLM calls.
- Use smaller models for simple tasks: GPT-4o is overkill for classification tasks. GPT-4o mini, Claude Haiku, or Gemini Flash are 10–20x cheaper for simpler tasks with comparable quality.
- Set token limits aggressively: Output tokens are typically priced higher than input tokens. Constrain your output length to what you actually need.
- Batch where possible: Instead of one API call per item, batch 10–50 items per call for classification and embedding tasks.
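The first strategy, caching at the embedding level, can be sketched as follows. `embed` stands in for whatever embedding API you use, and the 0.92 similarity threshold is an assumed starting point to tune on your own traffic, not a recommended value. A linear scan is used for clarity where a production system would use a vector index. Each lookup still pays for one embedding call, which is usually far cheaper than a generation call.

```typescript
// Sketch of a semantic cache: reuse a stored answer when a new query's embedding
// is close enough to a previous query's embedding.
type Embed = (text: string) => Promise<number[]>;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private entries: { vector: number[]; answer: string }[] = [];
  private readonly embed: Embed;
  private readonly threshold: number;

  constructor(embed: Embed, threshold = 0.92) {
    this.embed = embed;
    this.threshold = threshold;
  }

  // Return a cached answer for a similar query, or call `generate` and cache its result.
  async getOrGenerate(
    query: string,
    generate: (q: string) => Promise<string>
  ): Promise<{ answer: string; cached: boolean }> {
    const vector = await this.embed(query);
    let best: { answer: string; score: number } | undefined = undefined;
    for (const e of this.entries) {
      const score = cosineSimilarity(vector, e.vector);
      if (!best || score > best.score) best = { answer: e.answer, score };
    }
    if (best && best.score >= this.threshold) return { answer: best.answer, cached: true };
    const answer = await generate(query); // cache miss: pay for the LLM call
    this.entries.push({ vector, answer });
    return { answer, cached: false };
  }
}
```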
Step 6: User Experience for AI Features
AI outputs are probabilistic. They're wrong sometimes. Your UX must account for this:
- Always make AI output editable: Users should be able to correct what the AI generated
- Show confidence or source: "Based on your last 10 orders" or "Suggested based on similar products" builds trust
- Provide a "regenerate" option: One AI output is a draft. Users know to expect variation.
- Don't hide the AI: Transparency about what's AI-generated vs. human-verified improves trust rather than diminishing it
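One way to carry these UX rules into code is to give AI output its own data shape rather than a bare string. The sketch below is illustrative only: `AISuggestion` and its fields are hypothetical names, chosen so the UI can render an editable field, a source label, an "AI-generated" badge, and a regenerate button, while keeping the original model output for comparison.

```typescript
// Sketch of a UI-facing model for AI output that stays editable, labeled and
// regenerable. Field names are illustrative assumptions, not a library API.
interface AISuggestion {
  text: string;         // current text shown in an editable field
  originalText: string; // what the model produced, kept for comparison/feedback
  sourceLabel: string;  // e.g. "Based on your last 10 orders"
  aiGenerated: true;    // drives an "AI-generated" badge in the UI
  attempt: number;      // incremented on each "regenerate"
}

function fromModel(text: string, sourceLabel: string, attempt = 1): AISuggestion {
  return { text, originalText: text, sourceLabel, aiGenerated: true, attempt };
}

// User edits replace the displayed text but keep the original for feedback.
function applyEdit(s: AISuggestion, edited: string): AISuggestion {
  return { ...s, text: edited };
}

function wasEdited(s: AISuggestion): boolean {
  return s.text !== s.originalText;
}

// "Regenerate" produces a fresh draft and records which attempt it is.
async function regenerate(s: AISuggestion, generate: () => Promise<string>): Promise<AISuggestion> {
  return fromModel(await generate(), s.sourceLabel, s.attempt + 1);
}
```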
Real Example: Adding Semantic Search to a Marketplace App
A marketplace app we built had keyword search that frustrated users: searching "running shoes for rain" returned nothing because listings used "waterproof trail runners."
Implementation:
1. On listing creation, generate an embedding of the listing title + description using text-embedding-3-small
2. Store the embedding vector in PostgreSQL with pgvector
3. On search query, generate an embedding of the search term
4. Return listings ordered by cosine similarity to the query embedding
5. Blend semantic results with keyword results (70/30) for best coverage
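Steps 3 to 5 can be sketched in memory as below. This is not the production code from the example. It computes cosine similarity in TypeScript where the real system ranks in PostgreSQL, uses a naive substring keyword score, and reads the 70/30 blend as a weighted sum of the two scores, which is an assumption. `Listing`, `keywordScore`, and `blendedSearch` are hypothetical names.

```typescript
// In-memory sketch of steps 3-5. In PostgreSQL with pgvector, the similarity
// ranking is typically `ORDER BY embedding <=> $1` (`<=>` is cosine distance).
interface Listing {
  id: string;
  text: string;        // title + description
  embedding: number[]; // computed once at listing creation (step 1)
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Fraction of query terms that appear (as substrings) in the listing text, 0..1.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  return terms.filter((t) => haystack.includes(t)).length / terms.length;
}

// Weighted blend: 0.7 semantic + 0.3 keyword by default (assumed reading of "70/30").
function blendedSearch(
  query: string,
  queryEmbedding: number[],
  listings: Listing[],
  semanticWeight = 0.7
): { id: string; score: number }[] {
  return listings
    .map((l) => ({
      id: l.id,
      score:
        semanticWeight * cosine(queryEmbedding, l.embedding) +
        (1 - semanticWeight) * keywordScore(query, l.text),
    }))
    .sort((a, b) => b.score - a.score);
}
```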
Total implementation time: 3 days. Cost per search: $0.000002. User satisfaction scores on search improved 34% in the following month.
That's AI integration done right: solves a real problem, invisible to the user, costs almost nothing to run.