Generative AI Systems Stack
LLMs / Foundation Models
- OpenAI — GPT-5, o3, o4-mini
- Anthropic — Claude Opus 4.6, Sonnet 4.5
- Google Gemini — Gemini 3, Gemini 2.5 Pro
- Meta Llama — Llama 4 (Maverick, Scout)
- DeepSeek — DeepSeek R1
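Despite vendor differences, most of these models are reached through a chat-style API that takes a list of role-tagged messages. A minimal sketch of the common request shape (the model name and helper function are illustrative, not tied to any one provider):

```python
# Sketch of an OpenAI-style chat-completion payload; most providers in
# this list accept this shape or something very close to it.

def build_chat_request(model: str, system: str, user: str,
                       temperature: float = 0.7) -> dict:
    """Assemble a chat request: system prompt, user turn, sampling params."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": temperature,
    }

request = build_chat_request("gpt-5", "You are a concise assistant.",
                             "Summarize RAG in one sentence.")
```

The same payload shape is what the serving tools further down (vLLM, TGI, Ollama) expose locally, which is why clients are largely interchangeable across providers.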
AI Frameworks
- LangChain — LLM orchestration and chaining
- LlamaIndex — Data-centric RAG and document processing
- Haystack — Production RAG and search pipelines
- Semantic Kernel — Microsoft’s enterprise AI SDK
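The idea these frameworks share is composing small steps (prompt formatting, model call, output parsing) into a pipeline. A toy sketch of that chaining pattern in plain Python; `fake_llm` stands in for a real model call:

```python
from typing import Callable

def compose(*steps: Callable) -> Callable:
    """Chain single-argument steps left to right, like an LCEL pipe."""
    def chain(value):
        for step in steps:
            value = step(value)
        return value
    return chain

def format_prompt(question: str) -> str:
    return f"Answer briefly: {question}"

def fake_llm(prompt: str) -> str:
    # Stub; a real chain would call a hosted or local model here.
    return f"[model output for: {prompt}]"

def parse(text: str) -> str:
    return text.strip()

pipeline = compose(format_prompt, fake_llm, parse)
answer = pipeline("What is a vector database?")
```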
AI Coding Assistants
- GitHub Copilot — Inline code completion
- Cursor — AI-native code editor
- Claude Code — CLI-based coding assistant
Text Embeddings
- OpenAI Embeddings — text-embedding-3-large
- Cohere Embed — Multilingual and multimodal
- Mistral Embed — Retrieval-optimized
- Voyage AI — Domain-specific embeddings
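All of these models map text to fixed-length vectors, and semantic similarity is typically scored with cosine similarity. A self-contained sketch with toy 4-dimensional vectors standing in for real model output (e.g. text-embedding-3-large's much higher-dimensional vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: semantically close texts get nearby embeddings.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
invoice = [0.0, 0.8, 0.6, 0.0]
```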
Vector Databases
- Pinecone — Fully managed, serverless
- Weaviate — Open-source with hybrid search
- Qdrant — High-performance, Rust-based
- Chroma — Lightweight, developer-friendly
- Milvus — Billion-scale vector search
- pgvector — PostgreSQL extension
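At their core, every store in this list answers the same query: given a vector, return the k nearest stored vectors. A brute-force in-memory sketch of that operation (production systems use approximate indexes such as HNSW or IVF to scale past brute force):

```python
import math

def top_k(query: list[float], index: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    """Return ids of the k vectors most similar to `query` (cosine)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    ranked = sorted(index, key=lambda item: cos(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

index = [("doc1", [1.0, 0.0]), ("doc2", [0.0, 1.0]), ("doc3", [0.9, 0.1])]
results = top_k([1.0, 0.0], index)
```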
RAG (Retrieval-Augmented Generation)
- Agentic RAG — LLM-driven query decomposition
- HiFi-RAG — Multi-stage hierarchical filtering
- Bidirectional RAG — Controlled write-back with grounding checks
- LlamaIndex Workflows — Event-driven RAG pipelines
- LangChain LCEL (LangChain Expression Language) — Chain-based RAG orchestration
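Underneath all of these variants is the same retrieve-then-generate core: fetch the most relevant chunks, then pack them into the prompt as grounding context. A toy sketch using word-overlap as the relevance score (a real pipeline would use embeddings and a vector store):

```python
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query; toy retriever."""
    q = set(query.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

def build_rag_prompt(query: str, chunks: list[str]) -> str:
    """Stuff retrieved chunks into the prompt as grounding context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Qdrant is a Rust-based vector database.",
    "LoRA freezes base weights and trains low-rank adapters.",
    "pgvector adds vector search to PostgreSQL.",
]
prompt = build_rag_prompt("Which vector database is written in Rust?", chunks)
```

The agentic variants above differ mainly in who drives this loop: the LLM itself decomposes the query, decides when to retrieve again, and checks grounding before answering.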
AI Agents
- LangGraph — Stateful multi-actor workflows
- CrewAI — Role-based multi-agent collaboration
- AutoGen — Microsoft’s multi-agent conversation framework
- AutoGPT — Autonomous long-running agents
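The loop underneath most of these frameworks is the same: the model picks a tool, the runtime executes it, and the observation is fed back until the model decides to answer. A minimal sketch where `fake_policy` stands in for a real LLM call:

```python
def calculator(expr: str) -> str:
    return str(eval(expr))  # toy tool; never eval untrusted input

TOOLS = {"calculator": calculator}

def fake_policy(history: list) -> tuple:
    """Stub policy: request one tool call, then finish with its result.
    A real agent would ask the LLM to choose the next action here."""
    if not any(step[0] == "observation" for step in history):
        return ("call", "calculator", "6 * 7")
    return ("finish", history[-1][1])

def run_agent(max_steps: int = 5):
    history = []
    for _ in range(max_steps):
        action = fake_policy(history)
        if action[0] == "finish":
            return action[1]
        _, tool, arg = action
        history.append(("observation", TOOLS[tool](arg)))
    return None  # step budget exhausted

result = run_agent()
```

LangGraph makes this loop an explicit state graph; CrewAI and AutoGen instead coordinate several such loops, one per agent role.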
LLM Serving / Inference
- vLLM — PagedAttention, high-throughput serving
- TGI (Text Generation Inference) — Hugging Face’s production inference server
- Ollama — Local model serving
- TensorRT-LLM — NVIDIA-optimized inference
- SGLang — Structured generation for constrained outputs
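vLLM, TGI, and Ollama can all expose an OpenAI-compatible HTTP endpoint, so one client works across local and hosted backends. A sketch of the request a client would send to a local server (the port and model name are illustrative; the actual POST is omitted since it needs a running server):

```python
import json

payload = {
    "model": "llama-4-scout",  # whatever model the local server has loaded
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
    "stream": False,
}
body = json.dumps(payload)
# e.g. POST http://localhost:8000/v1/chat/completions with this JSON body
```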
Fine-Tuning
- Axolotl — Multi-GPU, supports LoRA/QLoRA/SFT/RLHF
- Unsloth — 2-5x faster fine-tuning, 80% less memory
- Torchtune — Deep customization and scalability
- PEFT — Hugging Face parameter-efficient fine-tuning
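The LoRA idea these tools build on: freeze the base weight matrix W and learn a low-rank update B·A, so only r·(d_in + d_out) parameters train instead of d_in·d_out. A toy numeric sketch in plain Python:

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for the toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 4, 1                            # toy sizes; real d is in the thousands
W = [[1.0] * d for _ in range(d)]      # frozen base weight (d x d)
B = [[0.5] for _ in range(d)]          # trainable low-rank factor (d x r)
A = [[0.1] * d]                        # trainable low-rank factor (r x d)

delta = matmul(B, A)                   # rank-r update to W
W_adapted = [[w + dw for w, dw in zip(rw, rd)]
             for rw, rd in zip(W, delta)]

trainable = r * (d + d)                # 8 parameters instead of d*d = 16
```

QLoRA applies the same update on top of a quantized (e.g. 4-bit) base model, which is where the large memory savings come from.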
LLM Monitoring / Observability
- LangSmith — Tracing, cost, and latency tracking
- Weights & Biases Weave — Multi-agent execution traces
- Arize Phoenix — Open-source drift and bias monitoring
- Langfuse — Open-source LLM observability
- AgentOps — Agentic workflow monitoring
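The core of what these tools capture per LLM call is a trace record: inputs, outputs, latency, and token/cost counters. A minimal sketch of that instrumentation as a decorator (the field names are illustrative):

```python
import functools
import time

TRACES = []  # in-memory sink; real tools ship records to a backend

def traced(fn):
    """Record name, latency, and output size for each call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "latency_s": time.perf_counter() - start,
            "output_chars": len(str(result)),
        })
        return result
    return wrapper

@traced
def fake_llm_call(prompt: str) -> str:
    return f"echo: {prompt}"

fake_llm_call("hi")
```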
Evaluation and Testing
- RAGAS — RAG-specific evaluation (faithfulness, answer relevancy, context recall)
- DeepEval — 60+ metrics, Pytest-like LLM testing
- PromptFoo — Prompt A/B testing via YAML/CLI
- OpenAI Evals — Open-source evaluation framework
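The pattern these frameworks share: run a set of cases, score each output with a metric, and aggregate. A toy harness using exact match as the metric (RAGAS/DeepEval-style metrics such as faithfulness are LLM- or embedding-scored instead; the lambda stands in for a real model):

```python
def exact_match(predicted: str, expected: str) -> float:
    """1.0 if the normalized strings match, else 0.0."""
    return 1.0 if predicted.strip().lower() == expected.strip().lower() else 0.0

def run_eval(cases: list[dict], model) -> float:
    """Score every case and return mean accuracy."""
    scores = [exact_match(model(c["input"]), c["expected"]) for c in cases]
    return sum(scores) / len(scores)

cases = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
accuracy = run_eval(cases, model=lambda q: "4" if q == "2+2" else "Paris")
```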
