Generative AI Systems Stack
LLMs / Foundation Models
- OpenAI — GPT-5.4, GPT-5.3, o3-pro
- Anthropic — Claude Opus 4.6, Sonnet 4.6
- Google Gemini — Gemini 3.1 Pro, Gemini 3.1 Flash
- Meta Llama — Llama 4 (Maverick, Scout)
- DeepSeek — DeepSeek R1, DeepSeek-V3
- Mistral — Mistral Large 3, Mistral Small 4
- xAI — Grok 4.20
- Qwen — Qwen 3.6-Plus
AI Frameworks
- LangChain — LLM orchestration and chaining
- LlamaIndex — Data-centric RAG and document processing
- Haystack — Production RAG and search pipelines
- Microsoft Agent Framework — Unified SDK (successor to Semantic Kernel + AutoGen)
- Mastra — TypeScript-first AI agent framework
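The common idea across these frameworks is composing a prompt step, a model call, and an output parser into a single pipeline. A stdlib-only sketch of that pattern; the function names are illustrative, not real framework APIs, and the model is a canned stub.

```python
"""Toy sketch of the prompt -> model -> parser "chain" pattern that
orchestration frameworks formalize. All names are illustrative."""

def prompt_step(question: str) -> str:
    # Fill a template, as a prompt component would.
    return f"Answer concisely: {question}"

def fake_llm(prompt: str) -> str:
    # Stand-in for a model call; returns a canned completion.
    return f"ECHO[{prompt}]"

def parser_step(completion: str) -> dict:
    # Normalize raw model output into structured data.
    return {"text": completion.strip()}

def chain(*steps):
    # Compose steps left to right, piping each output to the next.
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

pipeline = chain(prompt_step, fake_llm, parser_step)
result = pipeline("What is RAG?")
```

Real frameworks add streaming, retries, and tracing around this same composition idea.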
AI Coding Assistants
- Cursor — AI-native code editor with Background Agents
- GitHub Copilot — Inline code completion and chat
- Claude Code — CLI-based agentic coding assistant
- Windsurf — AI IDE with agentic flows
- OpenAI Codex — Cloud-based agentic coding with parallel worktrees
- Devin — Autonomous AI software engineer
- Amazon Kiro — Spec-driven AI IDE
Text Embeddings
- OpenAI Embeddings — text-embedding-3-large
- Cohere Embed 4 — Multimodal (text + images)
- Voyage AI — Voyage 4, MoE architecture (acquired by MongoDB)
- Gemini Embedding — Multimodal including native audio
- Mistral Embed — Retrieval-optimized, low-cost
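Whatever the provider, embeddings are compared the same way downstream: cosine similarity between vectors. A stdlib sketch of that comparison; the vectors below are made up, not real model outputs.

```python
"""Cosine similarity, the standard comparison applied to text embeddings.
Vectors here are tiny invented examples, not real embedding outputs."""
import math

def cosine(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

doc = [0.2, 0.7, 0.1]
query_close = [0.25, 0.65, 0.05]   # semantically similar text
query_far = [0.9, 0.05, 0.4]       # unrelated text
```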
Vector Databases
- Pinecone — Fully managed, serverless
- Weaviate — Open-source with hybrid search
- Qdrant — High-performance, Rust-based
- Chroma — Lightweight, developer-friendly
- Milvus — Billion-scale vector search, hot/cold tiering
- pgvector — PostgreSQL extension
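At their core, all of these engines answer the same query: top-k nearest neighbors. The brute-force O(n) version below is just for intuition; production databases replace the linear scan with approximate indexes such as HNSW.

```python
"""Top-k nearest-neighbor search, the core vector-database operation,
shown as a brute-force scan (real engines use ANN indexes instead)."""
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, index, k=2):
    # index: list of (id, vector) pairs; return the k closest ids.
    scored = sorted(index, key=lambda item: l2(query, item[1]))
    return [doc_id for doc_id, _ in scored[:k]]

index = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [0.1, 0.0])]
nearest = top_k([0.08, 0.0], index)
```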
RAG (Retrieval-Augmented Generation)
- Hybrid RAG — Dense vector + sparse keyword search (production baseline)
- Agentic RAG — Autonomous plan-retrieve-reason loops
- Graph RAG — Knowledge graph layer for multi-hop reasoning
- LlamaIndex Workflows — Event-driven RAG pipelines
- LangChain LCEL — Chain-based RAG orchestration
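Hybrid RAG needs a way to merge the dense and sparse result lists. Reciprocal Rank Fusion (RRF) is a common choice; a stdlib sketch with invented document ids:

```python
"""Reciprocal Rank Fusion: combine multiple rankings (e.g. dense vector
and sparse keyword results) into one. Document ids are made up."""

def rrf(rankings, k=60):
    # Each document scores sum(1 / (k + rank)) across the input rankings.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]    # vector-search ranking
sparse = ["d1", "d4", "d3"]   # keyword-search ranking
fused = rrf([dense, sparse])
```

Documents ranked well by both retrievers (here `d1` and `d3`) float to the top, which is why RRF is a common production baseline.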
AI Agents
- LangGraph — Stateful multi-actor workflows
- CrewAI — Role-based multi-agent collaboration
- OpenAI Agents SDK — Production-grade agent orchestration
- Claude Agent SDK — Tool-use agents with constitutional safety
- Google ADK — Agent Development Kit (Python, Java, Go, TS)
- AG2 — Community fork of AutoGen, multi-agent conversations
- Smolagents — Hugging Face’s lightweight agent framework
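Underneath all of these frameworks sits the same loop: the model picks a tool, the runtime executes it, and the observation is fed back until a final answer. A self-contained sketch with a scripted stub in place of a real model:

```python
"""The plan -> act -> observe loop behind agent frameworks. The "model"
is a scripted stub so the example runs without any API."""

def calculator(expr: str) -> str:
    # Toy tool: evaluate simple arithmetic with builtins disabled.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def scripted_model(history):
    # Stub policy: call the calculator once, then answer with its result.
    observations = [turn for turn in history if turn[0] == "observation"]
    if not observations:
        return ("tool", "calculator", "6 * 7")
    return ("final", f"The answer is {observations[-1][1]}")

def run_agent(model, tools, max_steps=5):
    history = []
    for _ in range(max_steps):
        decision = model(history)
        if decision[0] == "final":
            return decision[1]
        _, tool_name, tool_input = decision
        history.append(("observation", tools[tool_name](tool_input)))
    return "step limit reached"

answer = run_agent(scripted_model, TOOLS)
```

Frameworks differ mainly in what they layer on this loop: state graphs (LangGraph), role assignment (CrewAI), or managed handoffs (OpenAI Agents SDK).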
Model Context Protocol (MCP)
- MCP Specification — Open standard for connecting AI to tools and data
- MCP Servers — 10,000+ community servers
- Agentic AI Foundation — Linux Foundation governance (Anthropic, OpenAI, Google, Microsoft)
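MCP messages ride on JSON-RPC 2.0. A sketch of constructing a `tools/call` request, one of the methods the spec defines; the tool name and arguments below are invented for illustration.

```python
"""Building an MCP tools/call request. The envelope follows JSON-RPC 2.0
as used by MCP; the payload values are invented examples."""
import json

def mcp_tools_call(request_id, tool_name, arguments):
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

msg = mcp_tools_call(1, "search_docs", {"query": "vector databases"})
wire = json.dumps(msg)  # what actually crosses the transport
```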
LLM Serving / Inference
- vLLM — PagedAttention, high-throughput serving
- SGLang — Zero-overhead batch scheduler, strong throughput on public benchmarks
- Ollama — Local model serving with Apple MLX support
- TensorRT-LLM — NVIDIA-optimized inference
- TGI — Hugging Face’s production inference server
- LiteLLM — AI gateway/proxy for 100+ LLMs in OpenAI format
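Gateways like LiteLLM work by normalizing every provider behind the OpenAI chat-completions request shape. A sketch of constructing that payload; the model name is a placeholder and no network call is made.

```python
"""Constructing an OpenAI-format chat request, the common interface that
gateways normalize providers to. Model name is a placeholder."""

def chat_request(model, system, user, temperature=0.2):
    # Minimal chat-completions payload; provider-specific params vary.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": temperature,
    }

payload = chat_request("provider/model-name", "You are terse.", "Define RAG.")
```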
Fine-Tuning
- Unsloth — 2-12x faster fine-tuning, 80% less memory
- Axolotl — Multi-GPU, supports LoRA/QLoRA/SFT/RLHF/GRPO
- LLaMA-Factory — GUI-first, 100+ model support, no-code fine-tuning
- TRL — Hugging Face RL-based alignment (GRPO, PPO, DPO)
- Torchtune — PyTorch-native, deep customization
- PEFT — Parameter-efficient methods (LoRA, QLoRA, adapters)
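The LoRA idea shared by most of these tools: freeze the base weight W and train a low-rank pair (B, A), so the effective weight becomes W + (alpha / r) * B @ A. A tiny pure-Python illustration with made-up matrices:

```python
"""LoRA's effective weight: W + (alpha / r) * B @ A, with W frozen and
only the low-rank factors B and A trained. Matrices here are toy values."""

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    # W: d_out x d_in (frozen); B: d_out x r; A: r x d_in (trained).
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 frozen weight
B = [[1.0], [0.0]]             # 2x1 trained factor
A = [[0.0, 2.0]]               # 1x2 trained factor, rank r = 1
W_eff = lora_effective_weight(W, A, B, alpha=2.0, r=1)
```

The memory win comes from training only B and A (here 4 values instead of 4, but at real dimensions r << d makes the update a tiny fraction of W).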
Guardrails / Safety
- NeMo Guardrails — NVIDIA’s open-source safety toolkit
- Guardrails AI — Output validation and structuring
- Llama Guard — LLM-based content classification
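Structurally, a guardrail is a check applied to model output before it reaches the user. The toy rule-based validator below shows only that shape; real toolkits use configurable dialogue flows (NeMo Guardrails) or classifier LLMs (Llama Guard), not a keyword list.

```python
"""A toy output validator illustrating the guardrail pattern: inspect
model output, return a verdict. Patterns are illustrative only."""
import re

BLOCKLIST = [r"\bpassword\b", r"\bssn\b"]  # invented example rules

def check_output(text: str) -> dict:
    # Return a verdict plus which rule fired, if any.
    for pattern in BLOCKLIST:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return {"allowed": False, "rule": pattern}
    return {"allowed": True, "rule": None}

verdict = check_output("Your password is hunter2")
```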
LLM Monitoring / Observability
- Langfuse — Open-source LLM observability (acquired by ClickHouse)
- LangSmith — Tracing, cost, and latency tracking
- Arize Phoenix — Open-source drift and bias monitoring
- Weights & Biases Weave — Multi-agent execution traces
- AgentOps — Agentic workflow monitoring
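The primitive these platforms share is the trace span: wrap each LLM call, record latency and metadata, ship it to a backend. A stdlib sketch with an in-memory list standing in for that backend:

```python
"""A minimal trace-span decorator, the core observability primitive.
SPANS stands in for a real export backend."""
import functools
import time

SPANS = []  # in-memory sink standing in for an observability backend

def traced(name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            SPANS.append({
                "name": name,
                "latency_s": time.perf_counter() - start,
                "output_chars": len(str(result)),
            })
            return result
        return wrapper
    return decorator

@traced("fake_llm_call")
def fake_llm_call(prompt):
    # Stand-in for a real model call.
    return f"response to: {prompt}"

reply = fake_llm_call("hello")
```

Real SDKs add token counts, costs, and nested spans for multi-step agent runs, but the wrap-record-export shape is the same.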
Evaluation and Testing
- RAGAS — RAG-specific evaluation (faithfulness, relevancy, recall)
- DeepEval — 14+ targeted metrics, pytest-like LLM testing
- PromptFoo — Red teaming, security testing, and CI/CD integration
- OpenAI Evals — Open-source evaluation framework
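A stripped-down version of the overlap-style scores these evaluators report: what fraction of answer tokens are supported by the retrieved context. Real RAGAS/DeepEval metrics use LLM judges and claim decomposition rather than raw token overlap, so this is intuition only.

```python
"""A toy faithfulness-style score: fraction of answer tokens present in
the retrieved context. Real evaluators use LLM judges instead."""

def support_ratio(answer: str, context: str) -> float:
    # Fraction of (lowercased) answer tokens that appear in the context.
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    hits = sum(1 for tok in answer_tokens if tok in context_tokens)
    return hits / len(answer_tokens)

score = support_ratio("Paris is the capital",
                      "the capital of France is Paris")
```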
