AI Resources

Generative AI Systems Stack

LLMs / Foundation Models

AI Frameworks

AI Coding Assistants

Text Embeddings

Vector Databases

  • Pinecone — Fully managed, serverless
  • Weaviate — Open-source with hybrid search
  • Qdrant — High-performance, Rust-based
  • Chroma — Lightweight, developer-friendly
  • Milvus — Billion-scale vector search
  • pgvector — PostgreSQL extension
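All of the databases above answer the same core query: given an embedding, find the nearest stored vectors. A minimal brute-force sketch in pure Python (the real systems add indexing structures such as HNSW to avoid scanning every vector):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    # index: list of (doc_id, vector) pairs; returns the k nearest doc_ids.
    scored = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

index = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.9, 0.1, 0.0]),
    ("doc_c", [0.0, 1.0, 0.0]),
]
print(top_k([1.0, 0.05, 0.0], index, k=2))  # ['doc_a', 'doc_b']
```

Linear scan is O(n) per query; approximate-nearest-neighbor indexes trade a little recall for sub-linear lookups at billion scale.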

RAG (Retrieval-Augmented Generation)

  • Agentic RAG — LLM-driven query decomposition
  • HiFi-RAG — Multi-stage hierarchical filtering
  • Bidirectional RAG — Controlled write-back with grounding checks
  • LlamaIndex Workflows — Event-driven RAG pipelines
  • LangChain LCEL (LangChain Expression Language) — Chain-based RAG orchestration
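Whatever the orchestration framework, the retrieve-then-generate core is the same: fetch relevant documents, then inject them into the prompt. A toy sketch using word overlap as a stand-in for embedding similarity (names like `build_prompt` are illustrative, not any framework's API):

```python
def retrieve(query, corpus, k=2):
    # Score each document by word overlap with the query (a crude stand-in
    # for embedding similarity) and return the top-k matches.
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, corpus, k=2):
    # Assemble the augmented prompt that would be sent to the LLM.
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Pinecone is a managed vector database.",
    "LoRA adds low-rank adapters for fine-tuning.",
    "vLLM serves models with PagedAttention.",
]
prompt = build_prompt("What is a vector database?", corpus, k=1)
```

Agentic variants wrap this loop in an LLM that decomposes the query and decides when to retrieve again.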

AI Agents

  • LangGraph — Stateful multi-actor workflows
  • CrewAI — Role-based multi-agent collaboration
  • AutoGen — Microsoft’s multi-agent conversation framework
  • AutoGPT — Autonomous long-running agents
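Under the hood, these frameworks run an observe-decide-act loop: the model either calls a tool or emits a final answer. A minimal sketch with a stubbed-out "LLM" (the stub and the `calculator` tool are hypothetical; real frameworks put an actual model behind the decision step):

```python
def calculator(expr):
    # Hypothetical tool: evaluate a simple arithmetic expression.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_llm(question, observations):
    # Stand-in for a real model: picks the next action from current state.
    if not observations:
        return ("call", "calculator", "2 + 3")
    return ("final", f"The answer is {observations[-1]}.")

def run_agent(question, max_steps=5):
    observations = []
    for _ in range(max_steps):
        decision = stub_llm(question, observations)
        if decision[0] == "final":
            return decision[1]
        _, tool, arg = decision
        observations.append(TOOLS[tool](arg))  # execute tool, record result
    return "Gave up."

print(run_agent("What is 2 + 3?"))  # The answer is 5.
```

LangGraph makes this loop an explicit state graph; CrewAI and AutoGen run several such loops that message each other.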

LLM Serving / Inference

  • vLLM — PagedAttention, high-throughput serving
  • TGI (Text Generation Inference) — Hugging Face’s production inference server
  • Ollama — Local model serving
  • TensorRT-LLM — NVIDIA-optimized inference
  • SGLang — Fast serving (RadixAttention) with structured, constrained outputs
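Several of these servers (vLLM and Ollama among them) can expose an OpenAI-compatible chat completions endpoint, so one client works across backends. A sketch of building such a request with the standard library; the localhost URL and model name are assumptions for a locally running server:

```python
import json
from urllib import request

# Assumed local endpoint; adjust host/port for your server.
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "my-local-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
    "temperature": 0.2,
}

req = request.Request(
    BASE_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = request.urlopen(req)  # uncomment with a server actually running
```

Keeping to this one wire format is what makes it cheap to swap Ollama for vLLM when moving from a laptop to a GPU box.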

Fine-Tuning

  • Axolotl — Multi-GPU, supports LoRA/QLoRA/SFT/RLHF
  • Unsloth — 2-5x faster fine-tuning, 80% less memory
  • Torchtune — PyTorch-native fine-tuning with deep customization and scalability
  • PEFT — Hugging Face parameter-efficient fine-tuning
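The LoRA technique these libraries share freezes a weight matrix W and trains only a low-rank update B·A in its place. This back-of-the-envelope sketch counts trainable parameters to show why it is called "parameter-efficient" (dimensions are illustrative):

```python
def lora_params(d_in, d_out, rank):
    # Parameters trained when updating W directly vs. via LoRA factors
    # A (rank x d_in) and B (d_out x rank).
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

full, lora = lora_params(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

At rank 8 on a 4096x4096 projection, LoRA trains under half a percent of the weights, which is where the memory savings come from (QLoRA additionally quantizes the frozen base weights).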

LLM Monitoring / Observability

Evaluation and Testing

  • RAGAS — RAG-specific evaluation (faithfulness, answer relevancy, context recall)
  • DeepEval — 60+ metrics, Pytest-like LLM testing
  • PromptFoo — Prompt A/B testing via YAML/CLI
  • OpenAI Evals — Open-source evaluation framework
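To make a metric like faithfulness concrete: it asks how much of the answer is supported by the retrieved context. Real frameworks such as RAGAS use an LLM judge for this; the word-overlap score below is only a toy illustration of the idea:

```python
# Toy "faithfulness": fraction of the answer's content words found in context.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "and"}

def overlap_faithfulness(answer, context):
    terms = [w for w in answer.lower().split() if w not in STOPWORDS]
    if not terms:
        return 0.0
    ctx = set(context.lower().split())
    return sum(w in ctx for w in terms) / len(terms)

context = "pinecone is a fully managed vector database"
print(overlap_faithfulness("pinecone is a managed vector database", context))  # 1.0
print(overlap_faithfulness("pinecone runs on kubernetes", context))            # 0.25
```

Frameworks like DeepEval wrap such metrics in assertions so regressions fail a test suite instead of shipping.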