GPU Acceleration

All four MCP servers support transparent GPU acceleration for compute-intensive operations.

Architecture

The system uses a unified GPU management layer through the compute-core shared package:

┌─────────────────────────────────────────────────────────┐
│                       MCP Servers                       │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│  │ Math MCP │ │ Quantum  │ │Molecular │ │Neural MCP│    │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘    │
│       │            │            │            │          │
│       └────────────┴─────┬──────┴────────────┘          │
│                          │                              │
│              ┌───────────▼───────────┐                  │
│              │     compute-core      │                  │
│              │   (GPU abstraction)   │                  │
│              └───────────┬───────────┘                  │
│                          │                              │
│         ┌────────────────┼────────────────┐             │
│         ▼                ▼                ▼             │
│    ┌─────────┐     ┌──────────┐      ┌─────────┐        │
│    │  CuPy   │     │ PyTorch  │      │  NumPy  │        │
│    │  (GPU)  │     │(GPU/CPU) │      │  (CPU)  │        │
│    └─────────┘     └──────────┘      └─────────┘        │
└─────────────────────────────────────────────────────────┘
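
The layer's job is to let each server write its numerical code once and run it on whichever array module is active. As an illustration of the pattern only (not compute-core's actual code), CuPy's real get_array_module helper dispatches to cupy or numpy depending on where an array lives:

import numpy as np

def normalize(x):
    """Backend-agnostic normalization: works on NumPy or CuPy arrays."""
    try:
        import cupy
        xp = cupy.get_array_module(x)  # cupy for GPU arrays, numpy for host arrays
    except ImportError:
        xp = np  # CuPy not installed: everything stays on the CPU
    return x / xp.linalg.norm(x)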

Automatic Backend Selection

The system automatically selects the best available backend:

  1. CuPy - If an NVIDIA GPU with CUDA is available
  2. PyTorch CUDA - For neural network operations on a GPU
  3. NumPy - Fallback for CPU-only systems

# The use_gpu parameter triggers automatic backend selection
result = solve_schrodinger(
    potential=potential_id,
    initial_state=wavefunction,
    time_steps=1000,
    dt=0.1,
    use_gpu=True,  # automatically uses the best available backend
)
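
A minimal sketch of what that selection order can look like internally (a hypothetical helper, not compute-core's actual implementation):

def select_backend(use_gpu: bool):
    """Return the best available array module, in the priority order above."""
    if use_gpu:
        try:
            import cupy as cp
            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp  # 1. CuPy on an NVIDIA CUDA GPU
        except Exception:
            pass  # CuPy missing, or no usable CUDA device
        try:
            import torch
            if torch.cuda.is_available():
                return torch  # 2. PyTorch with CUDA
        except ImportError:
            pass
    import numpy as np
    return np  # 3. NumPy fallback on CPU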

Performance Characteristics

Math MCP

Operation                   CPU (NumPy)   GPU (CuPy)   Speedup
FFT 1024x1024               45 ms         2 ms         22x
Matrix multiply 4096x4096   2.1 s         35 ms        60x
Linear solve 2048x2048      850 ms        25 ms        34x

Quantum MCP

Grid Size      Time Steps   CPU Time   GPU Time   Speedup
256            1000         8 s        0.3 s      27x
512            1000         35 s       0.8 s      44x
1024           1000         150 s      2.5 s      60x
256x256 (2D)   1000         30 min     30 s       60x

Molecular MCP

Particles   Steps    CPU Time   GPU Time   Speedup
1,000       10,000   10 s       1 s        10x
10,000      10,000   100 s      5 s        20x
100,000     10,000   1 h        30 s       120x

Neural MCP

Model         Batch   CPU (per epoch)   GPU (per epoch)   Speedup
ResNet18      32      45 min            30 s              90x
ResNet50      32      2 h               2 min             60x
MobileNetV2   64      30 min            20 s              90x

Memory Management

Automatic Memory Handling

The GPU manager automatically handles memory allocation and cleanup:

# Large computations are chunked automatically
result = matrix_multiply(
    a=large_matrix_a,  # 10000x10000
    b=large_matrix_b,  # 10000x10000
    use_gpu=True,
)
# GPU memory is released after the computation
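
The chunking itself can be pictured as follows; this is an illustrative sketch rather than the GPU manager's actual code, and chunk_rows is a hypothetical parameter. Freeing CuPy's memory pool at the end is the standard way to return cached VRAM:

import numpy as np
import cupy as cp

def chunked_matmul(a, b, chunk_rows=2048):
    """Multiply a @ b in row blocks so the working set fits in VRAM."""
    b_gpu = cp.asarray(b)  # keep b resident on the GPU across all blocks
    out = np.empty((a.shape[0], b.shape[1]), dtype=a.dtype)
    for i in range(0, a.shape[0], chunk_rows):
        block = cp.asarray(a[i:i + chunk_rows])            # host -> device
        out[i:i + chunk_rows] = cp.asnumpy(block @ b_gpu)  # device -> host
    cp.get_default_memory_pool().free_all_blocks()         # release cached VRAM
    return out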

Memory Limits

Recommended GPU memory for different workloads:

Workload                     Minimum VRAM   Recommended
Basic math operations        2 GB           4 GB
1D quantum simulations       2 GB           4 GB
2D quantum (512x512)         4 GB           8 GB
Molecular (100k particles)   4 GB           8 GB
Neural training (ResNet50)   6 GB           11 GB

Checking GPU Availability

Each MCP server provides GPU status through the info tool:

# Check GPU status
info = math_mcp.info(topic="overview")
# Returns: gpu_available: true, gpu_device: "NVIDIA RTX 3080"

info = quantum_mcp.info(topic="overview")
# Returns: cuda_available: true, cupy_version: "12.0"

Best Practices

1. Use GPU for Large Problems

GPU acceleration provides the most benefit for:

  • Matrix operations larger than 512x512
  • Quantum grids larger than 256 points
  • Molecular systems with more than 1000 particles
  • Neural networks (always use GPU when available)
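
The exact crossover point depends on the hardware; a rough way to measure it is a size sweep with plain NumPy and CuPy (illustrative; note the warm-up call so one-time kernel compilation does not skew the GPU timing):

import time
import numpy as np
import cupy as cp

for n in (256, 512, 1024, 2048):
    a = np.random.rand(n, n).astype(np.float32)
    a_gpu = cp.asarray(a)
    a_gpu @ a_gpu                    # warm-up: triggers kernel compilation
    cp.cuda.Device().synchronize()

    t0 = time.perf_counter()
    a @ a
    cpu = time.perf_counter() - t0

    t0 = time.perf_counter()
    a_gpu @ a_gpu
    cp.cuda.Device().synchronize()   # GPU kernels run asynchronously: wait first
    gpu = time.perf_counter() - t0

    print(f"{n}x{n}: CPU {cpu * 1e3:.1f} ms, GPU {gpu * 1e3:.1f} ms")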

2. Batch Operations

When possible, batch multiple operations:

# Less efficient: many small operations
for i in range(100):
    result = matrix_multiply(small_a, small_b, use_gpu=True)

# More efficient: one large operation
result = matrix_multiply(large_a, large_b, use_gpu=True)

3. Grid Sizes for FFT

Use power-of-2 grid sizes for optimal FFT performance:

  • Good: 256, 512, 1024, 2048
  • Avoid: 300, 500, 1000
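
If a simulation's natural size is not a power of two, rounding the grid up is usually cheaper than running a non-power-of-two FFT. A small helper (hypothetical, for illustration):

def next_pow2(n: int) -> int:
    """Round n up to the nearest power of two for FFT-friendly grid sizes."""
    return 1 << (n - 1).bit_length()

print(next_pow2(300))   # 512
print(next_pow2(1000))  # 1024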

4. Mixed Precision (Neural MCP)

For training large models, mixed precision can roughly double throughput:

# Future feature: mixed precision training
experiment = train_model(
    model_id=model_id,
    dataset_id=dataset_id,
    use_gpu=True,
    mixed_precision=True,  # FP16 for speed, FP32 for accuracy
)
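
For reference, a mixed_precision flag like this would typically wrap PyTorch's standard automatic mixed precision pattern (torch.autocast plus a gradient scaler). A self-contained single training step, assuming a CUDA device:

import torch
from torch import nn

model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 128, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = criterion(model(x), y)  # forward pass runs largely in FP16
scaler.scale(loss).backward()      # scale the loss to avoid FP16 underflow
scaler.step(optimizer)             # unscales gradients, then steps in FP32
scaler.update()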

Troubleshooting

Common Issues

Error              Cause                 Solution
CUDAOutOfMemory    Insufficient VRAM     Reduce batch size or grid resolution
CUDADriverError    Driver mismatch       Update NVIDIA drivers
CuPyNotAvailable   CuPy not installed    Install with pip install cupy-cuda12x
SlowPerformance    CPU fallback active   Check GPU availability with the info tool
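
When driving CuPy directly, out-of-memory failures surface as cupy.cuda.memory.OutOfMemoryError and can be caught to fall back gracefully (an illustrative pattern, not part of the MCP API):

import numpy as np
import cupy as cp

def safe_zeros(shape):
    """Allocate on the GPU, falling back to host memory on OOM."""
    try:
        return cp.zeros(shape, dtype=cp.float32)
    except cp.cuda.memory.OutOfMemoryError:
        cp.get_default_memory_pool().free_all_blocks()  # return cached VRAM
        return np.zeros(shape, dtype=np.float32)        # CPU fallback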

Verifying GPU Usage

# Verify the GPU is being used
import time

# Note: the first GPU call may include one-time initialization overhead;
# run a small warm-up call before timing if the results look off.
start = time.time()
result = matrix_multiply(a, b, use_gpu=True)
gpu_time = time.time() - start

start = time.time()
result = matrix_multiply(a, b, use_gpu=False)
cpu_time = time.time() - start

print(f"GPU: {gpu_time:.2f}s, CPU: {cpu_time:.2f}s")
# The GPU should be significantly faster for large matrices

Supported Hardware

  • NVIDIA GPUs with Compute Capability 6.0+ (Pascal and newer)
  • Tested: GTX 1080, RTX 2080, RTX 3080, RTX 4090, A100, H100

Requirements

  • CUDA 11.0 or newer
  • cuDNN 8.0 or newer (for Neural MCP)
  • CuPy 12.0 or newer
  • PyTorch 2.0 or newer (for Neural MCP)
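
These versions can be confirmed on the host with standard CuPy and PyTorch calls, independent of the MCP servers:

import cupy as cp
import torch

print("CuPy:", cp.__version__)
print("CUDA runtime:", cp.cuda.runtime.runtimeGetVersion())  # e.g. 12000 = CUDA 12.0
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("cuDNN:", torch.backends.cudnn.version())              # e.g. 8902 = cuDNN 8.9.2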

Future Support

  • AMD ROCm support (planned)
  • Apple Metal support (planned)
  • Intel oneAPI support (under consideration)