
Enhancing Agentic AI Performance and Security with Heterogeneous Compute

Overview

As demand for large language models (LLMs) grows, ensuring high performance, low latency, and strong security becomes critical—especially for enterprise-grade GenAI applications. This solution leverages heterogeneous compute (GPUs + FPGAs), powered by AMD Alveo™ V80/U55C accelerator cards, combined with DHEYO AI's accelerator architecture.

The system accelerates LLM memory handling, secure data sharing, vector search, and embedding workloads while maintaining enterprise-grade encryption and access control.

Key Benefits

Security & Privacy

  • Hardware-based Isolation
  • Symmetric Encryption Acceleration (AES-128/256)
  • Memory Collaboration via Secure FPGA-based Kernel APIs

Performance Acceleration

  • Vector Search: Up to 20–35× faster than an Intel® Core™ i9 CPU baseline
  • Data Ingestion & Embedding Acceleration
  • Memory Expansion for Large-Scale Retrieval

Memory-Aware LLM Serving

LLMs use multiple memory types:

  • Contextual Memory (short-term)
  • Persistent Memory (long-term)
  • External Memory (databases / knowledge bases)
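The three tiers above can be sketched as a minimal host-side data structure. This is an illustrative example only — class and method names are hypothetical, not DHEYO AI's actual API:

```python
from collections import deque

class TieredMemory:
    """Illustrative three-tier LLM memory store (hypothetical names)."""

    def __init__(self, context_window=4):
        self.contextual = deque(maxlen=context_window)  # short-term: recent turns
        self.persistent = {}                            # long-term: per-user facts
        self.external = {}                              # stand-in for a vector DB / KB

    def remember_turn(self, text):
        # Contextual memory is bounded; oldest entries are evicted automatically
        self.contextual.append(text)

    def store_fact(self, user_id, key, value):
        # Persistent memory survives across sessions
        self.persistent.setdefault(user_id, {})[key] = value

    def retrieve(self, user_id, key):
        # Check persistent memory first, then fall back to external knowledge
        facts = self.persistent.get(user_id, {})
        return facts.get(key, self.external.get(key))
```

In a production system the external tier would be the FPGA-accelerated vector store described below rather than an in-process dictionary.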

Challenges Addressed

  • Scalability: Efficient storage of long-term memory
  • Privacy: Encryption & access control for sensitive data
  • Personalization: Secure multi-user memory collaboration
  • Fast Retrieval: Real-time access to external knowledge

How FPGAs Help GenAI

The AMD Alveo™ V80/U55C FPGA cards enable:

  • Low Latency via dedicated data paths
  • High Parallelism for vector & token operations
  • Flexible, Scalable Compute tuned per workload
  • Dynamic AES Encryption and Key Management

Security Architecture

AES Acceleration

  • Real-time encryption/decryption with custom LUT-based datapaths
  • Configurable AES-128 and AES-256 support
  • Dynamic key generation or secure key storage on-chip
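The dynamic key-generation flow can be sketched on the host side. This is a conceptual example only, assuming an HKDF-style HMAC-SHA256 derivation: in the actual design the master key would live in secure on-chip storage and the AES datapath itself runs on the FPGA.

```python
import hashlib
import hmac
import secrets

def derive_session_key(master_key: bytes, session_id: bytes, bits: int = 256) -> bytes:
    """Derive a per-session AES key from a master key (HKDF-style sketch)."""
    assert bits in (128, 256)  # configurable AES-128 / AES-256 support
    # Extract: condense master key + session id into a pseudorandom key
    prk = hmac.new(master_key, session_id, hashlib.sha256).digest()
    # Expand: produce output key material bound to a context label
    okm = hmac.new(prk, b"aes-key" + b"\x01", hashlib.sha256).digest()
    return okm[: bits // 8]

# Dynamic key generation: each session id yields an independent key
master = secrets.token_bytes(32)
k128 = derive_session_key(master, b"session-1", bits=128)
k256 = derive_session_key(master, b"session-1", bits=256)
```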

Memory Collaboration

  • Encrypted Memory Vaults managed via FPGA-secure processors
  • Granular Access Control APIs for enterprise multi-user projects
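A granular access-control API of this kind might look like the following host-side sketch. Names and semantics are hypothetical — the real system enforces this in FPGA-secure processors over encrypted vaults, not in Python:

```python
class MemoryVault:
    """Hypothetical per-project memory vault with per-entry access control."""

    def __init__(self):
        self._entries = {}  # entry_id -> (owner, ciphertext)
        self._acl = {}      # entry_id -> set of user_ids allowed to read

    def put(self, owner, entry_id, ciphertext, readers=()):
        # Store an (already encrypted) entry; the owner always has access
        self._entries[entry_id] = (owner, ciphertext)
        self._acl[entry_id] = {owner, *readers}

    def grant(self, owner, entry_id, user):
        # Only the entry's owner may extend the access list
        if self._entries[entry_id][0] != owner:
            raise PermissionError("only the owner may grant access")
        self._acl[entry_id].add(user)

    def get(self, user, entry_id):
        # Reads are denied unless the user appears in the entry's ACL
        if user not in self._acl.get(entry_id, set()):
            raise PermissionError(f"{user} may not read {entry_id}")
        return self._entries[entry_id][1]
```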

Vector Search Acceleration

The DHEYO ANN accelerator implements an optimized IVF-PQ (inverted file with product quantization) approximate nearest-neighbor search:

Parameter                Range/Value
DB Size                  1M – 1B vectors
Vector Dimension         128, 384, 768, 1024
Recall Target            0.7 – 0.95
Query Throughput (QPS)   10,000 – 30,000
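The coarse-search stage of IVF can be sketched in a few lines. This is a simplified illustration only — it omits the PQ compression and the FPGA pipelines, and the class is not DHEYO's API. The `nprobe` parameter shows the recall/throughput trade-off the table's recall targets refer to:

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IVFIndex:
    """Simplified inverted-file (IVF) index: coarse quantization only."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one inverted list per centroid

    def add(self, vec_id, vec):
        # Assign each vector to its nearest coarse centroid's list
        c = min(range(len(self.centroids)), key=lambda i: l2(vec, self.centroids[i]))
        self.lists[c].append((vec_id, vec))

    def search(self, query, k=1, nprobe=1):
        # Probe only the nprobe closest cells; raising nprobe trades
        # throughput for higher recall
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(query, self.centroids[i]))
        cands = [item for i in order[:nprobe] for item in self.lists[i]]
        cands.sort(key=lambda iv: l2(query, iv[1]))
        return [vid for vid, _ in cands[:k]]
```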

Architecture Features

  • Custom Vector Processing Units (VPUs)
  • Hierarchical Memory (HBM + Block RAM)
  • Parallel Search Pipelines mapped to memory channels
  • Recall-Throughput Balance tuned via dataset profiling

Result: Up to 35× speedup over CPU baselines at Recall = 0.9.

Hardware Implementation

Card          HBM Memory   Bandwidth   Memory Channels
Alveo™ U55C   16 GB HBM2   460 GB/s    32
Alveo™ V80    32 GB HBM2   819 GB/s    64
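Since the search pipelines are mapped one per memory channel, a back-of-envelope division of the table's figures gives the approximate bandwidth budget per pipeline (assuming, for illustration, that aggregate bandwidth is split evenly across channels):

```python
# Per-channel bandwidth estimate from the table above (illustrative only)
cards = {
    "Alveo U55C": {"bw_gbs": 460, "channels": 32},
    "Alveo V80":  {"bw_gbs": 819, "channels": 64},
}
per_channel = {name: c["bw_gbs"] / c["channels"] for name, c in cards.items()}
# U55C: ~14.4 GB/s per channel; V80: ~12.8 GB/s per channel
```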

Building Blocks in DHEYO AI GenAI Stack

  • Data Ingestion
  • Embedding Acceleration
  • Vector Search
  • Memory Expansion
  • Security & Access Control

Disclaimer

Performance claims are based on DHEYO AI internal tests. AMD has not independently verified them. Results may vary depending on hardware configurations, datasets, and system environments.