Enhancing Agentic AI Performance and Security with Heterogeneous Compute
Overview
As the demand for large language models (LLMs) grows, ensuring high performance, low latency, and strong security becomes critical—especially for enterprise-grade GenAI applications. This solution leverages heterogeneous compute (GPUs + FPGAs) powered by AMD Alveo™ V80/U55C accelerator cards, alongside DHEYO AI's novel architecture.
The system boosts LLM memory handling, secure data sharing, vector search, and embedding tasks while maintaining enterprise-grade encryption and access control.
Key Benefits
Security & Privacy
- Hardware-based Isolation
- Symmetric Encryption Acceleration (AES-128/256)
- Memory Collaboration via Secure FPGA-based Kernel APIs
Performance Acceleration
- Vector Search: Up to 20–35× faster than an Intel® Core™ i9 CPU baseline
- Data Ingestion & Embedding Acceleration
- Memory Expansion for Large-Scale Retrieval
Memory-Aware LLM Serving
LLMs rely on multiple memory types (a minimal software model follows this list):
- Contextual Memory (short-term)
- Persistent Memory (long-term)
- External Memory (databases / knowledge bases)
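This split can be modeled in software as distinct tiers behind one interface. The sketch below is illustrative only; the class and method names are hypothetical and not part of any DHEYO AI API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class MemoryTier(Enum):
    CONTEXTUAL = "contextual"   # short-term, lives with the active session
    PERSISTENT = "persistent"   # long-term, survives across sessions
    EXTERNAL = "external"       # databases / knowledge bases, queried on demand


@dataclass
class LLMMemory:
    """Illustrative container that routes reads/writes to the three tiers."""
    context_window: List[str] = field(default_factory=list)
    persistent_store: List[str] = field(default_factory=list)
    external_lookup: Callable[[str], List[str]] = lambda query: []

    def remember(self, text: str, tier: MemoryTier) -> None:
        if tier is MemoryTier.CONTEXTUAL:
            self.context_window.append(text)    # discarded when the session ends
        elif tier is MemoryTier.PERSISTENT:
            self.persistent_store.append(text)  # would be encrypted at rest in practice
        # EXTERNAL memory is read-only here: it is queried, never written.

    def recall(self, query: str) -> List[str]:
        # Naive recall: recent context, persisted notes, then external retrieval.
        return self.context_window[-4:] + self.persistent_store + self.external_lookup(query)
```

In a production stack the persistent and external tiers are where the encryption, access control, and vector-search acceleration described below apply.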
Challenges Addressed
- Scalability: Efficient storage of long-term memory
- Privacy: Encryption & access control for sensitive data
- Personalization: Secure multi-user memory collaboration
- Fast Retrieval: Real-time access to external knowledge
How FPGAs Help GenAI
The AMD Alveo™ V80/U55C FPGA cards enable:
- Low Latency via dedicated data paths
- High Parallelism for vector & token operations
- Flexible, Scalable Compute tuned per workload
- Dynamic AES Encryption and Key Management
Security Architecture
AES Acceleration
- Real-time encryption/decryption with custom LUT-based datapaths
- Configurable AES-128 and AES-256 support
- Dynamic key generation or secure on-chip key storage (a host-side equivalent is sketched below)
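On the host side, the equivalent AES-256 operation can be expressed with a standard software library such as `cryptography`; the FPGA datapath accelerates the same transform in hardware. The snippet below is a generic illustration (AES-GCM is chosen here for the example, and the key handling shown is host-side only), not the DHEYO AI kernel API.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

# 256-bit key; on the accelerator this could instead come from dynamic key
# generation or secure on-chip key storage rather than host memory.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

plaintext = b"user memory record: meeting notes"
nonce = os.urandom(12)                      # 96-bit nonce, unique per message
associated_data = b"tenant=acme;vault=42"   # authenticated but not encrypted

ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
recovered = aesgcm.decrypt(nonce, ciphertext, associated_data)
assert recovered == plaintext
```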
Memory Collaboration
- Encrypted Memory Vaults managed by FPGA-based secure processors
- Granular Access Control APIs for enterprise multi-user projects (illustrated in the sketch below)
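A granular access-control layer over encrypted vaults might expose an interface along the following lines. All names here are hypothetical and shown only to make the access model concrete; they are not the DHEYO AI API.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import Dict, List


class Permission(Flag):
    READ = auto()
    WRITE = auto()
    SHARE = auto()


@dataclass
class MemoryVault:
    """Hypothetical per-project vault with per-user permission grants."""
    vault_id: str
    grants: Dict[str, Permission] = field(default_factory=dict)
    records: List[bytes] = field(default_factory=list)   # ciphertext blobs in practice

    def grant(self, user: str, perms: Permission) -> None:
        self.grants[user] = self.grants.get(user, Permission(0)) | perms

    def write(self, user: str, ciphertext: bytes) -> None:
        if not self.grants.get(user, Permission(0)) & Permission.WRITE:
            raise PermissionError(f"{user} lacks WRITE on vault {self.vault_id}")
        self.records.append(ciphertext)

    def read(self, user: str) -> List[bytes]:
        if not self.grants.get(user, Permission(0)) & Permission.READ:
            raise PermissionError(f"{user} lacks READ on vault {self.vault_id}")
        return list(self.records)
```

For example, a project owner could grant `Permission.READ | Permission.WRITE` to a collaborator while giving reviewers `Permission.READ` only.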
Vector Search Acceleration
The DHEYO ANN (approximate nearest neighbor) accelerator implements an optimized IVF-PQ (inverted file with product quantization) search method; a software reference sketch follows the parameter table:
| Parameter | Range/Value |
|---|---|
| DB Size | 1M – 1B vectors |
| Vector Dimension | 128, 384, 768, 1024 |
| Recall Target | 0.7 – 0.95 |
| Query Throughput (QPS) | 10,000 – 30,000 |
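For a software point of reference, the same IVF-PQ method can be exercised on CPU with the open-source FAISS library using parameters drawn from the table above. This is a scaled-down baseline sketch, not DHEYO AI's FPGA implementation; the database size and index parameters below are chosen for a quick local run.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, n_db, n_query = 128, 200_000, 1_000        # scaled down from the 1M–1B range for a quick run
rng = np.random.default_rng(0)
xb = rng.standard_normal((n_db, d), dtype=np.float32)
xq = rng.standard_normal((n_query, d), dtype=np.float32)

nlist, m, nbits = 1024, 16, 8                 # IVF coarse cells; PQ: 16 sub-quantizers x 8 bits
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                               # learn coarse centroids and PQ codebooks
index.add(xb)

index.nprobe = 32                             # probe more cells for higher recall, lower QPS
distances, ids = index.search(xq, 10)         # top-10 approximate neighbors per query
print(ids.shape)                              # (1000, 10)
```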
Architecture Features
- Custom Vector Processing Units (VPUs)
- Hierarchical Memory (HBM + Block RAM)
- Parallel Search Pipelines mapped to memory channels
- Recall-Throughput Balance tuned via dataset profiling
Result: Up to 35× speedup over CPU baselines at Recall = 0.9.
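The recall–throughput balance is typically profiled by measuring recall@k against exact search while sweeping a knob such as `nprobe` (or its hardware equivalent). A generic, accelerator-agnostic way to compute the two metrics:

```python
import time
import numpy as np


def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray, k: int = 10) -> float:
    """Fraction of exact top-k neighbors that the approximate search recovered."""
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * k)


def measure_qps(search_fn, queries: np.ndarray) -> float:
    """Queries per second for a batched search callable."""
    start = time.perf_counter()
    search_fn(queries)
    return len(queries) / (time.perf_counter() - start)
```

Sweeping the search parameter and keeping the cheapest setting that still meets the 0.7–0.95 recall target from the table above is the dataset profiling step referred to in the feature list.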
Hardware Implementation
| Card | HBM Capacity | Peak Memory Bandwidth | Memory Channels |
|---|---|---|---|
| Alveo™ U55C | 16 GB HBM2 | 460 GB/s | 32 |
| Alveo™ V80 | 32 GB HBM2e | 819 GB/s | 64 |
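Given the HBM capacities above, a quick back-of-envelope check shows when a PQ-compressed index fits on-card. The 64-byte code size and 10% metadata overhead used here are illustrative assumptions, not measured DHEYO AI figures.

```python
def pq_index_size_gb(n_vectors: int, code_bytes: int = 64, overhead: float = 1.1) -> float:
    """Rough PQ index footprint in GB, assuming ~10% metadata overhead."""
    return n_vectors * code_bytes * overhead / 1e9


for n in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{n:>13,} vectors -> {pq_index_size_gb(n):6.1f} GB")
# ~0.1 GB at 1M vectors (fits easily in 16 GB of U55C HBM)
# ~7.0 GB at 100M vectors (fits in 16 GB; comfortable within 32 GB on the V80)
# ~70.4 GB at 1B vectors (exceeds on-card HBM; requires sharding or host/DDR memory tiers)
```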
Building Blocks in DHEYO AI GenAI Stack
- Data Ingestion
- Embedding Acceleration
- Vector Search
- Memory Expansion
- Security & Access Control
Disclaimer
Performance claims are based on DHEYO AI internal tests. AMD has not independently verified them. Results may vary depending on hardware configurations, datasets, and system environments.