Enhancing Agentic AI Performance and Security with Heterogeneous Compute
Overview
As the demand for large language models (LLMs) grows, ensuring high performance, low latency, and strong security becomes critical—especially for enterprise-grade GenAI applications. This solution leverages heterogeneous compute (GPUs + FPGAs) powered by AMD Alveo™ V80/U55C accelerator cards, alongside DHEYO AI's novel architecture.
The system boosts LLM memory handling, secure data sharing, vector search, and embedding tasks while maintaining enterprise-grade encryption and access control.
Key Benefits
Security & Privacy
- Hardware-based Isolation
- Symmetric Encryption Acceleration (AES-128/256)
- Memory Collaboration via Secure FPGA-based Kernel APIs
Performance Acceleration
- Vector Search: Up to 20–35× faster than an Intel® Core™ i9 CPU baseline
- Data Ingestion & Embedding Acceleration
- Memory Expansion for Large-Scale Retrieval
Memory-Aware LLM Serving
LLMs rely on multiple memory types (a minimal software model follows this list):
- Contextual Memory (short-term)
- Persistent Memory (long-term)
- External Memory (databases / knowledge bases)
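This split can be modeled in software as distinct tiers behind one interface. The sketch below is illustrative only; the class and method names are hypothetical and not part of any DHEYO AI API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class MemoryTier(Enum):
    CONTEXTUAL = "contextual"   # short-term, lives with the active session
    PERSISTENT = "persistent"   # long-term, survives across sessions
    EXTERNAL = "external"       # databases / knowledge bases, queried on demand


@dataclass
class LLMMemory:
    """Illustrative container that routes reads/writes to the three tiers."""
    context_window: List[str] = field(default_factory=list)
    persistent_store: List[str] = field(default_factory=list)
    external_lookup: Callable[[str], List[str]] = lambda query: []

    def remember(self, text: str, tier: MemoryTier) -> None:
        if tier is MemoryTier.CONTEXTUAL:
            self.context_window.append(text)    # discarded when the session ends
        elif tier is MemoryTier.PERSISTENT:
            self.persistent_store.append(text)  # would be encrypted at rest in practice
        # EXTERNAL memory is read-only here: it is queried, never written.

    def recall(self, query: str) -> List[str]:
        # Naive recall: recent context, persisted notes, then external retrieval.
        return self.context_window[-4:] + self.persistent_store + self.external_lookup(query)
```

In a production stack the persistent and external tiers are where the encryption, access control, and vector-search acceleration described below apply.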
Challenges Addressed
- Scalability: Efficient storage of long-term memory
- Privacy: Encryption & access control for sensitive data
- Personalization: Secure multi-user memory collaboration
- Fast Retrieval: Real-time access to external knowledge
How FPGAs Help GenAI
The AMD Alveo™ V80/U55C FPGA cards enable:
- Low Latency via dedicated data paths
- High Parallelism for vector & token operations
- Flexible, Scalable Compute tuned per workload
- Dynamic AES Encryption and Key Management
Security Architecture
AES Acceleration
- Real-time encryption/decryption with custom LUT-based datapaths
- Configurable AES-128 and AES-256 support
- Dynamic key generation or secure on-chip key storage (a host-side equivalent is sketched below)
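On the host side, the equivalent AES-256 operation can be expressed with a standard software library such as `cryptography`; the FPGA datapath accelerates the same transform in hardware. The snippet below is a generic illustration (AES-GCM is chosen here for the example, and the key handling shown is host-side only), not the DHEYO AI kernel API.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

# 256-bit key; on the accelerator this could instead come from dynamic key
# generation or secure on-chip key storage rather than host memory.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

plaintext = b"user memory record: meeting notes"
nonce = os.urandom(12)                      # 96-bit nonce, unique per message
associated_data = b"tenant=acme;vault=42"   # authenticated but not encrypted

ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
recovered = aesgcm.decrypt(nonce, ciphertext, associated_data)
assert recovered == plaintext
```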
Memory Collaboration
- Encrypted Memory Vaults managed by FPGA-based secure processors
- Granular Access Control APIs for enterprise multi-user projects (illustrated in the sketch below)
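A granular access-control layer over encrypted vaults might expose an interface along the following lines. All names here are hypothetical and shown only to make the access model concrete; they are not the DHEYO AI API.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import Dict, List


class Permission(Flag):
    READ = auto()
    WRITE = auto()
    SHARE = auto()


@dataclass
class MemoryVault:
    """Hypothetical per-project vault with per-user permission grants."""
    vault_id: str
    grants: Dict[str, Permission] = field(default_factory=dict)
    records: List[bytes] = field(default_factory=list)   # ciphertext blobs in practice

    def grant(self, user: str, perms: Permission) -> None:
        self.grants[user] = self.grants.get(user, Permission(0)) | perms

    def write(self, user: str, ciphertext: bytes) -> None:
        if not self.grants.get(user, Permission(0)) & Permission.WRITE:
            raise PermissionError(f"{user} lacks WRITE on vault {self.vault_id}")
        self.records.append(ciphertext)

    def read(self, user: str) -> List[bytes]:
        if not self.grants.get(user, Permission(0)) & Permission.READ:
            raise PermissionError(f"{user} lacks READ on vault {self.vault_id}")
        return list(self.records)
```

For example, a project owner could grant `Permission.READ | Permission.WRITE` to a collaborator while giving reviewers `Permission.READ` only.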
Vector Search Acceleration
The DHEYO ANN (approximate nearest neighbor) accelerator implements an optimized IVF-PQ (inverted file with product quantization) search method; a software reference sketch follows the parameter table:
| Parameter | Range/Value |
|---|---|
| DB Size | 1M – 1B vectors |
| Vector Dimension | 128, 384, 768, 1024 |
| Recall Target | 0.7 – 0.95 |
| Query Throughput (QPS) | 10,000 – 30,000 |
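For a software point of reference, the same IVF-PQ method can be exercised on CPU with the open-source FAISS library using parameters drawn from the table above. This is a scaled-down baseline sketch, not DHEYO AI's FPGA implementation; the database size and index parameters below are chosen for a quick local run.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, n_db, n_query = 128, 200_000, 1_000        # scaled down from the 1M–1B range for a quick run
rng = np.random.default_rng(0)
xb = rng.standard_normal((n_db, d), dtype=np.float32)
xq = rng.standard_normal((n_query, d), dtype=np.float32)

nlist, m, nbits = 1024, 16, 8                 # IVF coarse cells; PQ: 16 sub-quantizers x 8 bits
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                               # learn coarse centroids and PQ codebooks
index.add(xb)

index.nprobe = 32                             # probe more cells for higher recall, lower QPS
distances, ids = index.search(xq, 10)         # top-10 approximate neighbors per query
print(ids.shape)                              # (1000, 10)
```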
Architecture Features
- Custom Vector Processing Units (VPUs)
- Hierarchical Memory (HBM + Block RAM)
- Parallel Search Pipelines mapped to memory channels
- Recall-Throughput Balance tuned via dataset profiling
Result: Up to 35× speedup over CPU baselines at Recall = 0.9.
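The recall–throughput balance is typically profiled by measuring recall@k against exact search while sweeping a knob such as `nprobe` (or its hardware equivalent). A generic, accelerator-agnostic way to compute the two metrics:

```python
import time
import numpy as np


def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray, k: int = 10) -> float:
    """Fraction of exact top-k neighbors that the approximate search recovered."""
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * k)


def measure_qps(search_fn, queries: np.ndarray) -> float:
    """Queries per second for a batched search callable."""
    start = time.perf_counter()
    search_fn(queries)
    return len(queries) / (time.perf_counter() - start)
```

Sweeping the search parameter and keeping the cheapest setting that still meets the 0.7–0.95 recall target from the table above is the dataset profiling step referred to in the feature list.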
Hardware Implementation
| Card | HBM Capacity | Peak Memory Bandwidth | Memory Channels |
|---|---|---|---|
| Alveo™ U55C | 16 GB HBM2 | 460 GB/s | 32 |
| Alveo™ V80 | 32 GB HBM2e | 819 GB/s | 64 |
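Given the HBM capacities above, a quick back-of-envelope check shows when a PQ-compressed index fits on-card. The 64-byte code size and 10% metadata overhead used here are illustrative assumptions, not measured DHEYO AI figures.

```python
def pq_index_size_gb(n_vectors: int, code_bytes: int = 64, overhead: float = 1.1) -> float:
    """Rough PQ index footprint in GB, assuming ~10% metadata overhead."""
    return n_vectors * code_bytes * overhead / 1e9


for n in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{n:>13,} vectors -> {pq_index_size_gb(n):6.1f} GB")
# ~0.1 GB at 1M vectors (fits easily in 16 GB of U55C HBM)
# ~7.0 GB at 100M vectors (fits in 16 GB; comfortable within 32 GB on the V80)
# ~70.4 GB at 1B vectors (exceeds on-card HBM; requires sharding or host/DDR memory tiers)
```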
Building Blocks in DHEYO AI GenAI Stack
- Data Ingestion
- Embedding Acceleration
- Vector Search
- Memory Expansion
- Security & Access Control
Disclaimer
Performance claims are based on DHEYO AI internal tests. AMD has not independently verified them. Results may vary depending on hardware configurations, datasets, and system environments.