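Consider incorporating techniques like paging for your VRAM. Borrowing from virtual-memory systems, paging splits memory into fixed-size blocks and allocates or loads only what is currently needed (models, layers, or cache entries). This reduces the VRAM wasted to fragmentation (discussion source). Systems such as PagedAttention, used in vLLM, apply this idea to the attention key-value (KV) cache. The cache is stored in fixed-size blocks that do not need to be contiguous, so memory is allocated on demand as each sequence grows, and blocks freed by finished requests can be reused by others.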
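To make the block-table idea concrete, here is a minimal bookkeeping sketch in Python. It is an illustration under stated assumptions, not vLLM's implementation or API. `BlockAllocator`, `PagedKVCache`, `append_token` and `release` are hypothetical names, and the pool size and block size are toy values. It tracks only which physical block and offset each token's KV entry would occupy; it does not allocate real GPU tensors.

```python
from dataclasses import dataclass, field


class BlockAllocator:
    """Hypothetical pool of fixed-size physical blocks (illustrative only)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size              # tokens stored per block
        self.free_blocks = list(range(num_blocks))  # ids of unused physical blocks

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("out of KV-cache blocks")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


@dataclass
class Sequence:
    """Per-request block table: logical block i -> physical block id."""
    block_table: list[int] = field(default_factory=list)
    num_tokens: int = 0


class PagedKVCache:
    """Hypothetical paged KV-cache bookkeeping, in the spirit of PagedAttention."""

    def __init__(self, num_blocks: int, block_size: int):
        self.allocator = BlockAllocator(num_blocks, block_size)
        self.seqs: dict[str, Sequence] = {}

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        seq = self.seqs.setdefault(seq_id, Sequence())
        bs = self.allocator.block_size
        if seq.num_tokens % bs == 0:
            # Last block is full (or the sequence has none yet): take any free
            # block. Blocks need not be adjacent, so no large contiguous region
            # has to be reserved up front.
            seq.block_table.append(self.allocator.allocate())
        offset = seq.num_tokens % bs
        seq.num_tokens += 1
        return seq.block_table[-1], offset

    def release(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the shared pool."""
        for block_id in self.seqs.pop(seq_id).block_table:
            self.allocator.free(block_id)


# Usage: a pool of 4 blocks, 2 tokens per block.
cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")  # 3 tokens -> sequence "a" spans 2 blocks
cache.append_token("b")      # sequence "b" takes 1 block
cache.release("a")           # a's 2 blocks go back to the free list for reuse
```

The design choice to note is the per-sequence block table: because logical blocks map to arbitrary physical blocks, waste is limited to the unfilled tail of each sequence's last block rather than to a large pre-reserved contiguous buffer.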