DeepSeek releases ‘sparse attention’ model that cu

R

Russell Brandom

Guest
Researchers at DeepSeek released a new experimental model designed to have dramatically lower inference costs when used in long-context operations.