Lightbits aims to cut cloud costs by improving AI inference efficiency
Innovation in artificial intelligence (AI) is accelerating advances across the tech industry. Lightbits Labs recently announced a new architecture designed to address memory bottlenecks in large-scale AI inference. Developed in collaboration with ScaleFlux and FarmGPU, the architecture combines non-volatile memory for fast storage, GPU inference infrastructure, and Lightbits’ software to help AI systems manage the cached data generated during inference more efficiently.
The release comes as cloud providers face mounting cost pressure from inference workloads. GPUs are expensive and have become a major part of operating expenses, so Lightbits has set out to improve GPU utilization.
Lightbits’ new platform increases the number of requests a single GPU can handle, which directly lowers the cost of processing each request. According to Lightbits’ own test results, the system tripled request-handling capacity on the same GPU while reducing power and infrastructure costs by 65%.
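The link between throughput and per-request cost can be made concrete with a little arithmetic. The sketch below assumes a GPU billed at a flat hourly rate; the price and request counts are hypothetical values chosen only for illustration, and the calculation does not reproduce Lightbits’ 65% figure, which covers power and infrastructure costs.

```python
def cost_per_request(gpu_cost_per_hour: float, requests_per_hour: float) -> float:
    """Cost of serving one request on a GPU billed by the hour."""
    return gpu_cost_per_hour / requests_per_hour


# Hypothetical figures, not taken from the article.
GPU_COST_PER_HOUR = 3.0       # assumed hourly price of one GPU
BASELINE_REQUESTS = 1_000.0   # assumed requests served per hour before the change

baseline = cost_per_request(GPU_COST_PER_HOUR, BASELINE_REQUESTS)
# Same GPU, same hourly price, three times as many requests.
improved = cost_per_request(GPU_COST_PER_HOUR, 3 * BASELINE_REQUESTS)
saving = 1 - improved / baseline

print(f"baseline: {baseline:.4f}  improved: {improved:.4f}  saving: {saving:.0%}")
```

Under these assumptions, serving three times as many requests for the same hourly price cuts the GPU cost of each request to one third of its previous level, whatever the actual price happens to be.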
At the core of the solution is the key-value (KV) cache. This cache stores intermediate vectors generated during inference so that earlier calculations can be reused instead of recomputed. However, as models grow, the cache grows rapidly as well. Memory requirements are doubling each year, and long-term solutions will require effort from multiple parties. To address this, Lightbits has introduced an approach that predicts data movement and preloads the necessary data into the GPU.
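For readers unfamiliar with the idea, a toy example shows why a KV cache saves work. This is a minimal sketch of generic single-head attention in plain Python, not Lightbits’ implementation; all function and class names are illustrative. With the cache, each new token is projected into a key and a value once; without it, every earlier token is projected again at each step.

```python
import math


def project(x, w):
    """Multiply vector x by matrix w (a list of rows)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys and values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


class KVCache:
    """Keeps the key and value vectors of every token seen so far."""

    def __init__(self, w_k, w_v):
        self.w_k, self.w_v = w_k, w_v
        self.keys, self.values = [], []
        self.projections = 0  # counts key/value projections performed

    def append(self, token_vec):
        self.keys.append(project(token_vec, self.w_k))
        self.values.append(project(token_vec, self.w_v))
        self.projections += 2


def decode_with_cache(tokens, w_q, w_k, w_v):
    cache = KVCache(w_k, w_v)
    outputs = []
    for tok in tokens:
        cache.append(tok)  # only the new token is projected
        outputs.append(attend(project(tok, w_q), cache.keys, cache.values))
    return outputs, cache.projections


def decode_without_cache(tokens, w_q, w_k, w_v):
    outputs, projections = [], 0
    for t in range(len(tokens)):
        # Every earlier token is projected again at each step.
        keys = [project(x, w_k) for x in tokens[: t + 1]]
        values = [project(x, w_v) for x in tokens[: t + 1]]
        projections += 2 * (t + 1)
        outputs.append(attend(project(tokens[t], w_q), keys, values))
    return outputs, projections


# Small made-up inputs: four 2-dimensional token vectors and 2x2 weights.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[0.5, 0.1], [0.2, 0.7]]
w_v = [[1.0, -1.0], [0.3, 0.4]]

cached_out, cached_work = decode_with_cache(tokens, w_q, w_k, w_v)
plain_out, plain_work = decode_without_cache(tokens, w_q, w_k, w_v)
print(cached_work, plain_work)
```

Both paths produce the same outputs, but the uncached path repeats projections for every earlier token, and that gap widens as sequences get longer. The trade-off is that the cached keys and values have to be held in memory, which is the capacity pressure the article describes.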
The LightInferra system manages and accelerates data movement across the memory hierarchy so that the GPU does not have to wait for data. This keeps inference running smoothly within the limits of GPU memory capacity. Cloud service providers can use the design to improve GPU utilization or to increase overall processing capacity on existing infrastructure. Lightbits is currently working with NeoCloud on the architecture, and production deployment is scheduled to begin in July.
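The general idea of predicting data movement and preloading across memory tiers can be illustrated with a toy two-tier cache. The article does not describe how LightInferra makes its predictions, so the sketch below assumes a deliberately simple rule (the next block in sequence will be needed next); the class, the predictor, and the block names are all hypothetical.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier store: a small fast tier in front of a large slow tier."""

    def __init__(self, slow_tier, fast_capacity):
        self.slow = slow_tier          # block id -> data; stands in for storage
        self.fast = OrderedDict()      # stands in for GPU memory, kept in LRU order
        self.capacity = fast_capacity
        self.stalls = 0                # demand reads that had to wait on the slow tier

    def _load(self, block_id):
        if block_id in self.fast:
            self.fast.move_to_end(block_id)  # mark as most recently used
            return
        if len(self.fast) >= self.capacity:
            self.fast.popitem(last=False)    # evict the least recently used block
        self.fast[block_id] = self.slow[block_id]

    def prefetch(self, block_id):
        """Bring a block into the fast tier before it is requested."""
        if block_id in self.slow:
            self._load(block_id)

    def read(self, block_id):
        """Demand read; counts a stall if the block is not already in the fast tier."""
        if block_id not in self.fast:
            self.stalls += 1
        self._load(block_id)
        return self.fast[block_id]


def predict_next(block_id):
    """Hypothetical predictor: assume blocks are consumed in sequential order."""
    return block_id + 1


def run(access_pattern, use_prefetch):
    slow = {i: f"kv-block-{i}" for i in range(16)}
    cache = TieredCache(slow, fast_capacity=4)
    for block_id in access_pattern:
        cache.read(block_id)
        if use_prefetch:
            cache.prefetch(predict_next(block_id))
    return cache.stalls


pattern = list(range(8))
print("stalls without prefetch:", run(pattern, use_prefetch=False))
print("stalls with prefetch:   ", run(pattern, use_prefetch=True))
```

When the prediction matches the access pattern, only the first read has to wait on the slow tier. A real system would need a predictor suited to actual inference traffic, and a wrong guess can evict a block that is still needed, so the benefit depends on prediction accuracy.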