Google proposes continuous evaluation engineering methods to address the challenges of AI agent deployment environment assessment
ME News, April 4 (UTC+8). GoogleCloudTech recently published a post arguing that relying on manual chats and subjective impressions ("vibe checks") to evaluate AI agents in production is unreliable and can lead to disasters. Because of the probabilistic nature of generative AI, even small changes to prompts or model weights can cause significant performance regressions.

To address this, the post proposes an engineering practice of Continuous Evaluation (CE), which distinguishes two modes of AI engineering: exploration mode (the lab) and defense mode (the factory). Exploration mode focuses on discovering a model's potential through a handful of examples and vibe checks; defense mode focuses on stability, ensuring the system meets Service Level Objective (SLO) targets through dataset-based evaluation, strict gating, and automated metrics. The post warns that many teams remain stuck in exploration mode indefinitely.

As an example of defense-mode practice, the post describes a distributed multi-agent system (a course-creator system) built on Cloud Run and the Agent2Agent protocol, which achieves reliable, scalable production-grade AI deployment through the separation-of-concerns principle and dedicated agents (such as researchers, judges, content builders, and coordinators). (Source: InfoQ)
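The defense-mode practice described above — scoring an agent against a fixed dataset and gating deployment on an SLO — can be sketched in a few lines. This is a minimal illustration, not Google's implementation; all names here (`run_agent`, `EVAL_SET`, `SLO_PASS_RATE`) are hypothetical.

```python
# Hypothetical sketch of a "defense mode" continuous-evaluation gate:
# score each (prompt, expected) pair in a fixed evaluation dataset and
# block the deployment if the pass rate falls below an SLO threshold.
# All identifiers are illustrative, not from the original post.

EVAL_SET = [
    {"prompt": "2+2", "expected": "4"},
    {"prompt": "capital of France", "expected": "Paris"},
]
SLO_PASS_RATE = 0.95  # service level objective for this gate


def run_agent(prompt: str) -> str:
    # Stand-in for a call to the deployed agent under test.
    canned = {"2+2": "4", "capital of France": "Paris"}
    return canned.get(prompt, "")


def evaluation_gate(eval_set, slo: float) -> bool:
    """Return True only if the agent meets the SLO on the dataset."""
    passed = sum(run_agent(case["prompt"]) == case["expected"]
                 for case in eval_set)
    pass_rate = passed / len(eval_set)
    return pass_rate >= slo


if __name__ == "__main__":
    verdict = "gate passed" if evaluation_gate(EVAL_SET, SLO_PASS_RATE) else "gate failed"
    print(verdict)
```

In a CI/CD pipeline, a failing gate would abort the rollout, which is what distinguishes defense mode from ad-hoc vibe checks: the evaluation is automated, repeatable, and binding.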