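Leaked AI Benchmark Report Photo
A photo of a slightly tilted computer monitor displaying an academic technical report typeset in a LaTeX style, with benchmark bar charts and a model performance comparison table. The LCD pixel grid, slight screen glare, and clean research-paper layout are all visible.
Model: gpt-image-2
Category: Infographic/Edu Visual
Style: Photography
Language: en
Prompt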
{
  "type": "Photo of a computer monitor displaying an academic technical report",
  "style": "Slightly angled photo of the screen, visible moiré pattern, LCD pixel grid, slight glare, LaTeX document typesetting, serif font",
  "document_header": {
    "left": "4 Benchmark Evaluation",
    "right": "{argument name=\"report title\" default=\"DeepSeek-V4 Technical Report\"}"
  },
  "introductory_text": "A paragraph summarizing the comprehensive evaluation results of {argument name=\"main model name\" default=\"DeepSeek-V4\"} against {argument name=\"competitor model 1\" default=\"GPT-5.3\"}, {argument name=\"competitor model 2\" default=\"Claude Opus 4.6\"}, and {argument name=\"competitor model 3\" default=\"Gemini 3.1 Pro Preview\"}.",
  "visualizations": {
    "legend": "5 items, color-coded: dark blue, gray, light gray, blue striped, light blue",
    "bar_charts": {
      "count": 6,
      "labels": [
        "MMLU-Pro (EM)",
        "GPQA-Diamond (Pass@1)",
        "AIME 2025 (Pass@1)",
        "LiveCodeBench (Pass@1-COT)",
        "SWE-bench Verified (Resolved)",
        "Tau-bench (Average)"
      ]
    },
    "caption": "Figure 1 | Performance comparison on core benchmarks. DeepSeek-V4 achieves state-of-the-art results on most benchmarks."
  },
  "data_table": {
    "columns": [
      "Benchmark",
      "{argument name=\"main model name\" default=\"DeepSeek-V4\"}",
      "{argument name=\"competitor model 1\" default=\"GPT-5.3\"}",
      "{argument name=\"competitor model 2\" default=\"Claude Opus 4.6\"}",
      "{argument name=\"competitor model 3\" default=\"Gemini 3.1 Pro Preview\"}",
      "GPT-4.1"
    ],
    "categories": {
      "count": 4,
      "rows": [
        {"label": "General", "icon": "globe/network", "sub_items": 3},
        {"label": "Reasoning & Math", "icon": "calculator/clipboard", "sub_items": 3},
        {"label": "Code", "icon": "code brackets", "sub_items": 3},
        {"label": "Agentic", "icon": "robot face", "sub_items": 3}
      ]
    }
  }
}
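The prompt marks its customizable parts with placeholders of the form `{argument name="NAME" default="VALUE"}`. The gallery does not describe how these placeholders are resolved. The sketch below assumes that each one is replaced by a user-supplied value when one is given, and otherwise by its default. The names `PLACEHOLDER` and `fill_arguments` are hypothetical and chosen only for illustration.

```python
import json
import re

# Hypothetical pattern for placeholders such as
# {argument name="report title" default="DeepSeek-V4 Technical Report"},
# as they appear after the JSON has been decoded (quotes no longer escaped).
PLACEHOLDER = re.compile(r'\{argument name="([^"]*)" default="([^"]*)"\}')


def fill_arguments(value, overrides=None):
    """Recursively replace every placeholder in a decoded JSON value.

    Uses overrides[name] when the caller supplies it, else the default.
    """
    overrides = overrides or {}
    if isinstance(value, str):
        return PLACEHOLDER.sub(
            lambda m: overrides.get(m.group(1), m.group(2)), value
        )
    if isinstance(value, list):
        return [fill_arguments(v, overrides) for v in value]
    if isinstance(value, dict):
        return {k: fill_arguments(v, overrides) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Usage: a fragment of the prompt's "columns" list, with one model swapped out.
raw = r'{"columns": ["Benchmark", "{argument name=\"main model name\" default=\"DeepSeek-V4\"}", "{argument name=\"competitor model 1\" default=\"GPT-5.3\"}"]}'
filled = fill_arguments(json.loads(raw), {"competitor model 1": "My-Model"})
print(filled["columns"])  # ['Benchmark', 'DeepSeek-V4', 'My-Model']
```

The sketch walks the decoded structure instead of running a regex over the raw JSON text. This keeps the escaped quotes (`\"`) in the source from interfering with the match.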