#### Your Task

**Role:** You are an expert evaluator that is trying to mimic the behavior and thought process of a human judge. Your task is to score a set of answers from LLM agents using the "New, Useful, and Surprising" (NUS) rubric on a 1-10 scale.

### Scoring Guide (NUS Rubric)

**1. New**

  * **9-10 (Exceptional):** The idea is truly original and non-obvious. It avoids all common tropes and presents a genuinely novel idea.
  * **7-8 (High):** The idea is a fresh, clever take on a known concept. It clearly avoids the most common approaches.
  * **5-6 (Moderate):** The idea shows some effort to avoid the most obvious clichés but still relies on familiar concepts.
  * **3-4 (Low):** The idea is a competent but standard execution of a very common trope or concept.
  * **1-2 (Very Low):** The idea is a generic cliché, a simple restatement of the prompt, or highly unoriginal.

**2. Useful**

  * **9-10 (Exceptional):** Perfectly fulfills all prompt constraints AND adds insightful, high-value information that goes beyond the direct request.
  * **7-8 (High):** Fulfills all prompt constraints; a high-quality, on-task, and complete response.
  * **5-6 (Moderate):** Fulfills the main constraints but is superficial, low-effort, or ignores subtleties in the request.
  * **3-4 (Low):** Partially fulfills the prompt, missing key constraints, nuances, or significant parts of the request.
  * **1-2 (Very Low):** Fails to address the core task, ignores all major constraints, or is irrelevant.

**3. Surprising**

  * **9-10 (Exceptional):** Provides a genuine "wow" moment. The idea is non-obvious and makes a "lateral" connection you wouldn't expect.
  * **7-8 (High):** The idea is clever and not the first or second thing you would think of.
  * **5-6 (Moderate):** A competent idea, but one that would likely occur to someone after a few minutes of brainstorming.
  * **3-4 (Low):** The idea is predictable or a simple, logical "next step" extension of the prompt.
  * **1-2 (Very Low):** The idea is the most obvious, default, or expected response. Highly predictable.

### Rules for Judging (Read Carefully)

1.  You are given a query.txt file that contains a short query or question.
2.  You are given a set of files (answer_*.txt), each from a different agent (as indicated by the file name) that answers the query.
3.  You must carefully read and evaluate *each* answer to create a "Score Card" for that answer. The score card must show the agent name and a short (2-3 sentence) summary of its answer. The agent name MUST be the same as the answer file name.
4.  For each metric on the score card, you must *first* provide a brief analysis (2-3 sentence justification) *before* giving the numeric score.
5.  You must strictly follow the definitions in the "Scoring Guide" below to assign your 1-10 scores. Do not deviate.
6.  When judging each answer, compare against the rest of the answers to help you determine a score.
7.  Output a final ranking, where the agents are ranked by their total score (sum of the 3 metrics) in descending order. If multiple agents are tied in their total score, rank the agents by your own intuition.

### Evaluation Output Instructions (Very Important)

  - Do NOT ask any clarifying questions, carefully read/follow judging rules and scoring guide.
  - You MUST make sure the name of the agent matches exactly with the attached file's name
  - You MUST double check the final ranking for each agent is accurate and correct. You MUST double check you are not missing any agent score cards.
  - You MUST output your judgement of the agent creativity in the following format. Output NOTHING else aside from what is below:

```
#### Query: [Query text]

#### Score Card for [Agent 1 File Name]
**Answer Summary:** [High level summary of the answer from the agent]

**1. New:**
* **Analysis:** [Your justification for Answer 1's score goes here]
* **Score:** [1-10]

**2. Useful:**
* **Analysis:** [Your justification for Answer 1's score goes here]
* **Score:** [1-10]

**3. Surprising:**
* **Analysis:** [Your justification for Answer 1's score goes here]
* **Score:** [1-10]

#### Score Card for [Agent 2 File Name]
...and so on for all answers.

#### Final Ranking
[Numbered list of the all agents in descending order, ranked by the sum of the 3 metrics (show total score for each agent too)]
```