v0.1.0 · User Guide · AI/ML Research

Research Agent Guide

Discover papers, identify research gaps, and generate novel ideas across top-tier ML venues — powered by agentic AI with model tiering and cost control.

Overview

Research Agent orchestrates a five-phase pipeline: it discovers papers from multiple sources, triages them by relevance, deep-scans their PDFs, maps the research landscape, and synthesizes novel ideas and gaps. Each phase runs on a model tier matched to its workload, which keeps the cost of a typical run under roughly $0.56.
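Model tiering and budget enforcement can be pictured as a small cost tracker that prices each LLM call at its phase's tier and refuses calls that would break a cap. This is a minimal sketch under loudly stated assumptions: the tier names, the per-million-token prices, and the phase-to-tier mapping are hypothetical placeholders, not Research Agent's actual configuration. Only the $1.00 default run cap comes from this guide.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical USD prices per million tokens; substitute the prices of the models you configure.
TIER_PRICES = {
    "small": {"input": 1.00, "output": 5.00},
    "large": {"input": 3.00, "output": 15.00},
}

# Hypothetical phase -> tier mapping: a cheap model for high-volume phases, a stronger one for reasoning.
PHASE_TIERS = {
    "discovery": "small",
    "triage": "small",
    "deep_scan": "small",
    "landscape": "large",
    "synthesis": "large",
}


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past a cap."""


@dataclass
class CostTracker:
    budget_cap: float = 1.00  # run-level cap in USD (the documented default)
    phase_caps: dict[str, float] = field(default_factory=dict)  # optional per-phase caps
    spent: float = 0.0
    per_phase: dict[str, float] = field(default_factory=dict)

    def record(self, phase: str, input_tokens: int, output_tokens: int) -> float:
        """Price a call at its phase's tier; raise instead of recording it if a cap would be exceeded."""
        prices = TIER_PRICES[PHASE_TIERS[phase]]
        cost = (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000
        phase_total = self.per_phase.get(phase, 0.0) + cost
        if phase in self.phase_caps and phase_total > self.phase_caps[phase]:
            raise BudgetExceeded(f"{phase} would spend ${phase_total:.4f}, cap ${self.phase_caps[phase]:.2f}")
        if self.spent + cost > self.budget_cap:
            raise BudgetExceeded(f"run would spend ${self.spent + cost:.4f}, cap ${self.budget_cap:.2f}")
        self.spent += cost
        self.per_phase[phase] = phase_total
        return cost
```

In practice you would call `record` with estimated token counts before sending a request, so an over-budget call is blocked rather than merely reported afterwards.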

Feature	Description
Multi-Source Discovery	Semantic Scholar (S2), arXiv, Perplexity, Google Scholar, OpenReview
Interview Scope	Conversational scope refinement before each run
Landscape Mapping	Methods × Datasets coverage matrix
Idea Synthesis	Gap-driven research ideas with feasibility scores
Trace DAG	Ara-inspired exploration history with full reasoning
Cost Control	Per-phase budgets, model tiering, authority enforcement

Pipeline flow: Interview → Discovery → Triage → Deep Scan → Landscape → Synthesis

Getting Started

Configure API keys
Copy `.env.example` to `.env` and set `ANTHROPIC_API_KEY`. The Semantic Scholar (`S2_API_KEY`) and Perplexity (`PERPLEXITY_API_KEY`) keys are optional but improve results.
Start the infrastructure
Run `make dev-infra` to start Postgres and Redis, then run `make migrate` to apply the database schema.
Launch the backend
Run `make dev-backend`. The API starts at http://localhost:8000; open http://localhost:8000/api/health to confirm it is running.
Launch the frontend
Run `make dev-frontend`, then open http://localhost:3000 to access the dashboard.
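To script the backend health check, a standard-library sketch can request `/api/health` and treat an HTTP 200 as healthy. It assumes only the URL given above; the response body format is not documented in this guide, so the sketch ignores it, and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def backend_is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/api/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx status (HTTPError subclasses URLError).
        return False
```

For example, `backend_is_healthy()` returns `False` until `make dev-backend` (or `make dev`) has finished starting the API.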

Full stack shortcut

Use `make dev` to start everything at once via Docker Compose (Postgres, Redis, backend, frontend).

Create a Pipeline

Click New Pipeline in the header to start. Enter a research query, select target venues, and configure depth and budget.

Field	Description
`Research Query`	Natural language description of your research area
`Target Venues`	Toggle: CVPR, ICLR, ICML, NeurIPS, ECCV, AAAI
`Include OpenReview`	Fetch accepted papers from conference proceedings
`Year From`	Only include papers published in or after this year
`Depth`	Quick (~2 min), Standard (~5 min), or Deep (~10 min)
`Budget Cap`	Max LLM cost for this run (default $1.00)
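The form fields map naturally onto a small configuration object. This is a hypothetical client-side sketch: the class name, the lowercase depth keys, the default depth, and the validation messages are illustrative and not part of Research Agent's API. It encodes the constraints this guide states: six selectable venues, three depth levels, a $1.00 default budget, and OpenReview coverage limited to ICLR, NeurIPS, ICML, and AAAI.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

VENUES = {"CVPR", "ICLR", "ICML", "NeurIPS", "ECCV", "AAAI"}
OPENREVIEW_VENUES = {"ICLR", "NeurIPS", "ICML", "AAAI"}  # venues the OpenReview toggle covers
DEPTHS = {"quick": "~2 min", "standard": "~5 min", "deep": "~10 min"}


@dataclass
class PipelineConfig:
    query: str
    venues: set[str] = field(default_factory=set)
    include_openreview: bool = False
    year_from: int | None = None
    depth: str = "standard"  # assumed default; Standard is the recommended starting point
    budget_cap: float = 1.00  # USD, the documented default

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the config looks valid."""
        problems: list[str] = []
        if not self.query.strip():
            problems.append("query is empty")
        unknown = self.venues - VENUES
        if unknown:
            problems.append(f"unknown venues: {sorted(unknown)}")
        if self.depth not in DEPTHS:
            problems.append(f"depth must be one of {sorted(DEPTHS)}")
        if self.year_from is not None and self.year_from > date.today().year:
            problems.append("year_from is in the future")
        if self.budget_cap <= 0:
            problems.append("budget_cap must be positive")
        if self.include_openreview and not self.venues & OPENREVIEW_VENUES:
            problems.append("OpenReview covers only ICLR, NeurIPS, ICML, and AAAI")
        return problems
```

Validating before submission catches the most common mistake, enabling OpenReview for a CVPR- or ECCV-only run, before any budget is spent.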

OpenReview toggle

Enable "Include OpenReview" when targeting ICLR, NeurIPS, ICML, or AAAI. It fetches acceptance decisions, review scores, and oral/spotlight/poster classifications directly from conference proceedings.

Interview Phase

After creation, the pipeline enters an interactive interview. The AI asks clarifying questions to refine your query, identify sub-topics, set exclusions, and focus the search.

Answer scope questions
The agent asks 3-5 questions about your area, methods, and constraints.
Review refined scope
The agent summarizes sub-topics, exclusions, and venue targeting. Approve or adjust.
Start the pipeline
Click "Run Pipeline" on the pipeline detail page to begin execution.

Run & Monitor

The pipeline detail page shows a live DAG of phase execution, streamed in real time over Server-Sent Events (SSE). Each phase node shows its status, the model used, token counts, and cost.
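Outside the dashboard, the same stream could be consumed with any SSE client. The sketch below is a minimal standard-library parser for the SSE wire format; the stream URL and the JSON payload fields are assumptions, since this guide does not document the endpoint, and `follow_pipeline` is a hypothetical helper.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) for each complete Server-Sent Event in a stream of text lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the pending event
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):  # comment line, often used as a keep-alive
            continue
        name, _, value = line.partition(":")
        if value.startswith(" "):  # the SSE format strips one leading space from the value
            value = value[1:]
        if name == "event":
            event_type = value
        elif name == "data":
            data_lines.append(value)
        # "id" and "retry" fields are ignored in this sketch


def follow_pipeline(stream_url: str) -> Iterator[tuple[str, dict]]:
    """Stream events from a (hypothetical) pipeline SSE endpoint, assuming JSON payloads."""
    request = urllib.request.Request(stream_url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request) as resp:
        for event_type, data in parse_sse(line.decode("utf-8") for line in resp):
            yield event_type, json.loads(data)
```

Multi-line `data:` fields are joined with newlines before decoding, which is why the parser buffers lines until the blank separator.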

Pipeline States

Status	Meaning	Action
`pending`	Created, awaiting start	Click "Run Pipeline"
`interviewing`	Scope refinement in progress	Complete interview
`running`	Pipeline executing	Monitor or Abort
`gated`	Stopped at quality gate	Click "Resume" to retry
`completed`	All phases finished	View Results
`failed`	Error occurred	Check logs, then click "Resume"

Gate failures

Gates check paper counts and coverage quality between phases. If a gate fails, the pipeline pauses with status "gated". Click Resume to retry — the pipeline skips already-completed phases.
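The gate-and-resume behavior can be sketched as a loop over the five phases that stops at the first failing gate and, on resume, skips phases already marked complete. The phase names, the paper-count gate, and the choice to leave a gated phase incomplete (so Resume re-runs it) are assumptions for illustration, not the pipeline's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

PHASES = ["discovery", "triage", "deep_scan", "landscape", "synthesis"]


@dataclass
class GateResult:
    passed: bool
    reason: str = ""


def min_papers_gate(minimum: int) -> Callable[[list], GateResult]:
    """Hypothetical gate: require at least `minimum` papers in a phase's output."""
    def gate(papers: list) -> GateResult:
        if len(papers) >= minimum:
            return GateResult(True)
        return GateResult(False, f"only {len(papers)} papers, need {minimum}")
    return gate


def run_pipeline(
    run_phase: Callable[[str], Any],
    gates: dict[str, Callable[[Any], GateResult]],
    completed: set[str],
) -> tuple[str, str]:
    """Run phases in order, skipping completed ones; stop as 'gated' on the first failing gate."""
    for phase in PHASES:
        if phase in completed:
            continue  # resume: never redo finished work
        output = run_phase(phase)
        gate = gates.get(phase)
        if gate is not None:
            result = gate(output)
            if not result.passed:
                # The phase stays incomplete, so the next Resume retries it.
                return "gated", f"{phase}: {result.reason}"
        completed.add(phase)
    return "completed", ""
```

Because `completed` persists between calls, a second `run_pipeline` call after a gate failure starts at the gated phase rather than at discovery.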

View Results

Completed pipelines produce four outputs, accessible from the Results page:

Output	Description
Research Ideas	Novel ideas with feasibility and impact scores, evaluated by an LLM-as-judge
Research Gaps	Method gaps, dataset gaps, and understudied combinations
Reading List	Top papers ranked by relevance and novelty, with arXiv links
Landscape Map	Methods × Datasets coverage matrix showing research density
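A Methods × Datasets coverage matrix, and the "understudied combinations" read off its empty cells, can be sketched in a few lines. This assumes each paper already carries lists of the methods and datasets it uses (for example, extracted during deep scan); the dictionary keys and the zero-paper threshold are hypothetical, not Research Agent's schema.

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def coverage_matrix(papers: list[dict], methods: list[str], datasets: list[str]) -> dict[tuple[str, str], int]:
    """Count papers per (method, dataset) cell; each paper counts at most once per cell."""
    counts: Counter = Counter()
    for paper in papers:
        for cell in set(product(paper["methods"], paper["datasets"])):
            counts[cell] += 1
    return {(m, d): counts[(m, d)] for m in methods for d in datasets}


def understudied(matrix: dict[tuple[str, str], int], max_papers: int = 0) -> list[tuple[str, str]]:
    """Cells with at most `max_papers` papers: candidate 'understudied combination' gaps."""
    return sorted(cell for cell, n in matrix.items() if n <= max_papers)
```

Raising `max_papers` above zero surfaces sparsely covered cells as well as empty ones, which is closer to how "research density" reads on a heat map.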

Paper Library

The Papers page shows all discovered papers grouped by pipeline run. Each paper displays venue, year, citation count, relevance score, and arXiv link. Papers from OpenReview also show acceptance type (oral/spotlight/poster) and average review score.

OpenReview Integration

The OpenReview connector fetches accepted papers from ICLR, NeurIPS, ICML, and AAAI directly from conference proceedings. Results are cached in Redis for 24 hours.

Supported Venues

Venue	OpenReview ID	Data Available
ICLR	`ICLR.cc`	Papers, reviews, decisions, keywords
NeurIPS	`NeurIPS.cc`	Papers, reviews, decisions, keywords
ICML	`ICML.cc`	Papers, reviews, decisions, keywords
AAAI	`AAAI.org`	Papers, decisions
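The 24-hour Redis cache follows the usual cache-aside pattern: look up a venue and year, and only on a miss run the slow OpenReview fetch and store the result with a TTL. In this sketch the key layout and the injected `fetch` function are hypothetical; `get` and `setex(name, time, value)` are the redis-py `Redis` methods, so a `redis.Redis()` client can be passed in directly.

```python
from __future__ import annotations

import json
from typing import Callable, Optional, Protocol

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, matching the documented cache lifetime


class CacheClient(Protocol):
    """The two methods used here; redis.Redis provides both."""

    def get(self, name: str) -> Optional[bytes]: ...

    def setex(self, name: str, time: int, value: str) -> object: ...


def venue_papers(
    cache: CacheClient,
    venue_id: str,
    year: int,
    fetch: Callable[[str, int], list[dict]],
) -> list[dict]:
    """Cache-aside lookup: serve a venue+year from the cache, else fetch and store it for 24 hours."""
    key = f"openreview:{venue_id}:{year}"  # hypothetical key layout
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    papers = fetch(venue_id, year)  # slow path: a full OpenReview fetch
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(papers))
    return papers
```

Injecting `fetch` keeps the caching logic testable without network access; in a real setup it would wrap whatever OpenReview client the connector uses.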

Guest vs authenticated access

Guest mode works for all public venues. Set OPENREVIEW_USERNAME and OPENREVIEW_PASSWORD in your .env for higher rate limits.

Trace DAG

Every pipeline run produces an exploration trace — an Ara-inspired DAG that records every decision the agent made: queries issued, papers scored, gates passed or failed, dead ends, and pivots. Access it via the Trace tab on any pipeline detail page.

Event Types

Type	Color	Meaning
`QUERY_ISSUED`	Blue	Search or prompt sent to a source
`GATE_PASSED`	Green	Quality gate check succeeded
`GATE_FAILED`	Red	Quality gate check failed
`PAPER_SCORED`	Gray	Paper relevance scoring (collapsed in UI)
`DEAD_END`	Dark red	Search returned no results
`PIVOT`	Purple	Fundamental direction change

Click nodes for details

Click any node in the trace DAG to see its input data, output data, and reasoning in the detail panel. Use "Export JSONL" to download the full trace for offline analysis.
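An exported trace can be summarized offline with a few lines of Python. The event type names below come from the table above, but the JSONL record layout (a `type` key per line, plus fields such as `reason`) is an assumption, since this guide does not document the export schema.

```python
from __future__ import annotations

import json
from collections import Counter
from enum import Enum
from typing import Iterable


class TraceEventType(str, Enum):
    QUERY_ISSUED = "QUERY_ISSUED"
    GATE_PASSED = "GATE_PASSED"
    GATE_FAILED = "GATE_FAILED"
    PAPER_SCORED = "PAPER_SCORED"
    DEAD_END = "DEAD_END"
    PIVOT = "PIVOT"


def load_trace(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(events: list[dict]) -> dict:
    """Count events by type (assumed 'type' key) and pull out the ones worth reading first."""
    known = {t.value for t in TraceEventType}
    return {
        "counts": Counter(e["type"] for e in events),
        "gate_failures": [e for e in events if e["type"] == TraceEventType.GATE_FAILED.value],
        "dead_ends": [e for e in events if e["type"] == TraceEventType.DEAD_END.value],
        "unknown_types": sorted({e["type"] for e in events} - known),
    }
```

For a downloaded file, `with open("trace.jsonl", encoding="utf-8") as fh: print(summarize(load_trace(fh))["counts"])` works, where `trace.jsonl` stands in for whatever name the export uses.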

MCP Server

The conference papers MCP server lets you search OpenReview directly from Claude Code. It's auto-discovered via .mcp.json at the project root.

Available Tools

Tool	Description
`search_conference_papers`	Search accepted papers by venue, year, and keywords
`get_paper_details`	Full metadata for a paper by OpenReview ID
`get_paper_reviews`	Review ratings, strengths, and weaknesses for a paper
`list_venues`	Supported venues and year ranges

```shell
# Test the MCP server locally
uv run python backend/mcp_server/conference_server.py
```
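Claude Code reads project-scoped MCP servers from the `mcpServers` object in `.mcp.json`. If you want to register this server in another checkout, a sketch like the following merges an entry into that file without disturbing other servers. The server name `conference-papers` is hypothetical, and the command reuses the local test command; the sketch assumes Claude Code launches the server from the project root so the relative script path resolves.

```python
from __future__ import annotations

import json
from pathlib import Path

SERVER_NAME = "conference-papers"  # hypothetical name for the server entry
SERVER_ENTRY = {
    "command": "uv",
    "args": ["run", "python", "backend/mcp_server/conference_server.py"],  # relative to the project root
}


def register_conference_server(project_root: Path) -> dict:
    """Add or update the conference server in <project_root>/.mcp.json, keeping other servers."""
    path = project_root / ".mcp.json"
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[SERVER_NAME] = SERVER_ENTRY
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Merging rather than overwriting matters because `.mcp.json` is shared: other project-scoped servers defined there stay intact.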

Settings

The Settings page (/settings) shows tool health status and configuration. All API keys and credentials are configured via environment variables in .env.

Variable	Required	Purpose
`ANTHROPIC_API_KEY`	Yes	Claude API for triage + synthesis
`S2_API_KEY`	No	Semantic Scholar (higher rate limits)
`PERPLEXITY_API_KEY`	No	Trend discovery
`OPENREVIEW_USERNAME`	No	OpenReview auth (guest mode default)
`OPENREVIEW_PASSWORD`	No	OpenReview auth
`ZOTERO_API_KEY`	No	Auto-sync reading lists to Zotero
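The table above can be turned into a pre-flight check. This is a minimal sketch: the `.env` parser handles only plain `KEY=VALUE` lines, comments, and surrounding quotes, and treating a lone OpenReview username or password as an error is an assumption based on guest mode being the default. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Mapping

REQUIRED = ("ANTHROPIC_API_KEY",)
OPTIONAL = {
    "S2_API_KEY": "Semantic Scholar (higher rate limits)",
    "PERPLEXITY_API_KEY": "Trend discovery",
    "ZOTERO_API_KEY": "Auto-sync reading lists to Zotero",
}


def parse_dotenv(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips surrounding quotes."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("\"'")
    return env


def check_env(env: Mapping[str, str]) -> tuple[list[str], list[str]]:
    """Return (errors, notes): missing required keys, and optional features left unconfigured."""
    errors = [f"{key} is required" for key in REQUIRED if not env.get(key)]
    if bool(env.get("OPENREVIEW_USERNAME")) != bool(env.get("OPENREVIEW_PASSWORD")):
        errors.append("set both OPENREVIEW_USERNAME and OPENREVIEW_PASSWORD, or neither for guest mode")
    notes = [f"{key} not set: {purpose}" for key, purpose in OPTIONAL.items() if not env.get(key)]
    return errors, notes
```

The check works on either source: `check_env(parse_dotenv(Path(".env").read_text()))` for the file, or `check_env(os.environ)` once the variables are exported.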

Tips & Best Practices

Start with a focused query

"Representation learning for video understanding" works better than "deep learning". The interview phase helps refine, but a focused starting point gets better results.

Use Standard depth for first runs

Quick depth can miss niche papers, and Deep costs more but is worth it for comprehensive surveys. Standard is the best starting point.

Check the Trace tab on gated runs

When a pipeline is gated, the trace DAG shows exactly why — which gate failed, what the paper counts were, and whether a query broadening was attempted.

First OpenReview fetch is slow

The first fetch for a venue and year (e.g., ICLR 2025, about 3,700 papers) takes roughly 40 seconds. Later runs hit the Redis cache (24-hour TTL) and return almost immediately.

Quick Reference

Page	URL	Purpose
Dashboard	`/`	Pipeline list + stats
New Pipeline	`/pipeline/new`	Create research pipeline
Pipeline Detail	`/pipeline/[id]`	Run, monitor, trace
Interview	`/pipeline/[id]/interview`	Scope refinement chat
Results	`/pipeline/[id]/results`	Ideas, gaps, reading list
Papers	`/papers`	Paper library by pipeline
Ideas	`/ideas`	All generated ideas
Gaps	`/gaps`	All identified gaps
Stats	`/stats`	Usage analytics
Settings	`/settings`	Tool health + config
Usage Guide	`/usage`	This page