v0.1.0 · User Guide · AI/ML Research

Research Agent Guide

Discover papers, identify research gaps, and generate novel ideas across top-tier ML venues — powered by agentic AI with model tiering and cost control.

Overview

Research Agent orchestrates a 5-phase pipeline: discover papers from multiple sources, triage by relevance, deep-scan PDFs, map the research landscape, and synthesize novel ideas and gaps. Each phase uses a model tier matched to its task, which keeps costs under roughly $0.56 per run.

🔍 Multi-Source Discovery: S2, arXiv, Perplexity, Google Scholar, OpenReview
🤖 Interview Scope: Conversational refinement before each run
📈 Landscape Mapping: Methods x Datasets coverage matrix
💡 Idea Synthesis: Gap-driven research ideas with feasibility scores
📑 Trace DAG: Ara-inspired exploration history with full reasoning
💰 Cost Control: Per-phase budgets, model tiering, authority enforcement

Pipeline phases: Interview → Discovery → Triage → Deep Scan → Landscape → Synthesis

Getting Started

  1. Start the infrastructure
    Run make dev-infra to start Postgres and Redis, then run make migrate to apply the database schema.
  2. Launch the backend
    Run make dev-backend. The API starts at http://localhost:8000. Check /api/health to verify.
  3. Launch the frontend
    Run make dev-frontend. Open http://localhost:3000 to access the dashboard.
  4. Configure API keys
    Copy .env.example to .env and set your Anthropic API key. S2 and Perplexity keys are optional but improve results.
Full stack shortcut
Use make dev to start everything at once via Docker Compose (Postgres, Redis, backend, frontend).
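To confirm step 2 from a script rather than a browser, a small poller can wait for the health endpoint to respond. This is a minimal sketch: only the URL http://localhost:8000/api/health comes from the steps above. The function names, the retry settings, and the assumption that a healthy backend answers with HTTP 200 are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/health"  # from step 2 above


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_for_health(
    url: str = HEALTH_URL,
    attempts: int = 10,
    delay_s: float = 1.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll the URL until it answers 200 (assumed here to mean healthy)."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling wait_for_health() after make dev-backend returns True once the API responds. The fetch parameter exists so the polling logic can be exercised without a running server.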

Create a Pipeline

Click New Pipeline in the header to start. Enter a research query, select target venues, and configure depth and budget.

Field               Description
Research Query      Natural language description of your research area
Target Venues       Toggle: CVPR, ICLR, ICML, NeurIPS, ECCV, AAAI
Include OpenReview  Fetch accepted papers from conference proceedings
Year From           Only include papers published after this year
Depth               Quick (~2 min), Standard (~5 min), or Deep (~10 min)
Budget Cap          Max LLM cost for this run (default $1.00)

OpenReview toggle
Enable "Include OpenReview" when targeting ICLR, NeurIPS, ICML, or AAAI. It fetches acceptance decisions, review scores, and oral/spotlight/poster classifications directly from conference proceedings.
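The form fields above map naturally onto a small configuration object. The sketch below is illustrative only: the class name PipelineConfig, its field names, the default depth, and the validation rules are assumptions, not the application's actual schema. The venue list, depth options, and default budget are taken from the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

VENUES = {"CVPR", "ICLR", "ICML", "NeurIPS", "ECCV", "AAAI"}
OPENREVIEW_VENUES = {"ICLR", "NeurIPS", "ICML", "AAAI"}
DEPTHS = {"quick", "standard", "deep"}


@dataclass
class PipelineConfig:
    """Hypothetical mirror of the New Pipeline form."""

    query: str
    venues: Set[str] = field(default_factory=set)
    include_openreview: bool = False
    year_from: Optional[int] = None
    depth: str = "standard"       # assumed default; the guide does not state one
    budget_cap_usd: float = 1.00  # default from the table

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if not self.query.strip():
            problems.append("query is empty")
        unknown = self.venues - VENUES
        if unknown:
            problems.append(f"unknown venues: {sorted(unknown)}")
        if self.depth not in DEPTHS:
            problems.append(f"depth must be one of {sorted(DEPTHS)}")
        if self.budget_cap_usd <= 0:
            problems.append("budget cap must be positive")
        return problems

    def openreview_is_useful(self) -> bool:
        """True when at least one selected venue is served by OpenReview."""
        return bool(self.venues & OPENREVIEW_VENUES)
```

For example, a config targeting only CVPR and ECCV would report openreview_is_useful() as False, which matches the advice in the callout.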

Interview Phase

After creation, the pipeline enters an interactive interview. The AI asks clarifying questions to refine your query, identify sub-topics, set exclusions, and focus the search.

  1. Answer scope questions
    The agent asks 3-5 questions about your area, methods, and constraints.
  2. Review refined scope
    The agent summarizes sub-topics, exclusions, and venue targeting. Approve or adjust.
  3. Start the pipeline
    Click "Run Pipeline" on the pipeline detail page to begin execution.

Run & Monitor

The pipeline detail page shows a live DAG of phase execution with real-time SSE streaming. Each phase node shows status, model used, token counts, and cost.
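Because the live view is fed by Server-Sent Events, the same stream could in principle be consumed by a script. The sketch below parses the standard SSE wire format (event: and data: fields, with a blank line ending each event). The event name phase_update and the JSON payload fields are hypothetical, since this guide does not document the stream's schema.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, if any data was collected.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)


# Made-up sample stream; field names and values are assumptions.
sample = [
    "event: phase_update\n",
    'data: {"phase": "triage", "status": "running"}\n',
    "\n",
]
for ev in parse_sse(sample):
    payload = json.loads(ev["data"])
    print(ev["event"], payload["phase"], payload["status"])
```

An event that is still incomplete when the stream ends is dropped, which follows the SSE processing rules.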

Pipeline States

Status        Meaning                       Action
pending       Created, awaiting start       Click "Run Pipeline"
interviewing  Scope refinement in progress  Complete interview
running       Pipeline executing            Monitor or Abort
gated         Stopped at quality gate       Click "Resume" to retry
completed     All phases finished           View Results
failed        Error occurred                Check logs, Resume

Gate failures
Gates check paper counts and coverage quality between phases. If a gate fails, the pipeline pauses with status "gated". Click Resume to retry — the pipeline skips already-completed phases.
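One way to read the status table and the resume behavior is as a small state machine. The sketch below is an interpretation, not the application's code: the transition map is inferred from the Action column, the phase identifiers are invented, and phases_to_run is a hypothetical helper showing how a resume could skip phases that already finished. Abort is not modeled because the guide does not name the status it leads to.

```python
from typing import Dict, List, Set

# Phase names from the overview; the snake_case identifiers are assumptions.
PHASES = ["discovery", "triage", "deep_scan", "landscape", "synthesis"]

# Inferred transitions; the real application may allow others.
TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"interviewing", "running"},
    "interviewing": {"running"},
    "running": {"gated", "completed", "failed"},
    "gated": {"running"},    # Resume
    "failed": {"running"},   # Resume after checking logs
    "completed": set(),
}


def can_transition(current: str, target: str) -> bool:
    """True if the inferred state machine allows current -> target."""
    return target in TRANSITIONS.get(current, set())


def phases_to_run(completed: Set[str]) -> List[str]:
    """Phases still to execute on resume, in pipeline order."""
    return [p for p in PHASES if p not in completed]


# A run gated after triage would resume at deep_scan.
print(phases_to_run({"discovery", "triage"}))
```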

View Results

Completed pipelines produce four outputs, accessible from the Results page:

💡 Research Ideas: Novel ideas with feasibility + impact scores, evaluated by LLM-as-judge
🔧 Research Gaps: Method gaps, dataset gaps, and understudied combinations
📚 Reading List: Top papers ranked by relevance and novelty, with arXiv links
📊 Landscape Map: Methods x Datasets coverage matrix showing research density

Paper Library

The Papers page shows all discovered papers grouped by pipeline run. Each paper displays venue, year, citation count, relevance score, and arXiv link. Papers from OpenReview also show acceptance type (oral/spotlight/poster) and average review score.

OpenReview Integration

The OpenReview connector fetches accepted papers from ICLR, NeurIPS, ICML, and AAAI directly from conference proceedings. Results are cached in Redis for 24 hours.

Supported Venues

Venue    OpenReview ID  Data Available
ICLR     ICLR.cc        Papers, reviews, decisions, keywords
NeurIPS  NeurIPS.cc     Papers, reviews, decisions, keywords
ICML     ICML.cc        Papers, reviews, decisions, keywords
AAAI     AAAI.org       Papers, decisions

Guest vs authenticated access
Guest mode works for all public venues. Set OPENREVIEW_USERNAME and OPENREVIEW_PASSWORD in your .env for higher rate limits.
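The 24-hour caching described in this section amounts to a keyed lookup with a time-to-live. A minimal in-memory sketch follows. It stands in for Redis so that it runs anywhere, and the key format openreview:{venue}:{year}, the class name TTLCache, and the fetch callable are assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple

TTL_SECONDS = 24 * 60 * 60  # 24 hours, as stated above


def cache_key(venue: str, year: int) -> str:
    """Hypothetical key layout: one entry per venue and year."""
    return f"openreview:{venue.lower()}:{year}"


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(
        self,
        ttl_s: float = TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value, or call fetch() and cache its result."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]  # still fresh
        value = fetch()    # slow path: first fetch or expired entry
        self._store[key] = (now + self._ttl_s, value)
        return value
```

With this shape, only the first get_or_fetch for a given venue and year pays the cost of the remote fetch. Later calls within the TTL return the stored list.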

Trace DAG

Every pipeline run produces an exploration trace — an Ara-inspired DAG that records every decision the agent made: queries issued, papers scored, gates passed or failed, dead ends, and pivots. Access it via the Trace tab on any pipeline detail page.

Event Types

Type          Color     Meaning
QUERY_ISSUED  Blue      Search or prompt sent to a source
GATE_PASSED   Green     Quality gate check succeeded
GATE_FAILED   Red       Quality gate check failed
PAPER_SCORED  Gray      Paper relevance scoring (collapsed in UI)
DEAD_END      Dark red  Search returned no results
PIVOT         Purple    Fundamental direction change

Click nodes for details
Click any node in the trace DAG to see its input data, output data, and reasoning in the detail panel. Use "Export JSONL" to download the full trace for offline analysis.
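An exported trace can be summarized with a few lines of Python. This sketch assumes each JSONL line is a JSON object with a type field holding one of the event types listed above. The field name is an assumption, as the export schema is not documented in this guide.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Count trace events by their (assumed) "type" field."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        counts[record.get("type", "UNKNOWN")] += 1
    return counts
```

A typical use would be to open the downloaded file and pass the file object directly, for example count_event_types(open("trace.jsonl", encoding="utf-8")), where the filename is a placeholder. A high DEAD_END or GATE_FAILED count is a hint that the query needs broadening.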

MCP Server

The conference papers MCP server lets you search OpenReview directly from Claude Code. It's auto-discovered via .mcp.json at the project root.

Available Tools

Tool                      Description
search_conference_papers  Search accepted papers by venue, year, and keywords
get_paper_details         Full metadata for a paper by OpenReview ID
get_paper_reviews         Ratings, strengths, and weaknesses
list_venues               Supported venues and year ranges

# Test the MCP server locally
uv run python backend/mcp_server/conference_server.py

Settings

The Settings page (/settings) shows tool health status and configuration. All API keys and credentials are configured via environment variables in .env.

Variable             Required  Purpose
ANTHROPIC_API_KEY    Yes       Claude API for triage + synthesis
S2_API_KEY           No        Semantic Scholar (higher rate limits)
PERPLEXITY_API_KEY   No        Trend discovery
OPENREVIEW_USERNAME  No        OpenReview auth (guest mode default)
OPENREVIEW_PASSWORD  No        OpenReview auth
ZOTERO_API_KEY       No        Auto-sync reading lists to Zotero

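A startup check along these lines can catch a missing key before a run begins. It is a sketch only: the function check_env is hypothetical, the required and optional lists simply mirror the table, and it assumes the values from .env have already been loaded into the process environment.

```python
import os
from typing import Dict, List, Mapping

REQUIRED = ["ANTHROPIC_API_KEY"]
OPTIONAL = [
    "S2_API_KEY",
    "PERPLEXITY_API_KEY",
    "OPENREVIEW_USERNAME",
    "OPENREVIEW_PASSWORD",
    "ZOTERO_API_KEY",
]


def check_env(env: Mapping[str, str] = os.environ) -> Dict[str, List[str]]:
    """Report which variables are unset or empty."""
    return {
        "missing_required": [k for k in REQUIRED if not env.get(k)],
        "missing_optional": [k for k in OPTIONAL if not env.get(k)],
    }


report = check_env()
if report["missing_required"]:
    print("Set these before running:", ", ".join(report["missing_required"]))
```

Missing optional keys are reported but not treated as errors, since the pipeline is described as working without them.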

Tips & Best Practices

Start with a focused query
"Representation learning for video understanding" works better than "deep learning". The interview phase helps refine, but a focused starting point gets better results.
Use Standard depth for first runs
Quick can miss niche papers. Deep costs more but is worth it for comprehensive surveys. Standard is the best starting point.
Check the Trace tab on gated runs
When a pipeline is gated, the trace DAG shows exactly why — which gate failed, what the paper counts were, and whether a query broadening was attempted.
First OpenReview fetch is slow
The initial fetch for a venue and year takes about 40 seconds (e.g., ICLR 2025, roughly 3,700 papers). Subsequent runs within 24 hours are served from the Redis cache and return almost immediately.

Quick Reference

Page             URL                       Purpose
Dashboard        /                         Pipeline list + stats
New Pipeline     /pipeline/new             Create research pipeline
Pipeline Detail  /pipeline/[id]            Run, monitor, trace
Interview        /pipeline/[id]/interview  Scope refinement chat
Results          /pipeline/[id]/results    Ideas, gaps, reading list
Papers           /papers                   Paper library by pipeline
Ideas            /ideas                    All generated ideas
Gaps             /gaps                     All identified gaps
Stats            /stats                    Usage analytics
Settings         /settings                 Tool health + config
Usage Guide      /usage                    This page