Data Schema
Complete reference for the PaperStack v4 JSON response schema.
All PaperStack API responses follow the v4 schema. This page documents every field across the response wrapper, question objects, and related data types.
PaperResponse (Top-Level Wrapper)
Every data endpoint returns a PaperResponse object with the following structure:
{
"schema": "v4",
"exam": "neet",
"year": 2024,
"shift": "05may-s1",
"paper": null,
"subjects": ["physics", "chemistry", "biology"],
"total": 50,
"duration": 200,
"marksCorrect": 4,
"marksIncorrect": -1,
"marksUnanswered": 0,
"sections": { "...": "..." },
"scrapedAt": "2026-05-21T05:30:13.506Z",
"answerKeyFound": true,
"checksum": "59f420b3...",
"provenance": { "...": "..." },
"questions": ["..."],
"passages": []
}
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| schema | string | ✅ | Schema version (v4) |
| exam | string | ✅ | Exam code: neet, jeemain, jeeadv |
| year | number | ✅ | Exam year (e.g. 2024) |
| shift | string | ✅ | Shift in {dd}{mmm}-s{n} format (e.g. 05may-s1) |
| paper | string | — | Paper code. Usually null for single-paper exams |
| subjects | string[] | ✅ | Subjects in this response |
| total | number | ✅ | Number of questions in this response |
| duration | number | ✅ | Exam duration in minutes |
| marksCorrect | number | ✅ | Marks per correct answer |
| marksIncorrect | number | ✅ | Penalty per incorrect answer |
| marksUnanswered | number | ✅ | Marks for unanswered questions (usually 0) |
| sections | object | — | Section configuration (see below) |
| scrapedAt | string | — | ISO 8601 timestamp of last scrape |
| answerKeyFound | boolean | — | Whether the official answer key was available |
| checksum | string | — | SHA-256 hex digest of this dataset |
| provenance | object | — | Data origin metadata |
| questions | Question[] | ✅ | Array of question objects |
| passages | Passage[] | — | Array of comprehension passages |
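The "Required" column above can be turned into a quick sanity check on a parsed response. The following is a minimal sketch, not an official validator: the function name `check_paper_response` is hypothetical, and the check that `total` equals the length of `questions` is an assumption based on the field description rather than a documented guarantee.

```python
# Required top-level fields, taken from the "Required" column of the table.
REQUIRED_FIELDS = (
    "schema", "exam", "year", "shift", "subjects", "total",
    "duration", "marksCorrect", "marksIncorrect", "marksUnanswered",
    "questions",
)


def check_paper_response(response: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in response
    ]
    if response.get("schema") != "v4":
        problems.append(f"unexpected schema version: {response.get('schema')!r}")
    questions = response.get("questions")
    # Assumption: `total` equals len(questions). The page describes `total` as
    # the number of questions in the response but does not promise this.
    if isinstance(questions, list) and response.get("total") != len(questions):
        problems.append("total does not match len(questions)")
    return problems
```

Returning a list of messages rather than raising on the first problem makes it easier to log everything that looks off in one pass.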
sections
{
"a": {
"label": "section a",
"total": 100,
"required": 100,
"mandatory": true
},
"b": {
"label": "section b",
"total": 100,
"required": 100,
"mandatory": true
}
}
| Field | Type | Description |
|---|---|---|
| label | string | Section display name |
| total | number | Total marks in this section |
| required | number | Marks required to complete the section |
| mandatory | boolean | Whether the section is mandatory |
provenance
{
"author": "Naman Dhakad",
"repo": "https://github.com/rankify-ai/PaperStack-API",
"license": "PolyForm-Noncommercial-1.0.0",
"pipelineVersion": "1.0.0",
"generatedAt": "2026-05-21T05:30:13.506Z"
}
| Field | Type | Description |
|---|---|---|
| author | string | Creator of the dataset |
| repo | string | Source repository URL |
| license | string | License identifier |
| pipelineVersion | string | Extraction pipeline version |
| generatedAt | string | ISO 8601 generation timestamp |
Question
Each question in the questions array has the following structure:
{
"id": "neet-2024-05may-s1-ph-001",
"number": 1,
"numberLabel": null,
"subject": "physics",
"topic": "current-electricity",
"section": "a",
"type": "mcq",
"text": "Question body with $\\LaTeX$ math...",
"textHi": null,
"options": ["Option A", "Option B", "Option C", "Option D"],
"answer": "(3)",
"answers": null,
"answerPrecision": null,
"marks": 4,
"negativeMarks": -1,
"passageId": null,
"hasDiagram": false,
"diagrams": null,
"difficulty": null,
"solution": null,
"solutionFormat": null,
"tags": ["battery", "internal-resistance"],
"revision": 1,
"source": "official-pdf",
"confidence": null
}
Core Fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | ✅ | Globally unique ID in format {exam}-{year}-{shift}-{subject-code}-{zero-padded-number} |
| number | number | ✅ | Question number (1-based). Global in paper.json, 1..N in subject files |
| numberLabel | string | — | Optional label override for sub-parts |
| subject | string | ✅ | physics, chemistry, biology, mathematics |
| topic | string | ✅ | Normalized topic from controlled vocabulary |
| section | string | — | Section identifier: a or b |
| type | string | ✅ | Question type (see below) |
| text | string | ✅ | Full question body. May contain $...$ LaTeX |
| textHi | string | — | Hindi translation (if available) |
| marks | number | ✅ | Marks awarded for correct answer |
| negativeMarks | number | ✅ | Penalty for incorrect answer |
| hasDiagram | boolean | ✅ | Whether this question references diagram images |
| tags | string[] | — | Normalized topic tags for filtering |
| revision | number | ✅ | Revision counter, incremented on corrections |
| source | string | ✅ | official-pdf, answer-key, or reconstructed |
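The sign convention of marks and negativeMarks is easy to get wrong: in the examples on this page negativeMarks is already a negative number (for instance -1), so it is added to the score, not subtracted. A minimal sketch of per-question scoring follows. The `outcomes` mapping and the function name `score_attempts` are hypothetical and not part of the schema, and unanswered questions are assumed to score 0.

```python
from typing import Optional


def score_attempts(questions: list[dict], outcomes: dict[str, Optional[bool]]) -> float:
    """Sum the marks earned over a list of question objects.

    `outcomes` maps a question id to True (correct), False (incorrect),
    or None (unanswered). It is a hypothetical input, not a schema field.
    """
    total = 0
    for q in questions:
        # Ids missing from the mapping are treated as unanswered.
        outcome = outcomes.get(q["id"])
        if outcome is True:
            total += q["marks"]
        elif outcome is False:
            # negativeMarks is stored as a negative number, so it is added.
            total += q["negativeMarks"]
        # Assumption: unanswered questions contribute 0.
    return total
```

With one correct and one incorrect answer on questions worth 4 and -1, this yields 4 + (-1) = 3.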
Answer Fields
| Field | Type | MCQ | MSQ | NAT | Assertion-Reason |
|---|---|---|---|---|---|
| answer | string | "(3)" | "" | "9.301" | "(0)"-"(3)" |
| answers | number[] | null | [1, 3] | null | null |
| answerPrecision | object | null | null | { type, min, max, unit } | null |
| options | string[] | 3-5 options | 4-6 options | null | null |
Answer format details:
| Type | Answer Format | Example |
|---|---|---|
| MCQ | Zero-based option index in parentheses | "(2)" means the 3rd option |
| MSQ | Array of zero-indexed indices in answers | [1, 3] means options 2 and 4 |
| NAT | Numeric string in answer | "9.301" |
| Assertion-Reason | Zero-based index of the selected relationship in parentheses, "(0)" to "(3)" (see the assertion-reason answer mapping) | "(2)" |
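The formats above can be normalized into one representation before comparing against a user's selection. The sketch below returns the correct choices as a set of zero-based indices. The helper name `correct_indices` is hypothetical, and it assumes the answer string always has exactly the "(n)" shape shown in the examples.

```python
import re

# Matches the "(n)" form used by MCQ and assertion-reason answers.
_INDEX_IN_PARENS = re.compile(r"^\((\d+)\)$")


def correct_indices(question: dict) -> set[int]:
    """Return the zero-based indices of the correct choices.

    Raises ValueError for NAT questions, whose answer is a number, not an index.
    """
    qtype = question["type"]
    if qtype == "msq":
        # MSQ keeps its indices in `answers`; `answer` is an empty string.
        return set(question["answers"])
    if qtype in ("mcq", "assertion-reason"):
        match = _INDEX_IN_PARENS.match(question["answer"])
        if match is None:
            raise ValueError(f"unexpected answer format: {question['answer']!r}")
        return {int(match.group(1))}
    raise ValueError(f"question type {qtype!r} has no option index")
```

Using a set for every type means single-answer and multi-answer questions can be compared with the same equality check.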
Diagram Fields
| Field | Type | Description |
|---|---|---|
hasDiagram | boolean | Whether diagrams exist for this question |
diagrams | Diagram[] | Array of diagram references (see below) |
"diagrams": [
{
"file": "diagrams/q013-img-7.jpeg",
"label": "fig. 1",
"caption": null
}
]
| Field | Type | Description |
|---|---|---|
| file | string | Relative path to image within the paper's diagrams/ directory |
| label | string | Display label (e.g. fig. 1) |
| caption | string | Optional diagram description |
To fetch the image: GET /neet/{year}/{shift}/diagrams/{file} (requires auth).
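A request path for a diagram can be assembled from the fields of the enclosing response. This is a rough sketch under two assumptions that the page does not confirm: that the same route shape applies to exam codes other than neet, and that a leading diagrams/ in the file value (as in the example above) should not be repeated after the route's own diagrams/ segment. The function name `diagram_path` is hypothetical, and authentication is left out because its mechanism is not described here.

```python
from urllib.parse import quote


def diagram_path(exam: str, year: int, shift: str, file: str) -> str:
    """Build the request path for a diagram image."""
    # Assumption: drop a leading "diagrams/" so the segment is not doubled.
    name = file.removeprefix("diagrams/")
    # quote() percent-encodes any characters that are unsafe in a URL path.
    return f"/{exam}/{year}/{shift}/diagrams/{quote(name)}"
```

`str.removeprefix` requires Python 3.9 or later.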
Passage
Some questions belong to a comprehension passage. Passages are stored in a separate passages array at the top level:
{
"id": "passage-1",
"text": "Passage body text with $\\LaTeX$...",
"textHi": null,
"diagrams": null,
"questions": ["neet-2024-05may-s1-ph-010", "neet-2024-05may-s1-ph-011"]
}
A question links to its passage via the passageId field.
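The passageId link can be followed in the other direction to collect the questions that belong to each passage. A minimal sketch, assuming a parsed response dictionary; the function name `questions_by_passage` is hypothetical.

```python
def questions_by_passage(response: dict) -> dict[str, list[dict]]:
    """Map each passage id to the question objects that reference it."""
    # Start with every declared passage so passages without questions still appear.
    grouped: dict[str, list[dict]] = {
        p["id"]: [] for p in response.get("passages") or []
    }
    for q in response["questions"]:
        passage_id = q.get("passageId")
        if passage_id is not None:
            # setdefault tolerates a passageId with no matching passage entry.
            grouped.setdefault(passage_id, []).append(q)
    return grouped
```

Grouping by the passageId on each question avoids relying on the questions list stored inside the passage object, so the two can be cross-checked if needed.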
Question Types
MCQ (Multiple Choice Question)
Single correct answer from 3–5 options.
{
"type": "mcq",
"options": ["1/10N", "1/100(N + 1)", "100N", "10(N + 1)"],
"answer": "(2)",
"answers": null,
"answerPrecision": null,
"negativeMarks": -1
}
MSQ (Multiple Select Question)
Multiple correct answers selected from 4–6 options.
{
"type": "msq",
"options": [
"A-III, B-IV, C-I, D-II",
"A-IV, B-I, C-III, D-II",
"A-II, B-I, C-IV, D-III",
"A-III, B-I, C-IV, D-II"
],
"answer": "",
"answers": [1, 3],
"negativeMarks": -1
}
NAT (Numerical Answer Type)
The answer is a numeric value. No options are provided.
{
"type": "nat",
"options": null,
"answer": "9.301",
"answers": null,
"answerPrecision": {
"type": "range",
"min": 9.301,
"max": 9.301,
"unit": ""
},
"negativeMarks": 0
}
Assertion-Reason
Consists of an Assertion (A) and a Reason (R). The answer selects the relationship between them.
{
"type": "assertion-reason",
"options": null,
"answer": "(2)",
"answers": null,
"negativeMarks": -1
}
Answer mapping for assertion-reason:
| Answer | Meaning |
|---|---|
| (0) | Both A and R are true and R is the correct explanation of A |
| (1) | Both A and R are true but R is NOT the correct explanation of A |
| (2) | A is true but R is false |
| (3) | A is false but R is true |
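Two of the question types need handling beyond an index comparison: NAT answers are checked against answerPrecision, and assertion-reason answers are looked up in the mapping above. The sketch below illustrates both. The names `ASSERTION_REASON_MEANING` and `nat_answer_accepted` are hypothetical, the range is assumed to be inclusive at both ends, and the fallback to an exact match when answerPrecision is null is also an assumption.

```python
# Lookup table copied from the assertion-reason answer mapping.
ASSERTION_REASON_MEANING = {
    "(0)": "Both A and R are true and R is the correct explanation of A",
    "(1)": "Both A and R are true but R is NOT the correct explanation of A",
    "(2)": "A is true but R is false",
    "(3)": "A is false but R is true",
}


def nat_answer_accepted(question: dict, value: float) -> bool:
    """Check a numeric response against a NAT question."""
    precision = question.get("answerPrecision")
    if precision and precision.get("type") == "range":
        # Assumption: min and max are both inclusive.
        return precision["min"] <= value <= precision["max"]
    # Assumption: without a range, require an exact match with `answer`.
    return value == float(question["answer"])
```

In the NAT example above min and max are equal, so only the exact value 9.301 would be accepted under these assumptions.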
Question ID Format
{exam}-{year}-{shift}-{subject-code}-{zero-padded-number}
| Component | Description | Example |
|---|---|---|
| exam | Exam code | neet |
| year | 4-digit year | 2024 |
| shift | Full shift name | 05may-s1 |
| subject-code | Two-letter abbreviation | ph, ch, bi, ma |
| number | Zero-padded 3-digit number | 001 |
Subject codes: ph (physics), ch (chemistry), bi (biology), ma (mathematics)
Example: neet-2024-05may-s1-ph-001
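Because the shift component itself contains a hyphen, splitting an ID on "-" does not recover the components directly. The following sketch parses an ID with a regular expression instead. The function name `parse_question_id` is hypothetical, and the pattern assumes lowercase exam codes, the {dd}{mmm}-s{n} shift shape, and exactly three digits for the number.

```python
import re

SUBJECT_CODES = {
    "ph": "physics",
    "ch": "chemistry",
    "bi": "biology",
    "ma": "mathematics",
}

# {exam}-{year}-{shift}-{subject-code}-{zero-padded-number}
# The shift (e.g. 05may-s1) contains a hyphen, so it gets its own sub-pattern.
_ID_PATTERN = re.compile(
    r"^(?P<exam>[a-z]+)-(?P<year>\d{4})-(?P<shift>\d{2}[a-z]{3}-s\d+)"
    r"-(?P<code>[a-z]{2})-(?P<number>\d{3})$"
)


def parse_question_id(question_id: str) -> dict:
    """Split a question ID into its components, expanding the subject code."""
    match = _ID_PATTERN.match(question_id)
    if match is None or match["code"] not in SUBJECT_CODES:
        raise ValueError(f"not a v4 question id: {question_id!r}")
    return {
        "exam": match["exam"],
        "year": int(match["year"]),
        "shift": match["shift"],
        "subject": SUBJECT_CODES[match["code"]],
        "number": int(match["number"]),
    }
```

Called with the example ID above, the function returns exam neet, year 2024, shift 05may-s1, subject physics, and number 1.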