In April 2026, our predictions were accurate for 97.55% of students. Here is the exact methodology — no black boxes, no marketing fluff.
Each stage builds on the last. Together they transform raw crowdsourced data into a calibrated percentile range.
Every prediction begins with real student data. When a student enters their marks and shift on our predictor, we silently capture that data point — once per device — into a live database. This crowdsourcing approach gives us a continuous, high-volume stream of self-reported scores across all sessions.
The dataset for April 2026 comprised thousands of submissions spread across 8 active shifts (Apr2S1 through Apr8S2), all collected after April 10, 2026 at 15:57 UTC — a cutoff chosen to exclude early test submissions and ensure only genuine post-exam entries are counted.
Self-reported data has a known systematic flaw: students tend to round their scores to the nearest multiple of 10 when they're uncertain. This creates artificial spikes at marks like 80, 90, 100, 110, etc. — inflating those counts and distorting the distribution.
We detect these spikes using a local windowed average: for each score value, we compute the average count of its 4 nearest neighbours (±1 and ±2 marks away), excluding the value itself. If a round-number mark's count significantly exceeds that local average, we replace it with the smoothed neighbour average. This eliminates the self-reporting bunching effect without removing genuine data.
Additionally, we cap the count of any single score at 5 submissions — preventing any one score from dominating the distribution due to viral sharing or coordinated submissions.
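The smoothing and capping steps above can be sketched in a few lines of Python. The spike threshold (×1.5) and the example counts are illustrative assumptions, since the exact cutoff for "significantly exceeds" is not quantified above:

```python
from collections import Counter

SPIKE_FACTOR = 1.5   # assumed threshold; "significantly exceeds" is not quantified
MAX_COUNT = 5        # per-score submission cap from the methodology

def smooth_counts(counts):
    """Replace inflated round-number counts with the local neighbour average,
    then cap every score's count at MAX_COUNT submissions."""
    smoothed = dict(counts)
    for mark, n in counts.items():
        if mark % 10 != 0:
            continue  # only round-number marks attract self-reporting spikes
        # average of the 4 nearest neighbours (±1, ±2), excluding the mark itself
        neighbours = [counts.get(mark + d, 0) for d in (-2, -1, 1, 2)]
        local_avg = sum(neighbours) / 4
        if local_avg > 0 and n > SPIKE_FACTOR * local_avg:
            smoothed[mark] = round(local_avg)
    return {m: min(n, MAX_COUNT) for m, n in smoothed.items()}

# hypothetical raw counts with a spike at the round number 90
raw = Counter({88: 3, 89: 2, 90: 14, 91: 3, 92: 2})
cleaned = smooth_counts(raw)
```

With these assumed inputs, the spike at 90 collapses to roughly the neighbour average while genuine non-round counts pass through untouched.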
After smoothing, many students share identical marks. When computing percentile ranks, ties create flat regions in the distribution that produce unstable interpolation. To resolve this, we apply a tiny deterministic jitter: each score receives a random offset in the range ±0.15 marks.
The randomness is seeded with a fixed value before each computation run, making the jitter fully reproducible across executions. The magnitude (±0.15) is deliberately small enough that it never crosses integer boundaries meaningfully — it only creates distinct ordering within tie groups, not different percentile outcomes.
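A minimal sketch of the seeded jitter, assuming Python's `random.Random` and a hypothetical fixed seed of 42 (the actual seed value is not stated):

```python
import random

JITTER = 0.15  # small enough never to reorder scores across integer boundaries
SEED = 42      # hypothetical fixed seed; any constant makes the jitter reproducible

def jitter_scores(marks, seed=SEED):
    """Add a deterministic random offset in ±JITTER to each mark."""
    rng = random.Random(seed)  # fresh seeded generator per computation run
    return [m + rng.uniform(-JITTER, JITTER) for m in marks]

# two runs with the same seed produce byte-identical jittered scores
a = jitter_scores([110, 110, 110, 112])
b = jitter_scores([110, 110, 110, 112])
assert a == b
# ties are broken, but every value stays within ±0.15 of its integer mark
assert all(abs(x - m) <= JITTER for x, m in zip(a, [110, 110, 110, 112]))
```

Seeding a fresh generator inside the function, rather than reusing global state, is what makes the output independent of anything else the process has done.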
With jittered scores, we compute PERCENT_RANK() within each shift independently. A rank of 0 means the highest scorer in that session; a rank of 1 means the lowest. This produces a continuous percentile-rank distribution per shift.
Crucially, we do not cross-normalize between sessions at this stage. Each shift's rank curve represents that shift's difficulty independently. The mapping from raw marks to percentile rank is therefore pure and uncontaminated by inter-session assumptions — a deliberate design choice that differs from NTA's official normalization, which we account for in the final output layer.
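The per-shift ranking can be illustrated with a small pure-Python equivalent of `PERCENT_RANK()` ordered by marks descending; the shift names and score values below are invented for the example:

```python
def percent_rank_desc(scores):
    """PERCENT_RANK() over scores ordered high-to-low: 0 = top scorer, 1 = lowest.
    Assumes at least two scores; jittered inputs make ties effectively impossible."""
    n = len(scores)
    if n < 2:
        return {s: 0.0 for s in scores}
    # a score's rank = number of strictly higher scores, normalized by (n - 1)
    return {s: sum(1 for o in scores if o > s) / (n - 1) for s in set(scores)}

by_shift = {
    "Apr2S1": [231.4, 198.7, 150.2, 120.9],  # hypothetical jittered marks
    "Apr2S2": [240.1, 160.3, 110.8],
}
# each shift is ranked independently; no cross-shift normalization here
ranks = {shift: percent_rank_desc(s) for shift, s in by_shift.items()}
```

Because each shift's scores are ranked against that shift alone, a given raw mark can map to different percent ranks in different sessions, exactly as described above.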
Rather than reading off the nearest data point, we use linear interpolation between two bracketing observations for each target percentile. For a given target (say, the 95th percentile), we find the row with the largest percent rank at or below the target and the row with the smallest percent rank at or above it, then interpolate the corresponding marks value proportionally between them.
This interpolation is computed across 10 custom percentile bands (p90 through p99), each with a carefully tuned target_pr value. The bands are not evenly spaced — they are denser at the high end (99th, 98th percentile) where discrimination is most critical for top college admissions, and sparser below the 95th.
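The bracketing-and-interpolation step might look like the sketch below. The rows and the `target_pr` values (e.g. rank 0.05 for the 95th percentile, consistent with rank 0 being the top scorer) are illustrative assumptions, not the tuned values used in production:

```python
def interpolate_marks(rows, target_pr):
    """Linearly interpolate the marks value at a target percent rank.

    rows: (percent_rank, marks) pairs sorted by percent_rank ascending,
    where rank 0 is the top scorer (as in the per-shift ranking step).
    """
    below = max((r for r in rows if r[0] <= target_pr), key=lambda r: r[0])
    above = min((r for r in rows if r[0] >= target_pr), key=lambda r: r[0])
    if below[0] == above[0]:
        return below[1]  # target lands exactly on an observed row
    frac = (target_pr - below[0]) / (above[0] - below[0])
    return below[1] + frac * (above[1] - below[1])

# hypothetical observed (rank, marks) rows for one shift
rows = [(0.00, 285.0), (0.04, 232.0), (0.08, 205.0)]
p95_cutoff = interpolate_marks(rows, 0.05)  # falls between the 0.04 and 0.08 rows
```

Here the 0.05 target sits a quarter of the way between the bracketing rows, so the interpolated cutoff sits a quarter of the way between their marks.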
A student's input marks are located between two adjacent interpolated cutoff bands. The resulting percentile is expressed as a range (lower band to upper band) rather than a single number. This is intentional: it communicates honest uncertainty rather than false precision.
The percentile range is then mapped to an AIR (All India Rank) range using the official JEE Mains 2026 candidate count of approximately 15.5 lakh. Rank = (1 − percentile / 100) × total candidates, with percentile on the 0–100 scale, computed for both the lower and upper percentile to produce a rank band. The midpoint of this band is passed to the College Predictor as the working rank.
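The rank mapping is simple arithmetic. A sketch assuming percentiles on the 0–100 scale and integer rounding (the exact rounding rule used in production is not specified):

```python
TOTAL_CANDIDATES = 1_550_000  # ~15.5 lakh candidates for JEE Mains 2026

def percentile_range_to_air(lower_pct, upper_pct, total=TOTAL_CANDIDATES):
    """Map a percentile band (0-100 scale) to an AIR band and a working rank."""
    worst_rank = round((1 - lower_pct / 100) * total)  # lower percentile -> larger rank
    best_rank = round((1 - upper_pct / 100) * total)   # upper percentile -> smaller rank
    midpoint = (worst_rank + best_rank) // 2           # fed to the College Predictor
    return best_rank, worst_rank, midpoint

# a 97.0-98.0 percentile band maps to roughly AIR 31,000-46,500
best, worst, mid = percentile_range_to_air(97.0, 98.0)
```

Note that a higher percentile yields a smaller (better) rank, so the upper percentile bound produces the lower end of the rank band.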
After NTA released official April 2026 results, we validated our predictions against actual percentiles for all students who had submitted their marks to us. Our predicted percentile range contained the student's official percentile in 97.55% of cases across all shifts.
Accuracy was highest for the 92nd–98th percentile range (>98.5% containment) and slightly lower at the extreme ends — a known limitation of interpolation at distribution boundaries. The smoothing step was the single biggest contributor to accuracy improvement, reducing spike-induced errors by an estimated 2–3 percentage points compared to raw data.
Students who use our predictor may skew toward higher scorers or more digitally active demographics. Our sample may not perfectly represent the full 15.5L candidate pool.
At extreme percentiles (below 85th or above 99.5th), sample density drops sharply. Predictions in these zones carry wider uncertainty than our mid-range estimates.
Predictions made before NTA releases the official answer key are based on student-reported marks, which may have errors. Post-key predictions are significantly more stable.
This methodology applies to JEE Mains Session 2, April 2026. Future sessions may use an updated pipeline.
jeepredictor.in is an independent tool. We are not affiliated with NTA or any official examination body.