Predicting the heartbeat of Indian monetary policy

1. Why this rate matters

The Weighted Average Call Money Rate — WACMR — is the interest rate at which scheduled Indian banks lend each other money overnight, settled on the books of the Reserve Bank of India. Conceptually it is a single number, published daily, that answers the question: how much is it costing Indian banks to be short of cash tonight?

That sounds esoteric, but it is the sharpest thermometer we have of monetary-policy transmission. The RBI chooses a policy stance and publishes a repo rate. The WACMR is where that stance actually has to clear against banks' liquidity needs and the interbank market's demand for cash. If the RBI cuts the repo rate and WACMR doesn't follow, policy is not transmitting. If the RBI holds but the WACMR drifts low, the system is sloshing with liquidity. The gap between the two, the WACMR – Repo spread, is the market's honest opinion of the policy stance.
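To make the spread concrete, here is a minimal sketch in Python. The function name and the sample readings are illustrative assumptions, not values taken from the dataset.

```python
def spread_bps(wacmr_pct: float, repo_pct: float) -> float:
    """WACMR minus repo rate, in basis points (1 percentage point = 100 bps)."""
    return round((wacmr_pct - repo_pct) * 100, 2)

# Hypothetical readings, for illustration only
print(spread_bps(6.40, 6.50))  # negative: WACMR is trading below the repo rate
print(spread_bps(6.75, 6.50))  # positive: overnight cash is tight
```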

We set out to do three things with this series:

Frame a forecasting problem — predict the WACMR one week ahead.
Understand the structure — is the series one stable process, or several?
Make the result useful — let a researcher (or a reviewer, or a curious economist) interrogate the model.

This essay walks through what the data said, what the model learned, and what was genuinely surprising. The companion interactive dashboard lets you drill into any number cited here.

The forecasting target: weekly WACMR from Feb 2014 to Jul 2024. The sharp drop in March 2020 — and the persistence of low rates that followed — is the core empirical puzzle the rest of this essay unpacks.

2. What data did we need?

A good forecast for an overnight rate needs three kinds of data: (a) policy-rate signals from the central bank, (b) liquidity and balance-sheet variables from the banking system, and (c) market-clearing prices from adjacent markets (T-bills, commercial paper, repo, forex). We pulled eight datasets from the NITI Aayog National Data & Analytics Platform (NDAP), which exposes an authenticated JSON API to the RBI's published weekly series.

Dataset	Source	Frequency	Rows	What it captures
RBI Ratios & Rates	NDAP / RBI	Weekly	545	Repo, Reverse Repo, MSF, CRR, SLR, T-bill yields
RBI Liabilities & Assets	NDAP / RBI	Weekly	545	Central-bank balance sheet
Weekly Aggregates	NDAP / RBI	Weekly	545	M3, reserve money, currency in circulation
Market Repo Transactions	NDAP / RBI	Weekly	545	Daily-volume & weighted-rate
Treasury Bills Details	NDAP / RBI	Weekly	545	91-, 182-, 364-day T-bills
Commercial Paper Details	NDAP / RBI	Weekly	545	CP outstanding, CP rates
Central Govt Dated Securities	NDAP / RBI	Weekly	545	G-Sec issuance & yields
CPI Major Price Indices	NDAP / MoSPI	Monthly → weekly	545	Headline, food, core, fuel CPI
Nifty 50 OHLCV	Yahoo Finance	Weekly	553	Equity flows proxy + tech indicators
USD/INR OHLCV	Yahoo Finance	Weekly	553	FX intervention signal

Table 1 — The 10-dataset master panel. All 10 sources are aligned onto a canonical Friday grid between Feb 2014 and Jul 2024 (545 weekly rows).

python

# stage1b_fetch_ndap.py (excerpt)
import requests

# Dataset name -> NDAP source ID
NDAP_DATASETS = {
    "RBI_Weekly_Statistics_Ratios_Rates": "SRC1234",
    "RBI_Liabilities_and_Assets":         "SRC1235",
    "Market_Repo_Transactions":           "SRC1236",
    "Treasury_Bills_Details":             "SRC1237",
    "Commercial_Paper_Details":           "SRC1238",
    # ... three more
}

PAGE_SIZE = 500  # rows per page

def fetch(src_id: str):
    """Yield every row of one NDAP dataset, paging until an empty batch comes back."""
    url = f"https://ndapapi.niti.gov.in/api/v1/{src_id}"
    page = 1
    while True:
        r = requests.post(url, json={"pagenumber": page, "pagesize": PAGE_SIZE}, timeout=30)
        r.raise_for_status()  # fail loudly on HTTP errors rather than parse an error body
        batch = r.json()["Data"]
        if not batch:
            return
        yield from batch
        page += 1

The NDAP API is paginated (500 rows per page) and required a simple retry wrapper for rate limits. To round out the picture we added two Yahoo Finance series — the Nifty 50 equity index and USD/INR — and a hand-curated list of 75 RBI policy events between 2014 and 2024 with manual sentiment scores, so we could see whether news adds real lift beyond the quantitative features.
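The retry wrapper itself is not shown in the excerpt. A minimal sketch of what such a wrapper could look like, assuming exponential backoff, follows; the name `with_retries` and its parameters are hypothetical, and a real version would catch only rate-limit errors rather than every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); after a failure wait base_delay * 2**attempt, then try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # assumption: any error is treated as retryable
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("loop always returns or raises")
```

With the `fetch` generator from the excerpt, usage would look like `rows = with_retries(lambda: list(fetch(src_id)))`. The `sleep` argument is injectable only so that the backoff schedule can be tested without waiting.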

3. Aligning ten datasets to a weekly grid

The RBI publishes most series weekly, but with inconsistent reference dates — some as-of Friday, some as-of the Wednesday prior, some as of the last Friday of the prior week. Yahoo Finance prices are daily. Our policy events are irregular. Before we could feed the data to any model we had to pick a single temporal grid and commit to it.

We chose Friday close as the canonical weekly timestamp. Daily series were last-observation-carried-forward (LOCF) onto the Friday grid; weekly series were reindexed and forward-filled only when the gap was ≤ 1 week (otherwise the slot was left NaN and flagged). Technical indicators (MACD, TSI, SuperTrend, Bollinger squeeze) were computed on the daily data and then sampled at Friday close, not the other way around — this keeps indicator semantics intact.
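A minimal pandas sketch of the two alignment rules is below. It is one reading of the rules rather than the pipeline's actual code: the function names are hypothetical, and "gap ≤ 1 week" is interpreted here as "the latest observation is at most 7 days before the Friday slot".

```python
import pandas as pd

def daily_to_friday(daily: pd.Series, grid: pd.DatetimeIndex) -> pd.Series:
    """LOCF: each Friday takes the latest daily observation on or before it."""
    return daily.dropna().sort_index().reindex(grid, method="ffill")

def weekly_to_friday(weekly: pd.Series, grid: pd.DatetimeIndex,
                     max_gap_days: int = 7) -> pd.Series:
    """Same as above, but leave NaN when the latest observation is too old."""
    return weekly.dropna().sort_index().reindex(
        grid, method="ffill", tolerance=pd.Timedelta(days=max_gap_days)
    )

# Toy data: three Fridays, a daily series, and a Wednesday-dated weekly series
grid = pd.date_range("2024-01-05", periods=3, freq="W-FRI")
daily = pd.Series([1.0, 2.0, 3.0],
                  index=pd.to_datetime(["2024-01-04", "2024-01-11", "2024-01-12"]))
weekly = pd.Series([10.0, 11.0],
                   index=pd.to_datetime(["2024-01-03", "2024-01-10"]))

print(daily_to_friday(daily, grid))
print(weekly_to_friday(weekly, grid))  # the third Friday stays NaN (stale by 9 days)
```

Slots left as NaN by `weekly_to_friday` are the ones the pipeline would flag.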

Figure: WACMR vs Repo Rate, 2014–2024.

The joined master table has 545 weekly observations from 2014-02-07 to 2024-07-19 across 119 columns. A schema catalogue (column_registry.py) maps every cryptic NDAP code (rates_I7496_17, la_I7492_14, …) to a human-readable label — without that catalogue the agent in the sidebar would be useless.

545 weekly observations · 117 engineered features · 10 source datasets · 75 curated policy events

4. Finding structure: PCA + regime discovery

Before fitting a forecasting model it's worth asking: is this series one stable process, or does it switch between states? A casual look at the chart above suggests the latter — the pre-COVID and post-COVID periods feel qualitatively different. We wanted a data-driven answer.

We standardised every numeric feature, ran PCA to retain 90% of variance, then K-Means clustering on the reduced coordinates with a silhouette sweep over k = 2…7. The silhouette peaked cleanly at k = 2, with a single transition at 2020-03-06, the Friday before the WHO declared COVID-19 a pandemic on 11 March 2020. K-Means did not know about the pandemic. It found it in the data.

Silhouette score (plus the elbow on inertia) across K ∈ {2…7}. K = 2 wins cleanly — both metrics agree that exactly two regimes best describe the feature space.

python

# stage4_regime_discovery.py (excerpt)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# X: numeric feature matrix, one row per week
X_scaled = StandardScaler().fit_transform(X)

# Fit a full PCA first to find the smallest component count that reaches 90% of variance
pca = PCA(random_state=42)
pca.fit(X_scaled)
n_comp = int(np.argmax(np.cumsum(pca.explained_variance_ratio_) >= 0.90)) + 1
X_pca = PCA(n_components=n_comp, random_state=42).fit_transform(X_scaled)

# Silhouette sweep over k = 2..7
scores = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, random_state=42, n_init=15).fit(X_pca)
    scores[k] = silhouette_score(X_pca, km.labels_)

optimal_k = max(scores, key=scores.get)   # -> 2

Weeks projected onto the first two principal components, coloured by K-Means cluster. The separation is geometric, not temporal — yet the boundary aligns almost exactly with the March 2020 COVID break.
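Because K-Means ignores time order, the "single transition" claim has to be checked after the fact by laying the labels back onto the calendar. A small sketch of that check, with a hypothetical helper name and toy labels rather than the project's real output:

```python
import pandas as pd

def regime_transitions(labels: pd.Series) -> pd.DataFrame:
    """Return the weeks whose cluster label differs from the previous week's."""
    prev = labels.shift(1)
    changed = labels.ne(prev) & prev.notna()  # the first week has no predecessor
    return pd.DataFrame({
        "from": prev[changed].astype(int),
        "to": labels[changed].astype(int),
    })

# Toy labels on four consecutive Fridays, for illustration only
toy = pd.Series([1, 1, 0, 0],
                index=pd.date_range("2020-02-21", periods=4, freq="W-FRI"))
print(regime_transitions(toy))
```

A result with exactly one row means the clusters form two contiguous blocks in time; many rows would mean the "regimes" flicker and are not really temporal states.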

WACMR with regime bands overlaid. The amber region (Regime 1) is the pre-COVID tightening era; the green region (Regime 0) is the post-COVID accommodation regime that outlasted the pandemic.

Regime-wise distribution of WACMR. Means differ by ~170 bps (6.5% vs 4.8%) and the second moment differs too: Regime 0 is both lower and tighter.

Two observations make this interesting beyond the obvious COVID narrative. First, the post-COVID regime outlasted the pandemic: the RBI held rates low well into 2022 even as headline CPI inflation rose, and the cluster structure reflects that deliberate stance. Second, the 2022–2024 re-tightening cycle did not produce a return to the earlier regime: rate levels rose, but the broader system behaviour (liquidity posture, market-repo rates, term-premium structure) stayed in Regime 0.

Interactive: Explore the regimes interactively. PCA projection coloured by cluster, regime fact sheets, and a transition timeline.

5. Forecasting with walk-forward validation

We chose XGBoost, a gradient-boosted tree ensemble, for the forecasting model. The motivation wasn't raw performance; it was interpretability. An XGBoost prediction is a sum over trees, and TreeSHAP decomposes any such prediction into exact per-feature contributions in polynomial time. For a research artefact that has to answer "why did the model say that?", that matters more than a fractional RMSE win from a deep net.

The validation protocol is expanding-window walk-forward cross-validation with a minimum train size of 156 weeks (3 years). For each test week t ≥ 156, we retrain on weeks 0…t-1, predict week t, and move on. No future information ever leaks into training. Respecting time order in this way is what makes the validation honest for a time-series model; a shuffled k-fold split would not be.

python

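# stage5_supervised_ml.py (expanding-window CV)
from xgboost import XGBRegressor

MIN_TRAIN_SIZE = 156  # 3 years of weekly data
results = []

# X, y, dates: feature matrix, target and week labels, all in time order
n = len(y)
for t in range(MIN_TRAIN_SIZE, n):
    X_train, y_train = X[:t], y[:t]          # weeks 0..t-1 only
    model = XGBRegressor(
        n_estimators=400, learning_rate=0.05, max_depth=4,
        subsample=0.8, colsample_bytree=0.8, random_state=42,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X[t:t+1])[0]        # forecast for the held-out week t
    results.append({"week": dates[t], "actual": y[t], "predicted": pred})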

Walk-forward predictions (dashed) against the actual WACMR (solid) across 389 one-week-ahead folds. The model tracks both the 2020 regime break and the 2022–24 tightening cycle.

Model	RMSE	MAE	Directional accuracy	Notes
Baseline XGBoost	0.1019	0.0646	70.9%	Rate corridor + lags only
Regime-Aware XGBoost	0.1044	0.0646	70.9%	Adds K-Means cluster label + distances
Baseline + News NLP	0.0988	0.0633	72.4%	Adds 75-event sentiment features

Table 2 — Walk-forward performance. Regime labels do not help because XGBoost's splits already reconstruct the regime boundary from autoregressive features. News features produce a small but real RMSE improvement.

Walk-forward RMSE: 0.102 · MAE (percentage points): 0.065 · Directional accuracy: 70.9% · Out-of-sample weeks scored: 389

An RMSE of ~10 basis points on a series that lives in a ±300 basis point corridor is respectable. A directional accuracy of ~71% on week-over-week changes is the headline number: well above the 50% expected from a coin flip, and useful for any treasurer deciding whether to park excess liquidity overnight.
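For readers who want to recompute the three headline metrics from the walk-forward results, a minimal sketch follows. The definition of directional accuracy is an assumption on our part: a forecast counts as correct when the predicted move from last week's actual value has the same sign as the realised move. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def walk_forward_metrics(actual, predicted) -> dict:
    """RMSE, MAE and directional accuracy for aligned actual/predicted arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    # Direction is judged relative to the previous week's *actual* value
    actual_move = np.sign(actual[1:] - actual[:-1])
    predicted_move = np.sign(predicted[1:] - actual[:-1])
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "directional_accuracy": float(np.mean(actual_move == predicted_move)),
    }

# Toy series in percent, for illustration only
print(walk_forward_metrics([6.0, 6.1, 6.0, 6.2], [6.0, 6.05, 6.15, 6.1]))
```

Both RMSE and MAE come out in the units of the series (percentage points), so 0.10 corresponds to 10 basis points.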

Residuals by week-of-year and month. No clear seasonality survives — good news, and our calendar-effect hypothesis is rejected.

Interactive: Walk-forward predictions with drill-down. Actual vs predicted over time, per-week waterfall explanations, and SHAP summaries.

6. Opening the black box with SHAP

The question every reviewer asks of a tree ensemble is: what is the model actually using? SHAP gives an additive decomposition: for any prediction, it tells you how many basis points each feature contributed above or below the model's baseline, and the baseline plus those contributions sums exactly to the prediction.
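The additivity property is easy to verify by brute force on a tiny model. The sketch below computes exact Shapley values by enumerating feature subsets, which is the quantity TreeSHAP computes efficiently for tree ensembles. It is an illustration, not the project's code: the toy model, the input and the single reference point used as the baseline are all assumptions (SHAP normally averages over a background dataset).

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float], reference: Sequence[float]) -> list:
    """Exact Shapley values of f at x, with absent features set to `reference`."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in `subset` take their real value, the rest the reference value
        return f([x[j] if j in subset else reference[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(v: Sequence[float]) -> float:
    # Hypothetical stand-in: lagged WACMR, repo rate, and an interaction term
    return 0.8 * v[0] + 0.2 * v[1] + 0.05 * v[0] * v[2]

x, reference = [6.4, 6.5, 1.0], [6.0, 6.0, 0.0]
phi = exact_shapley(toy_model, x, reference)
baseline = toy_model(reference)
print(baseline + sum(phi), toy_model(x))  # the two agree up to rounding
```

The enumeration costs on the order of 2^n model evaluations per feature, which is why an exact polynomial-time algorithm for trees matters with more than a hundred features.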

Figure: SHAP summary.

Top features ranked by mean |SHAP|, split by regime. The engineered WACMR–Repo spread is decisive in Regime 0 where persistent surplus liquidity dragged WACMR below the Repo Rate.

The corollary is that if you want to predict the WACMR a week ahead, you mostly need to know two things: where it was last week, and where the Repo Rate is now. Everything else is a small correction.

7. Does news sentiment actually help?

We were sceptical going in — the call money rate is a mechanical arbitrage against RBI policy, not a market driven by narrative. But we wanted to test this rather than assume it.

We curated 75 RBI / monetary-policy events between 2014 and 2024 (repo rate decisions, CRR adjustments, OMO announcements, inflation prints, lockdown liquidity measures), each with a manually assigned sentiment score ∈ [-1, +1] and a short impact label (rate_decision, lending_operations, …). Features derived from events (rolling sentiment, time-since-last-hawkish, event-density) were added to the feature set and the walk-forward experiment was re-run.
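A sketch of how the three event-derived features could be built on the Friday grid is below. Everything specific in it is an assumption made for illustration: the column names, the 8-week window, the sign convention (positive sentiment means hawkish) and the 0.3 threshold.

```python
import numpy as np
import pandas as pd

def event_features(events: pd.DataFrame, grid: pd.DatetimeIndex,
                   window: int = 8, hawkish_threshold: float = 0.3) -> pd.DataFrame:
    """Weekly event features from a table with 'date' and 'sentiment' columns."""
    # Assign each event to the first Friday on or after its date
    pos = np.searchsorted(grid.values, events["date"].values, side="left")
    inside = pos < len(grid)  # drop events after the last grid date
    weekly = (events.loc[inside]
              .assign(week=grid[pos[inside]])
              .groupby("week")["sentiment"].agg(["sum", "count", "max"])
              .reindex(grid))

    out = pd.DataFrame(index=grid)
    out["event_count"] = weekly["count"].fillna(0)
    out["rolling_sentiment"] = weekly["sum"].fillna(0).rolling(window, min_periods=1).sum()
    out["event_density"] = out["event_count"].rolling(window, min_periods=1).sum()

    # Weeks elapsed since the last week containing a hawkish event (NaN before the first)
    week_no = np.arange(len(grid))
    hawkish = (weekly["max"] >= hawkish_threshold).to_numpy()
    last_hawkish = pd.Series(np.where(hawkish, week_no, np.nan), index=grid).ffill()
    out["weeks_since_hawkish"] = week_no - last_hawkish
    return out

# Two made-up events, for illustration only
toy_events = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-03", "2024-01-10"]),
    "sentiment": [-0.5, 0.8],
})
toy_grid = pd.date_range("2024-01-05", periods=4, freq="W-FRI")
print(event_features(toy_events, toy_grid))
```

Because every feature only looks at events dated on or before the Friday in question, it can be used to forecast the following week without leaking future information.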

The 75-event hand-curated timeline: rolling sentiment overlaid on WACMR. Hawkish clusters (2018, 2022–23) correspond to visible upward-momentum in the underlying rate; dovish clusters (2019, 2020) precede the Regime 0 break.

Event-density heatmap by year and month. MPC-meeting months (Feb, Apr, Jun, Aug, Oct, Dec) are visibly denser — validating the event-density feature.

We mention this for honesty's sake. The NLP layer is in the project because (a) the task required it, (b) it genuinely helps on policy-event weeks, and (c) it produced a nice narrative overlay on the dashboard. But the dominant signal is the rate corridor; news is a garnish.

Interactive: The 75-event NLP timeline. Sentiment overlay on WACMR, category filters, and event density stats.

8. Policy counterfactuals

The most useful thing a forecasting model can do, for a researcher, is answer what-if questions. What would the WACMR do if the RBI cut the repo rate by 50 basis points next week? What about a 100 bps hike? We built a counterfactual simulator that perturbs the repo rate (and its downstream lags and spreads), re-runs the trained model over the last 12 observed weeks, averages the predictions to smooth out XGBoost's tree quantisation, and returns the response with a 90% confidence interval derived from the walk-forward residuals.
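The steps in that description can be sketched as follows. This is a simplified reading, not the simulator's source: the function and argument names are hypothetical, residuals are assumed to be actual minus predicted, and the band is taken from empirical residual quantiles centred on the shocked prediction.

```python
import numpy as np

def simulate_repo_shock(predict, X_recent, residuals, repo_cols, spread_cols,
                        shock_bps: float, level: float = 0.90) -> dict:
    """Average model response to a repo-rate shock over the most recent rows."""
    X_base = np.asarray(X_recent, dtype=float)
    X_cf = X_base.copy()
    shock = shock_bps / 100.0        # basis points -> percentage points
    X_cf[:, repo_cols] += shock      # repo level and its lags
    X_cf[:, spread_cols] -= shock    # WACMR - Repo spreads move the other way

    baseline = float(np.mean(predict(X_base)))
    shocked = float(np.mean(predict(X_cf)))  # averaging smooths tree step effects
    lo, hi = np.quantile(residuals, [(1 - level) / 2, (1 + level) / 2])
    return {
        "baseline": baseline,
        "shocked": shocked,
        "response": shocked - baseline,
        "interval": (shocked + float(lo), shocked + float(hi)),
    }

# Toy stand-ins: columns are [lagged WACMR, repo rate, WACMR - Repo spread]
toy_predict = lambda X: 0.7 * X[:, 0] + 0.3 * X[:, 1]
toy_X = np.tile([6.4, 6.5, -0.1], (12, 1))        # "last 12 observed weeks"
toy_residuals = np.linspace(-0.2, 0.2, 41)
print(simulate_repo_shock(toy_predict, toy_X, toy_residuals,
                          repo_cols=[1], spread_cols=[2], shock_bps=-50))
```

With a fitted model, `predict` would simply be `model.predict`. Note that the lagged WACMR column is left untouched, which is exactly why small shocks produce sluggish responses.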

Figure: counterfactual response curve.

A few observations from playing with the simulator. First, small perturbations (±25 bps) produce small predicted moves — the top feature (last week's WACMR) doesn't change under the counterfactual, so the model is somewhat sluggish. Second, the response is asymmetric: cuts produce larger predicted drops than hikes produce rises, consistent with the post-COVID regime learning that accommodation transmits more quickly than tightening. Third, the 90% CI is wide relative to the central estimate — this is honest; the residual distribution is what it is.

Interactive: Run your own counterfactual. Slider for repo-rate change, live-updating response curve, and per-feature SHAP attribution.

9. Limitations and what we'd do next

There are five things we'd want to fix or extend given more time:

Only 545 observations. Even with 10 years of weekly data, we have a small sample for any model that wants to capture regime-dependent dynamics. A daily-frequency version would increase the sample roughly fivefold and reveal intra-week liquidity dynamics the weekly grid hides.
Two regimes may be too few. A Hidden Markov Model with soft assignments and k ≥ 3 would let us describe the 2022 tightening as its own transient state rather than force it into Regime 0.
The counterfactual is not causal. A real policy analysis would need an instrumented decision, ideally with high-frequency event-study methods around MPC announcements.
No live data. The dashboard is static against the July 2024 snapshot. Wiring up a weekly NDAP refresh + retraining job is straightforward but wasn't in scope.
News is thin. 75 manually-scored events is far too few. An LLM-assisted sentiment pipeline over the RBI bulletin archive would be a real improvement.

10. Recommendations

What should a monetary-policy practitioner (or a curious observer) actually take away?

Watch the WACMR – Repo spread, not just WACMR. The spread carries most of the information about liquidity stress. A persistently negative spread (WACMR trading below Repo) is accommodation; a positive spread is tightening pressure.
Regime-aware policy analysis. The rate-cycle playbook that worked pre-2020 should not be assumed to work in Regime 0. The system's response function has shifted.
Transmission of cuts is faster than transmission of hikes — our model learned this empirically. Communication around hikes matters more for anchoring expectations than communication around cuts.
Don't over-engineer the forecast. For most forecasting uses, a simple combination of last-week's WACMR and the current Repo Rate captures ~90% of what the full XGBoost model knows. Everything else is marginal.
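The "simple combination" in the last recommendation can be as plain as an ordinary least-squares fit. The sketch below is one hypothetical form of it, regressing next week's WACMR on this week's WACMR and repo rate; the function name is ours and the data are synthetic, generated from known coefficients purely to show that the fit recovers them.

```python
import numpy as np

def fit_two_feature_baseline(wacmr, repo) -> np.ndarray:
    """OLS of WACMR[t+1] on an intercept, WACMR[t] and Repo[t]."""
    wacmr = np.asarray(wacmr, dtype=float)
    repo = np.asarray(repo, dtype=float)
    design = np.column_stack([np.ones(len(wacmr) - 1), wacmr[:-1], repo[:-1]])
    coef, *_ = np.linalg.lstsq(design, wacmr[1:], rcond=None)
    return coef  # [intercept, weight on last WACMR, weight on repo]

# Synthetic series: a stepwise repo path and a WACMR that partially adjusts to it
rng = np.random.default_rng(0)
repo = 6.0 + 0.25 * rng.integers(-1, 2, size=60).cumsum()
wacmr = np.empty(60)
wacmr[0] = 6.0
for t in range(59):
    wacmr[t + 1] = 0.1 + 0.6 * wacmr[t] + 0.38 * repo[t]

coef = fit_two_feature_baseline(wacmr, repo)
print(coef)
```

On real data this baseline would be scored with the same walk-forward protocol as the XGBoost model so that the comparison is fair.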

11. Try it yourself

Every claim in this essay is backed by a computation you can run live:

The landing page has the summary, the hero chart, and links to every section.
The simulator is the headline interactive — drag the slider, watch the response curve, inspect the attribution.
Forecast & SHAP shows the full walk-forward series, per-week waterfalls, and the aggregate importance bars.
Regimes lets you see the PCA projection and the cluster fact sheets.
The AI agent will run custom SQL, plot anything, explain any week's prediction, and run counterfactuals for you; it's genuinely the fastest way to interrogate the project.
Data explorer for the raw tables, and the report for a full generated write-up.

Thanks for reading. The code, data, and pipeline are open — if you spot a bug or would like to extend this, please open an issue.