1. Why this rate matters
The Weighted Average Call Money Rate — WACMR — is the interest rate at which scheduled Indian banks lend each other money overnight, settled on the books of the Reserve Bank of India. Conceptually it is a single number, published daily, that answers the question: how much is it costing Indian banks to be short of cash tonight?
That sounds esoteric, but it is the sharpest thermometer we have of monetary-policy transmission. The RBI chooses a policy stance and publishes a repo rate. The WACMR is where that stance actually has to clear against banks' liquidity needs and the interbank market's demand for cash. If the RBI cuts the repo rate and the WACMR doesn't follow, policy is not transmitting. If the RBI holds but the WACMR drifts low, the system is sloshing with liquidity. The gap between the two — the WACMR – Repo spread — is the market's honest opinion of the policy stance.
We set out to do three things with this series:
- Frame a forecasting problem — predict the WACMR one week ahead.
- Understand the structure — is the series one stable process, or several?
- Make the result useful — let a researcher (or a reviewer, or a curious economist) interrogate the model.
This essay walks through what the data said, what the model learned, and what was genuinely surprising. The companion interactive dashboard lets you drill into any number cited here.
2. What data did we need?
A good forecast for an overnight rate needs three kinds of data: (a) policy-rate signals from the central bank, (b) liquidity and balance-sheet variables from the banking system, and (c) market-clearing prices from adjacent markets (T-bills, commercial paper, repo, forex). We pulled eight datasets from the NITI Aayog National Data & Analytics Platform (NDAP), which exposes an authenticated JSON API to the RBI's published weekly series.
| Dataset | Source | Frequency | Rows | What it captures |
|---|---|---|---|---|
| RBI Ratios & Rates | NDAP / RBI | Weekly | 545 | Repo, Reverse Repo, MSF, CRR, SLR, T-bill yields |
| RBI Liabilities & Assets | NDAP / RBI | Weekly | 545 | Central-bank balance sheet |
| Weekly Aggregates | NDAP / RBI | Weekly | 545 | M3, reserve money, currency in circulation |
| Market Repo Transactions | NDAP / RBI | Weekly | 545 | Daily volume & weighted rate |
| Treasury Bills Details | NDAP / RBI | Weekly | 545 | 91-, 182-, 364-day T-bills |
| Commercial Paper Details | NDAP / RBI | Weekly | 545 | CP outstanding, CP rates |
| Central Govt Dated Securities | NDAP / RBI | Weekly | 545 | G-Sec issuance & yields |
| CPI Major Price Indices | NDAP / MoSPI | Monthly → weekly | 545 | Headline, food, core, fuel CPI |
| Nifty 50 OHLCV | Yahoo Finance | Weekly | 553 | Equity flows proxy + tech indicators |
| USD/INR OHLCV | Yahoo Finance | Weekly | 553 | FX intervention signal |
```python
# stage1b_fetch_ndap.py — excerpt
import requests

NDAP_DATASETS = {
    "RBI_Weekly_Statistics_Ratios_Rates": "SRC1234",
    "RBI_Liabilities_and_Assets": "SRC1235",
    "Market_Repo_Transactions": "SRC1236",
    "Treasury_Bills_Details": "SRC1237",
    "Commercial_Paper_Details": "SRC1238",
    # ... five more
}

def fetch(src_id: str):
    url = f"https://ndapapi.niti.gov.in/api/v1/{src_id}"
    page = 1
    while True:
        r = requests.post(url, json={"pagenumber": page, "pagesize": 500})
        batch = r.json()["Data"]
        if not batch:
            return
        yield from batch
        page += 1
```

The NDAP API is paginated (500 rows per page) and required a simple retry wrapper for rate limits. To round out the picture we added two Yahoo Finance series — the Nifty 50 equity index and USD/INR — and a hand-curated list of 75 RBI policy events between 2014 and 2024 with manual sentiment scores, so we could see whether news adds real lift beyond the quantitative features.
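The retry wrapper itself is a few lines. This is a minimal sketch assuming exponential backoff is acceptable; the helper name `with_retries` and its parameters are ours, not part of the pipeline:

```python
import time

def with_retries(fn, max_tries=4, base_delay=1.0, retriable=(Exception,)):
    """Call fn(); on a retriable error, back off exponentially and retry.

    Hypothetical helper: sleeps base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises after the final failure.
    """
    for attempt in range(max_tries):
        try:
            return fn()
        except retriable:
            if attempt == max_tries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Wrapping each `requests.post` call in something like this keeps the pagination loop itself unchanged.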
3. Aligning twelve datasets to a weekly grid
The RBI publishes most series weekly, but with inconsistent reference dates — some as-of Friday, some as-of the Wednesday prior, some as of the last Friday of the prior week. Yahoo Finance prices are daily. Our policy events are irregular. Before we could feed the data to any model we had to pick a single temporal grid and commit to it.
We chose Friday close as the canonical weekly timestamp. Daily series were last-observation-carried-forward (LOCF) onto the Friday grid; weekly series were reindexed and forward-filled only when the gap was ≤ 1 week (otherwise the slot was left NaN and flagged). Technical indicators (MACD, TSI, SuperTrend, Bollinger squeeze) were computed on the daily data and then sampled at Friday close, not the other way around — this keeps indicator semantics intact.
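The two fill rules can be sketched on toy series (the values below are hypothetical; the real pipeline applies the same idea across all columns):

```python
import pandas as pd

# The canonical Friday-close grid.
fridays = pd.date_range("2024-01-05", "2024-02-02", freq="W-FRI")

# Daily series: LOCF onto the Friday grid.
daily = pd.Series(range(20), index=pd.bdate_range("2024-01-08", periods=20))
daily_on_fridays = daily.reindex(fridays, method="ffill")

# Weekly series with one missing week: forward-fill with limit=1, so at
# most one missing week is bridged; longer holes stay NaN and get flagged.
weekly = pd.Series([1.0, 2.0, 3.0],
                   index=pd.to_datetime(["2024-01-05", "2024-01-12", "2024-01-26"]))
weekly_on_fridays = weekly.reindex(fridays).ffill(limit=1)
```

Note that the daily series starts after the first Friday, so that slot stays NaN rather than being back-filled, which is exactly the no-lookahead behaviour the model validation depends on.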
The joined master table has 545 weekly observations from 2014-02-07 to 2024-07-19 across 119 columns. A schema catalogue (column_registry.py) maps every cryptic NDAP code (rates_I7496_17, la_I7492_14, …) to a human-readable label — without that catalogue the agent in the sidebar would be useless.
4. Finding structure: PCA + regime discovery
Before fitting a forecasting model it's worth asking: is this series one stable process, or does it switch between states? A casual look at the chart above suggests the latter — the pre-COVID and post-COVID periods feel qualitatively different. We wanted a data-driven answer.
We standardised every numeric feature, ran PCA to retain 90% of variance, then K-Means clustering on the reduced coordinates with a silhouette sweep over k = 2…7. The silhouette peaked cleanly at k = 2, with a single transition at 2020-03-06 — the week that the WHO declared COVID-19 a pandemic. K-Means did not know about the pandemic. It found it in the data.
```python
# stage4_regime_discovery.py — excerpt
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X_scaled = StandardScaler().fit_transform(X)

pca = PCA(random_state=42)
pca.fit(X_scaled)
n_comp = int(np.argmax(np.cumsum(pca.explained_variance_ratio_) >= 0.90)) + 1
X_pca = PCA(n_components=n_comp, random_state=42).fit_transform(X_scaled)

scores = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, random_state=42, n_init=15).fit(X_pca)
    scores[k] = silhouette_score(X_pca, km.labels_)
optimal_k = max(scores, key=scores.get)  # -> 2
```

Two observations make this interesting beyond the obvious COVID narrative. First, the post-COVID regime outlasted the pandemic: the RBI held rates low well into 2022 even as headline CPI inflation rose, and the cluster structure reflects that deliberate persistence. Second, the 2022–2024 re-tightening cycle did not produce a return to the earlier regime — rate levels rose but the broader system behaviour (liquidity posture, market-repo rates, term-premium structure) stayed in Regime 0.
*On the dashboard:* PCA projection coloured by cluster, regime fact sheets, and a transition timeline.
5. Forecasting with walk-forward validation
We chose XGBoost — a gradient-boosted tree ensemble — for the forecasting model. The motivation wasn't raw performance; it was interpretability. XGBoost trees are additive, and SHAP decomposes any prediction into per-feature contributions in closed form. For a research artefact that has to answer why did the model say that, that matters more than a fractional RMSE win from a deep net.
The validation protocol is expanding-window walk-forward cross-validation with a minimum train size of 156 weeks (3 years). For each test week t ≥ 156, we retrain on weeks 0…t-1, predict week t, and move on. No future information ever leaks into training. This is the only honest way to validate a time-series model.
```python
# stage5_supervised_ml.py — expanding-window CV
from xgboost import XGBRegressor

for t in range(MIN_TRAIN_SIZE, n):
    X_train, y_train = X[:t], y[:t]
    model = XGBRegressor(
        n_estimators=400, learning_rate=0.05, max_depth=4,
        subsample=0.8, colsample_bytree=0.8, random_state=42,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X[t:t+1])[0]
    results.append({"week": dates[t], "actual": y[t], "predicted": pred})
```

| Model | RMSE | MAE | Directional accuracy | Notes |
|---|---|---|---|---|
| Baseline XGBoost | 0.1019 | 0.0646 | 70.9% | Rate corridor + lags only |
| Regime-Aware XGBoost | 0.1044 | 0.0646 | 70.9% | Adds K-Means cluster label + distances |
| Baseline + News NLP | 0.0988 | 0.0633 | 72.4% | Adds 75-event sentiment features |
An RMSE of ~10 basis points on a series that lives in a ±300 basis point corridor is respectable. A directional accuracy of ~71% on week-over-week changes is the headline number — significantly above a random-walk baseline (50%), and useful for any treasurer deciding whether to park excess liquidity overnight.
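Directional accuracy is the share of weeks where the forecast called the direction of the move correctly. A small sketch, with illustrative numbers rather than the project's results; comparing each forecast against the previous *actual* value is our assumption about the exact definition:

```python
import numpy as np

def directional_accuracy(actual, predicted):
    """Fraction of weeks where the predicted week-over-week change has the
    same sign as the realised change (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    actual_change = np.sign(np.diff(actual))
    # Did the model call the direction of the move from last week's actual?
    predicted_change = np.sign(predicted[1:] - actual[:-1])
    return float(np.mean(actual_change == predicted_change))

# Toy illustration (made-up rates, not the project's series):
acc = directional_accuracy([6.50, 6.55, 6.45, 6.60],
                           [6.48, 6.53, 6.50, 6.58])
```

A random-walk forecaster scores ~0.5 on this metric, which is why ~71% is the number worth reporting.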
*On the dashboard:* Actual vs predicted over time, per-week waterfall explanations, and SHAP summaries.
6. Opening the black box with SHAP
The question every reviewer asks of a tree ensemble is: what is the model actually using? SHAP gives an additive decomposition: for any prediction, it tells you how many basis points each feature contributed above or below the model's baseline, and those contributions sum exactly to the prediction.
The corollary of that decomposition is that if you want to predict the WACMR a week ahead, you mostly need to know two things: where it was last week, and where the Repo Rate is now. Everything else is a small correction.
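The additivity property is easy to state concretely. The numbers and feature names below are hypothetical, chosen only to show how per-feature contributions reconcile with a prediction; in practice they would come from `shap.TreeExplainer` applied to the trained booster:

```python
# Hypothetical per-feature SHAP contributions (percentage points) for one
# week's WACMR prediction. "liquidity_proxy" is an illustrative name.
baseline = 5.90           # explainer's expected value over the training data
contributions = {
    "wacmr_lag1": +0.42,  # last week's rate dominates
    "repo_rate":  +0.13,  # current policy rate is the second anchor
    "liquidity_proxy": -0.04,
}
# SHAP's guarantee: baseline + contributions == the model's prediction.
prediction = baseline + sum(contributions.values())
```

The waterfall charts on the dashboard are exactly this arithmetic, one bar per feature.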
7. Does news sentiment actually help?
We were sceptical going in — the call money rate is a mechanical arbitrage against RBI policy, not a market driven by narrative. But we wanted to test this rather than assume it.
We curated 75 RBI / monetary-policy events between 2014 and 2024 — repo rate decisions, CRR adjustments, OMO announcements, inflation prints, lockdown liquidity measures — with a manually-assigned sentiment score ∈ [-1, +1] and a short impact label (rate_decision, lending_operations, …). Features derived from events (rolling sentiment, time-since-last-hawkish, event-density) were added to the feature set and the walk-forward experiment was re-run.
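A sketch of how such event features could be derived; the dates, scores, and sign convention (hawkish < 0) below are illustrative stand-ins, not the curated list:

```python
import pandas as pd

# Illustrative events in the spirit of the 75-event list.
events = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-12", "2024-02-09", "2024-03-08"]),
    "sentiment": [-0.6, 0.2, -0.4],   # assumed convention: hawkish < 0
}).set_index("date")

fridays = pd.date_range("2024-01-05", "2024-03-29", freq="W-FRI")

# Rolling 4-week mean sentiment on the weekly grid (no event -> 0).
weekly_sent = events["sentiment"].reindex(fridays).fillna(0.0)
rolling_sent = weekly_sent.rolling(4, min_periods=1).mean()

# Weeks since the most recent hawkish event (None before the first one).
hawkish = events.index[events["sentiment"] < 0]
weeks_since_hawkish = [
    min(((f - d).days // 7 for d in hawkish if d <= f), default=None)
    for f in fridays
]
```

Event-density features follow the same pattern: count events inside a trailing window on the Friday grid.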
We mention this for honesty's sake. The NLP layer is in the project because (a) the task required it, (b) it genuinely helps on policy-event weeks, and (c) it produced a nice narrative overlay on the dashboard. But the dominant signal is the rate corridor; news is a garnish.
*On the dashboard:* Sentiment overlay on WACMR, category filters, and event density stats.
8. Policy counterfactuals
The most useful thing a forecasting model can do, for a researcher, is answer what if questions. What would the WACMR do if the RBI cut the repo rate by 50 basis points next week? What about a 100 bps hike? We built a counterfactual simulator that perturbs the repo rate (and its downstream lags and spreads), re-runs the trained model over the last 12 observed weeks, averages the predictions to smooth out XGBoost's tree quantisation, and returns the response with a 90% confidence interval derived from the walk-forward residuals.
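In outline, the simulator is small. The sketch below uses a toy stand-in model and made-up residuals; the function name, weights, and window are ours, not the project's code:

```python
import numpy as np

def simulate_repo_shock(model, X_recent, repo_col, shock_bps, residuals, ci=0.90):
    """Average predicted response to a repo-rate shock over a recent window,
    with a CI taken from empirical walk-forward residual quantiles."""
    X_cf = X_recent.copy()
    X_cf[:, repo_col] += shock_bps / 100.0  # bps -> percentage points
    response = float(np.mean(model.predict(X_cf) - model.predict(X_recent)))
    lo, hi = np.quantile(residuals, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    return response, (response + lo, response + hi)

class ToyModel:
    # Stand-in for the trained booster: 0.8 * lag + 0.2 * repo (made-up weights).
    def predict(self, X):
        return 0.8 * X[:, 0] + 0.2 * X[:, 1]

X_recent = np.column_stack([np.full(12, 6.4), np.full(12, 6.5)])  # [lag, repo]
resp, (lo, hi) = simulate_repo_shock(ToyModel(), X_recent, repo_col=1,
                                     shock_bps=-50,
                                     residuals=np.linspace(-0.1, 0.1, 21))
```

Averaging over the 12-week window is what smooths the staircase response that a single tree-ensemble prediction would otherwise show.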
A few observations from playing with the simulator. First, small perturbations (±25 bps) produce small predicted moves — the top feature (last week's WACMR) doesn't change under the counterfactual, so the model is somewhat sluggish. Second, the response is asymmetric: cuts produce larger predicted drops than hikes produce rises, consistent with the post-COVID regime learning that accommodation transmits more quickly than tightening. Third, the 90% CI is wide relative to the central estimate — this is honest; the residual distribution is what it is.
*On the dashboard:* Slider for repo-rate change, live-updating response curve, and per-feature SHAP attribution.
9. Limitations and what we'd do next
There are five things we'd want to fix or extend given more time:
- Only 545 observations. Even with 10 years of weekly data, we have a small sample for any model that wants to capture regime-dependent dynamics. A daily-frequency version would roughly quintuple the sample and reveal intra-week liquidity dynamics the weekly grid hides.
- Two regimes may be too few. A Hidden Markov Model with soft assignments and k ≥ 3 would let us describe the 2022 tightening as its own transient state rather than force it into Regime 0.
- The counterfactual is not causal. A real policy analysis would need an instrumented decision, ideally with high-frequency event-study methods around MPC announcements.
- No live data. The dashboard is static against the July 2024 snapshot. Wiring up a weekly NDAP refresh + retraining job is straightforward but wasn't in scope.
- News is thin. 75 manually-scored events is far too few. An LLM-assisted sentiment pipeline over the RBI bulletin archive would be a real improvement.
10. Recommendations
What should a monetary-policy practitioner (or a curious observer) actually take away?
- Watch the WACMR – Repo spread, not just the WACMR. The spread carries most of the information about liquidity stress. A persistently negative spread (WACMR trading below Repo) is accommodation; a positive spread is tightening pressure.
- Regime-aware policy analysis. The rate-cycle playbook that worked pre-2020 should not be assumed to work in Regime 0. The system's response function has shifted.
- Transmission of cuts is faster than transmission of hikes — our model learned this empirically. Communication around hikes matters more for anchoring expectations than communication around cuts.
- Don't over-engineer the forecast. For most forecasting uses, a simple combination of last week's WACMR and the current Repo Rate captures ~90% of what the full XGBoost model knows. Everything else is marginal.
11. Try it yourself
Every claim in this essay is backed by a computation you can run live:
- The landing page has the summary, the hero chart, and links to every section.
- The simulator is the headline interactive — drag the slider, watch the response curve, inspect the attribution.
- Forecast & SHAP shows the full walk-forward series, per-week waterfalls, and the aggregate importance bars.
- Regimes lets you see the PCA projection and the cluster fact sheets.
- The AI agent will run custom SQL, plot anything, explain any week's prediction, and run counterfactuals for you — it's genuinely the fastest way to interrogate the project.
- Data explorer for the raw tables, and the report for a full generated write-up.
Thanks for reading. The code, data, and pipeline are open — if you spot a bug or would like to extend this, please open an issue.