Verified Real Data
Dataset →
Initializing pipeline…
Data Points
 
Train / Test
 
Models Trained
 
Best Model
 
Best RMSE
 
Best R²
 
MAPE
 
✓ Z-score Normalization ✓ Lag Features (1,2,3,5) ✓ Rolling Mean/Std (3,5) ✓ Quadratic Trend Term ✓ 80/20 Temporal Split ✓ Residual CI (±1σ) ✓ 6 Models Compared ✓ Data Cross-Verified vs World Bank / BLS / GCP
01 — Historical Data & Train/Test Split
Full Historical Series · Training vs Test Partition
ⓘ Legend
02 — Future Forecast with Confidence Intervals
Multi-Model Forecast · Best ML + ARIMA + Exp. Smoothing
Horizon:
Shaded band = ±1σ confidence interval · Dashed = ARIMA · Dotted = Exp. Smoothing
ARIMA(1,1,1) Forecast Detail
03 — Model Evaluation & Residual Analysis
Prediction vs Actual — Test Set (Best Model)
Residuals over Time
04 — Model Comparison & Leaderboard
RMSE Comparison (lower = better)
■ Best model ■ Other models  ·  Values shown on bars
R² Score (higher = better)
■ ≥0.7 Good fit ■ 0.4–0.7 Moderate ■ 0–0.4 Weak ■ <0 Worse than mean
All Models — Full Accuracy Metrics
RankModelRMSEMAEMAPE %Type
05 — Feature Importance & Error Distribution
Feature Importance — Random Forest (Permutation)
Prediction Error Distribution (Histogram)
X = error (Actual − Predicted) in dataset units  ·  Y = number of test predictions in that error range  ·  ■ green = underestimated   ■ red = overestimated  ·  Ideal: tallest bar near 0
06 — Rolling Statistics & Autocorrelation
Rolling Mean & ±1σ Band (window=10)
Autocorrelation Function (ACF) — Lags 1–15
Orange lines = 95% confidence bounds (±1.96/√n) · ● = statistically significant lag
07 — Time-Series Decomposition (Additive)
Trend · Seasonal · Residual Components
📈 Trend Component
🔄 Seasonal (period=10)
⚡ Residual
08 — Forecast Values — All Models
Numeric Forecast Table
PeriodBest MLCI Upper (+1σ)CI Lower (-1σ)ARIMA(1,1,1)Exp. Smoothing
09 — Upload Your Own Dataset
Upload a CSV — Run the Full ML Pipeline on Your Data
📋 Example CSV Format
date,value
2015,42350.5
2016,45120.8
2017,48930.2
2018,51240.1
2019,49870.6
2020,38420.3
2021,55780.9
2022,61340.4
2023,63200.7
What Happens When You Upload?
📥
STEP 1 — Parse Your CSV
Your file is read in the browser (never sent anywhere). Dates and values are extracted. Header row auto-skipped if non-numeric.
⚙️
STEP 2 — Full Pipeline Runs on Your Data
The same 6-model ML pipeline runs: normalization → 9 feature engineering → 80/20 split → Linear Reg, Ridge, Random Forest, Gradient Boost, ARIMA, Exp. Smoothing all train on your data.
📊
STEP 3 — All 13 Sections Update
Every chart, table, metric, ACF, decomposition, feature importance, scatter plot, stats, confidence meters — all rebuild using your data. Forecasts show future periods beyond your last date.
🔮
STEP 4 — Get Future Predictions
The best model forecasts 6–15 future time periods beyond your last data point, with confidence intervals shown on the chart.
✓ WHAT YOU CAN UPLOAD
Sales figures · Temperature records · Stock prices · Website traffic · Any sensor reading · Revenue data · Energy consumption · Population data — anything with a date and a number per row.
10 — Descriptive Statistics & Data Quality
Full Dataset Statistics
StatisticValueInterpretation
Model Confidence Meters
11 — Predicted vs Actual Scatter Plot
Scatter — Predicted vs Actual (Best Model · Test Set)
Points on the diagonal line = perfect prediction · Spread = error magnitude
MAPE Comparison — All Models
■ Best model ■ <5% Excellent ■ 5–15% Good ■ >15% Poor
12 — Methodology & Model Reference
6 Models — How Each Works
📐Linear Regression
OLS regression on 9 engineered features (lags, rolling stats, trend). Fits a hyperplane minimizing sum of squared residuals. Fast, interpretable, best for linear trends.
ML
🔒Ridge Regression
Like linear regression but with L2 regularization (α=0.3) that penalizes large coefficients. Reduces overfitting when features are correlated. More stable than plain OLS.
ML
🌲Random Forest
Ensemble of 60 decision trees trained on bootstrap samples. Final prediction = mean of all trees. Captures non-linear patterns. Provides permutation-based feature importance.
ML · Ensemble
🚀Gradient Boosting
100 trees built sequentially, each correcting the previous tree's residuals. Learning rate 0.1, max depth 3. Often most accurate for tabular time series data.
ML · Boosting
📊ARIMA(1,1,1)
Autoregressive Integrated Moving Average. Differences the series once (d=1) for stationarity, uses 1 AR lag and 1 MA term. Classic time-series model, best for trend-dominated series.
ARIMA
📉Exp. Smoothing (Holt)
Double exponential smoothing with optimized α (level) and β (trend) parameters via grid search. Forecasts by extrapolating the estimated level and trend. Simple yet effective.
Time-Series
13 — Glossary & Metric Reference
Project Requirements — Full Checklist
✅ REAL DATA — NO FAKE DATA
All 7 datasets sourced from World Bank, FRED/BLS, Global Carbon Project, Macrotrends/Shiller, and EIA. Values cross-verified and corrected. Data reflects real historical events (2008 crash, COVID 2020, inflation spike 2022).
✅ REGRESSION & TIME-SERIES MODELS
Linear Regression (OLS), Ridge Regression (L2), Random Forest (ensemble), Gradient Boosting (sequential trees), ARIMA(1,1,1) (classic time-series), Holt Double Exponential Smoothing. All 6 run and are compared.
✅ CLEAN & PREPROCESS DATA
Z-score normalization, lag features (1,2,3,5), rolling mean/std (window 3 & 5), quadratic trend term, 80/20 temporal train/test split with no data leakage.
✅ EVALUATE MODEL ACCURACY
R², RMSE, MAE, MAPE computed for every model. Model leaderboard table, RMSE bar chart, R² bar chart, MAPE chart, residual analysis, scatter plot, error histogram, confidence meters.
✅ VISUALIZE PREDICTIONS
13 sections of charts: historical series, train/test split, multi-model forecast with CI bands, ARIMA detail, predicted vs actual, residuals, RMSE/R² bars, feature importance, ACF, decomposition (trend/seasonal/residual), scatter, MAPE, histogram.
✅ FORECAST FUTURE TRENDS
Models trained on data up to 2023, then auto-forecast 2024, 2025, 2026... (6–15 steps adjustable). Each future year's prediction is driven by the model's learned trend from real historical data. Confidence intervals shown.
How the Forecast Works — Does It Reflect Real Trends?
WHAT DRIVES THE 2024–2028 FORECASTS
The models are trained on all real data up to 2023. Then for each future year:

1. The last 6 known values are used as "rolling context"
2. 9 features are built: lag_1 (last year's value), lag_2, lag_3, lag_5, rolling means, rolling std, and trend index
3. The best ML model predicts the next value
4. That prediction becomes the new "lag_1" for the next step

So if inflation was rising, the model learned that pattern and continues it. If GDP was recovering, it extrapolates that recovery. The forecast is entirely driven by the real learned trend, not a formula.
WHAT THE FORECAST CAN AND CANNOT DO
✓ CAN: Extrapolate trend direction from real historical data · Show confidence ranges · Compare 3 different model forecasts · Update instantly when you change the horizon
✗ CANNOT: Know about events after 2023 (wars, elections, pandemics) · Guarantee accuracy beyond 1–2 periods · Replace expert economic forecasting
Key Terms Explained
R² (R-Squared)
Proportion of variance explained by the model. R²=1.0 is perfect; R²=0 means model is no better than predicting the mean. Can be negative for very poor models.
RMSE (Root Mean Squared Error)
Square root of the average squared residuals. In the same units as your data. Penalizes large errors more heavily. Lower is better.
MAE (Mean Absolute Error)
Average of absolute residuals. More robust to outliers than RMSE. Easier to interpret: "on average, predictions are off by X units."
MAPE (Mean Absolute Percentage Error)
Average percentage error. Good for comparing across datasets with different scales. Undefined when actual values are zero.
Confidence Interval (CI)
Range where the true value is expected to fall with a given probability. Here ±1σ ≈ 68% CI based on test set residual standard deviation.
Autocorrelation (ACF)
Correlation of a series with its own lagged values. High ACF at lag 1 suggests strong momentum; oscillating ACF suggests cyclical patterns.
Feature Importance
Permutation importance: how much model error increases when each feature is randomly shuffled. Larger = more important. Computed on training data using Random Forest.
Train/Test Split (80/20)
80% of data (chronologically earliest) trains the model; 20% (most recent) evaluates it on unseen data. Temporal split prevents data leakage.
Z-score Normalization
Each value is scaled to (value − mean) / std. Brings all features to the same scale, which is required for regression models and improves convergence.