LSTM experiments: from simple models to a Bayesian approach
Where did the idea come from?
For a long time, I have been building tools that help analyze financial markets. I am interested not only in how accurate the predictions are, but also in understanding the uncertainty that comes with every prediction. In this short project, my goal was to test sequential LSTM models and how useful they are for generating investment signals — and also to see what extra value a Bayesian approach can bring, since it lets us estimate how much we can trust the predicted values.
In other words: how much does the model trust its own forecast?
Why LSTM?
LSTM (Long Short-Term Memory) is a recurrent neural network architecture known for its ability to process sequential data, especially for capturing patterns in price sequences. It has been used for years in time-series forecasting.
This project is part of a bigger effort — building my own personal analytics and investment platform, where different models, strategies, and data sources will work together to support investment decisions.
The goal is to answer this question: can predictive models, combined with uncertainty estimates, improve the quality of the signals they generate? This is also a space for my own experiments with the LSTM architecture.
Data description
For the experiments, I used market data downloaded with the yfinance library, covering daily price data for a chosen financial instrument. The data includes the standard market columns: Open, High, Low, Close, Volume.
Depending on the model variant being tested, I used different subsets of this data — from a single feature (for example, the opening price) to a set of several features describing market behavior.
At the start, I did not fully filter the data — I did not remove outliers (extreme values). In this first phase, I wanted to check how the models handle typical noise in the data.
In later variants, I added preprocessing (for example, normalizing the data, removing extreme values) to check how the quality of input data affects prediction accuracy.
The data usually covered several years of price history, for example 2020–2024.
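A minimal sketch of how such a dataset could be loaded and cut down to the feature subset of one variant. The post does not name the instrument or show its loading code, so the function names `load_ohlcv` and `select_features`, the example ticker, and the use of `Ticker.history` are assumptions for illustration only.

```python
import pandas as pd

OHLCV = ["Open", "High", "Low", "Close", "Volume"]


def load_ohlcv(symbol: str, start: str, end: str) -> pd.DataFrame:
    """Download daily bars with yfinance and keep the standard OHLCV columns."""
    import yfinance as yf  # imported lazily so select_features works offline

    raw = yf.Ticker(symbol).history(start=start, end=end, interval="1d")
    return raw[OHLCV].dropna()


def select_features(frame: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Return the feature subset used by one model variant."""
    missing = [name for name in features if name not in frame.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return frame[features].copy()


# Hypothetical usage (ticker and dates are placeholders, not the author's choice):
# prices = load_ohlcv("AAPL", "2020-01-01", "2024-12-31")
# open_only = select_features(prices, ["Open"])   # single-feature variants
# full_set = select_features(prices, OHLCV)       # multi-feature variant
```

Keeping the feature selection separate from the download makes it easy to run every variant on exactly the same rows.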
Test variants
1. Opening price only + StandardScaler
Description: Using only the opening price, standardized with StandardScaler (the baseline scaler named in the comparison tables), with a fixed sequence length (for example, 60 days).
Goal: check whether LSTM can capture basic patterns in minimally processed, single-feature data.
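The fixed sequence length mentioned above means that each training sample is a window of consecutive days and the target is the value that follows it. A minimal sketch with NumPy, assuming a next-day target taken from the first feature column (the post does not state the exact target, and `make_windows` is a hypothetical helper name):

```python
import numpy as np


def make_windows(values, seq_len: int = 60):
    """Turn a (time, features) array into LSTM inputs and next-step targets.

    X[i] holds rows i .. i+seq_len-1; y[i] is the first feature at row i+seq_len.
    """
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, None]  # single feature -> shape (time, 1)
    n_samples = len(values) - seq_len
    if n_samples <= 0:
        raise ValueError("series is shorter than seq_len + 1")
    X = np.stack([values[i:i + seq_len] for i in range(n_samples)])
    y = values[seq_len:, 0]
    return X, y


X, y = make_windows(np.arange(100.0), seq_len=60)
print(X.shape, y.shape)  # (40, 60, 1) (40,)
```

The resulting `X` has the `(samples, timesteps, features)` layout that LSTM layers in the common deep learning frameworks expect.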
Results:
| Metric | Value | Conclusion |
|---|---|---|
| MAE | 1.2539 | High — the model is often wrong |
| RMSE | 1.6665 | High variability in errors |
| Coverage ±σ | none | no uncertainty estimate in this variant |
| Direction Accuracy | 92.42% | Very good — the model captures direction and price changes well |
| MAPE | 35.80% | Not acceptable |
Conclusions: The model trained only on the Open value achieved good direction accuracy (92.42%), but a large regression error (MAE = 1.25, MAPE = 35.8%). This means that, while the model can recognize the trend, it is not good enough to predict actual price levels without extra features and preprocessing.
2. Simple network with one feature
Description: Instead of removing extreme values (outliers) from the financial data, I used RobustScaler, which reduces their effect on the data distribution without losing potentially important information about market dynamics.
Goal: check how data preparation affects prediction quality.
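To make the difference between the two scalers concrete, here is a small sketch in plain NumPy. It mirrors the default behavior of scikit-learn's `StandardScaler` (mean and standard deviation) and `RobustScaler` (median and interquartile range); the price series is made up for illustration and is not the data used in the experiments.

```python
import numpy as np


def standard_scale(x):
    """Center on the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def robust_scale(x):
    """Center on the median and divide by the interquartile range (IQR)."""
    x = np.asarray(x, dtype=float)
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return (x - q50) / (q75 - q25)


# Made-up series with one extreme value at the end.
prices = np.array([10.0, 10.2, 9.9, 10.1, 10.0, 25.0])
print(np.round(standard_scale(prices), 2))
print(np.round(robust_scale(prices), 2))
```

With standard scaling, the single spike inflates the standard deviation, so the ordinary values end up squeezed into a very narrow band. With robust scaling, the median and IQR barely react to the spike, so the ordinary values keep their spread while the spike simply stays large.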
Results:
| Metric | Value | Conclusion |
|---|---|---|
| MAE | 1.2438 | High — the model is often wrong |
| RMSE | 1.6822 | High variability in errors |
| Coverage ±σ | none | no uncertainty estimate in this variant |
| Direction Accuracy | 92.93% | Very good — the model captures direction and price changes well |
| MAPE | 35.37% | Not acceptable |
Comparison with the previous model:
| Variant | MAPE | MAE | RMSE | Direction Accuracy |
|---|---|---|---|---|
| 1. StandardScaler | 35.80% | 1.2539 | 1.6665 | 92.42% |
| 2. RobustScaler | 35.37% | 1.2438 | 1.6822 | 92.93% |
Conclusions: Switching to RobustScaler changed the results only marginally: direction accuracy rose from 92.42% to 92.93%, MAPE dropped from 35.80% to 35.37%, and MAE from 1.2539 to 1.2438, while RMSE rose slightly from 1.6665 to 1.6822. The differences are small, so they are at most a hint that robust scaling suits data with extreme values better than a classic StandardScaler.
3. Larger network with several features
Description: Using several variables: Open, High, Low, Close, Volume.
Goal: give the model richer market context.
Results:
| Metric | Value | Conclusion |
|---|---|---|
| MAE | 0.4624 | Much lower than in variants 1 and 2 (about 1.25) |
| RMSE | 0.7209 | Less than half the RMSE of variants 1 and 2 |
| Coverage ±σ | none | no uncertainty estimate in this variant |
| Direction Accuracy | 98.48% | Very good — the model captures direction and price changes well |
| MAPE | 36.15% | Not acceptable |
Conclusions: Using several input features clearly reduced the absolute errors: MAE dropped to 0.46, RMSE to 0.72, and direction accuracy reached 98.48%. The percentage error (MAPE) stayed high at 36.15%, though, which shows how hard it is to match relative values, especially at low price levels, where small absolute errors turn into large percentages. This model is good at detecting trend direction, and adding a Bayesian LSTM approach would also let us estimate how confident the forecasts are.
Variant with a deeper network architecture (the shallow column repeats the results of variant 2):
| Metric | Shallow network (variant 2) | Deep network (four layers) | Conclusion |
|---|---|---|---|
| MAPE | 35.37% | 35.34% | no improvement in value accuracy |
| MAE | 1.2438 | 1.1055 | slight improvement — smaller errors |
| RMSE | 1.6822 | 1.5321 | more stable errors |
| Direction Acc. | 92.93% | 94.95% | noticeable improvement in direction |
Conclusions: Making the LSTM network deeper (four layers) raised direction accuracy from 92.93% to 94.95%, but it did not meaningfully improve value prediction accuracy (MAPE stayed around 35%). This suggests that improving results further needs better feature selection, not a more complex network.
4. Bayesian LSTM with uncertainty estimation
Description: Adding a mechanism to estimate prediction uncertainty using MC Dropout. Each prediction was made many times (for example, 100 times), and the results were analyzed statistically (mean and standard deviation).
Goal: get not only a predicted value, but also a measure of trust in that value (for example, coverage ±σ).
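The MC Dropout idea described above can be sketched without any deep learning framework: dropout stays active at prediction time, the same input is passed through the network many times, and the spread of the outputs is read as uncertainty. In the sketch below, a toy one-hidden-layer regressor with random weights stands in for the trained LSTM, so the numbers it produces are meaningless. Only the procedure is the point. The dropout rate of 0.4 and the 100 passes follow the values mentioned in this post; all function names are hypothetical.

```python
import numpy as np


def dropout(h, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep = rng.random(h.shape) >= rate
    return h * keep / (1.0 - rate)


def stochastic_forward(x, W1, W2, rate, rng):
    """Toy one-hidden-layer regressor standing in for the trained LSTM."""
    hidden = np.tanh(x @ W1)
    hidden = dropout(hidden, rate, rng)  # dropout stays ON at prediction time
    return (hidden @ W2).ravel()


def mc_dropout_predict(x, W1, W2, rate=0.4, n_passes=100, seed=0):
    """Run n_passes stochastic forward passes; return per-sample mean and std."""
    rng = np.random.default_rng(seed)
    draws = np.stack(
        [stochastic_forward(x, W1, W2, rate, rng) for _ in range(n_passes)]
    )
    return draws.mean(axis=0), draws.std(axis=0)


rng = np.random.default_rng(42)
W1 = rng.normal(size=(60, 16))  # random weights, for illustration only
W2 = rng.normal(size=(16, 1))
x = rng.normal(size=(5, 60))    # 5 flattened windows of 60 values
y_hat, sigma = mc_dropout_predict(x, W1, W2)
print(y_hat.shape, sigma.shape)  # (5,) (5,)
```

In a real framework the same effect is usually obtained by keeping the dropout layers in training mode during inference and looping over the forward pass.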
Results:
| Metric | Value | Comment |
|---|---|---|
| MAPE | 18.97% | very good result — below 20% = a realistically usable forecast |
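| MAE | 0.0384 | very low: the model is off by about 0.04 on average, in the units of the target; this measures accuracy rather than calibration, and it is comparable with variants 1 to 3 only if the target scale is the same |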
| RMSE | 0.0536 | low error variance — no extreme mistakes |
| Direction Accuracy | 96.46% | very high direction accuracy — ideal for trend-based strategies |
| Coverage ±σ (1σ) | 64.32% | close to the ~68% expected for a Gaussian ±1σ interval; the ŷ ± σ band is reasonably calibrated, if slightly narrow |
Model and parameter details: Estimated training time (1 epoch)... Estimated training time (200 epochs): 9.38 s ≈ 0.16 min
Conclusions: Using the Bayesian LSTM approach with MC Dropout and a higher dropout rate gave very low regression errors (MAPE 18.97%, MAE 0.0384) and high direction accuracy (96.46%). Prediction coverage within ±σ was 64.32%, which shows a realistic estimate of the model's uncertainty.
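The "ideal" value of about 68% comes from assuming that the prediction errors follow a normal distribution, in which case the share of values inside ±kσ is erf(k/√2). A quick check of that reference number (the helper name `gaussian_coverage` is just for illustration):

```python
import math


def gaussian_coverage(k: float) -> float:
    """P(|Z| <= k) for a standard normal Z equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))


print(f"{gaussian_coverage(1.0):.4f}")  # 0.6827
```

An observed coverage below this value, such as 64.32%, points to intervals that are slightly too narrow; a value above it would point to intervals that are too wide.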
Comparing the results
For each of the four variants, I evaluated forecast quality using both classic error metrics and metrics that account for prediction uncertainty. The goal was not only to maximize accuracy, but also to understand how different approaches affect the character and reliability of investment signals.
| Variant | MAPE (%) | MAE | RMSE | Direction Accuracy (%) | Coverage ±σ (%) | Total Score |
|---|---|---|---|---|---|---|
| 4. Bayesian + Dropout 0.4 | 18.97 | 0.0384 | 0.0536 | 96.46 | 64.32 | 0.266667 |
| 3. OHLCV + RobustScaler | 36.15 | 0.4624 | 0.7209 | 98.48 | n/a | -0.186596 |
| 2. Open + RobustScaler | 35.37 | 1.2438 | 1.6822 | 92.93 | n/a | -0.845683 |
| 1. Open + StandardScaler | 35.80 | 1.2539 | 1.6665 | 92.42 | n/a | -0.889923 |
Metrics used in the comparison
MAE (Mean Absolute Error) — the average absolute error.
RMSE (Root Mean Squared Error) — the square root of the mean squared error.
Coverage ±σ — the percentage of cases where the real value fell inside the range: (prediction ± standard deviation).
Direction Accuracy — the percentage of cases where the model correctly predicted whether the price would go up or down.
MAPE (Mean Absolute Percentage Error) — the average percentage error, meaning by how many percent, on average, the model is wrong.
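The definitions above can be written down in a few lines of NumPy. This is a sketch, not the evaluation code used for the tables: in particular, the post does not say how direction is measured, so the version below assumes it is the sign of the move from the previous true value, and the sample numbers are made up.

```python
import numpy as np


def evaluate(y_true, y_pred, sigma=None):
    """Compute the comparison metrics; sigma is the predicted standard deviation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    metrics = {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        # MAPE is undefined for zero targets and blows up for targets near zero.
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
        # Assumption: direction = sign of the step from the previous true value.
        "Direction Accuracy": 100.0 * np.mean(
            np.sign(y_true[1:] - y_true[:-1]) == np.sign(y_pred[1:] - y_true[:-1])
        ),
    }
    if sigma is not None:
        inside = np.abs(err) <= np.asarray(sigma, dtype=float)
        metrics["Coverage ±σ"] = 100.0 * np.mean(inside)
    return metrics


y_true = np.array([10.0, 11.0, 10.5, 12.0])
y_pred = np.array([10.2, 10.8, 10.9, 11.5])
sigma = np.array([0.3, 0.1, 0.5, 0.4])
print(evaluate(y_true, y_pred, sigma))
```

On this toy input, MAE is 0.325, RMSE is 0.35, every direction is predicted correctly, and two of the four true values fall inside ŷ ± σ, so coverage is 50%.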
Observations
Adding preprocessing and more features reduced the absolute errors (MAE fell from 1.25 to 0.46 across variants 1 to 3), while MAPE stayed at 35–36% until the Bayesian variant. The Bayesian model had the lowest errors in the comparison table, and it was the only one that also gave uncertainty information, which matters a lot in practice. A Coverage ±σ value of 64.32%, close to the roughly 68% expected for a Gaussian ±1σ interval, suggests that the model's uncertainty estimates are reasonably calibrated, if slightly too narrow. This can be used to filter out "low-confidence" signals.
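One way such filtering could look, as a sketch: keep the predicted direction only when σ is small relative to the forecast, and emit no signal otherwise. The function name, the 2% threshold, and the sample numbers are hypothetical and would need tuning on real data.

```python
import numpy as np


def confident_signals(y_hat, sigma, last_price, max_rel_sigma=0.02):
    """Return +1/-1 direction signals, set to 0 where sigma is too wide.

    max_rel_sigma is a hypothetical threshold on sigma / |y_hat|.
    """
    y_hat = np.asarray(y_hat, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    direction = np.sign(y_hat - np.asarray(last_price, dtype=float))
    confident = sigma / np.abs(y_hat) <= max_rel_sigma
    return np.where(confident, direction, 0.0)


signals = confident_signals(
    y_hat=[101.0, 99.0, 103.0],
    sigma=[0.5, 4.0, 1.0],
    last_price=[100.0, 100.0, 100.0],
)
print(signals)  # [1. 0. 1.]
```

The second forecast predicts a fall, but its σ is about 4% of the predicted value, so the signal is dropped rather than traded.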
Final conclusions
Accuracy is not everything. In investing, understanding prediction uncertainty matters just as much. A model that correctly predicts a rise, but isn't confident about it, can be less useful than one that predicts smaller, steady changes — but with high confidence. Bayesian LSTM models change the rules of the game. They let you not only classify or predict values, but also judge how much to trust each forecast. This can become the foundation of an investment system that dynamically weighs signals based on their uncertainty (ŷ ± σ).
Andrzej