AI-Powered Stock Forecasting Model in Vietnam

The goal of our quantitative model is to predict short-to-medium-term stock price direction and risk-adjusted return potential. We combine top-down macro views with bottom-up stock selection, enhanced by data-driven and AI-augmented insights.

Input Features & Data Modalities

Data Layer	Description	Data Frequency	Notes
Macro/Market-wide	GDP growth, inflation, interest rates, PMIs, VN-Index trends	Monthly/Quarterly	Adjusted for lag effects; sourced from SBV, GSO, IMF
Company Fundamentals	Earnings growth, ROE, leverage, margins, cash flows	Quarterly	Normalized across sectors; sourced from HOSE/HNX filings
Public Sentiment	News headlines, earnings calls, social media chatter	Daily	Vietnamese NLP required; news APIs + web scraping
Technical Analysis	RSI, MACD, MA crossovers, Bollinger Bands, volume	Daily	Extracted from OHLCV data using TA-Lib/Python
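As an illustration of the technical layer, the sketch below derives moving-average crossover signals and Bollinger Bands from a list of closing prices. It is written in plain Python for readability rather than with TA-Lib, and the function names, window lengths, and band width are assumptions chosen for the example, not the parameters of the model described here.

```python
from statistics import mean, pstdev


def sma(values, window):
    """Simple moving average; None until a full window is available."""
    return [
        mean(values[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(values))
    ]


def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (lower, middle, upper) lists using a population std deviation."""
    lower, middle, upper = [], [], []
    for i in range(len(closes)):
        if i < window - 1:
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        chunk = closes[i - window + 1 : i + 1]
        m, s = mean(chunk), pstdev(chunk)
        lower.append(m - num_std * s)
        middle.append(m)
        upper.append(m + num_std * s)
    return lower, middle, upper


def ma_crossover(closes, fast=5, slow=20):
    """+1 on the day the fast SMA crosses above the slow SMA, -1 on a cross below, else 0."""
    f, s = sma(closes, fast), sma(closes, slow)
    signals = [0] * len(closes)
    for i in range(1, len(closes)):
        if None in (f[i], s[i], f[i - 1], s[i - 1]):
            continue  # not enough history yet
        if f[i - 1] <= s[i - 1] and f[i] > s[i]:
            signals[i] = 1
        elif f[i - 1] >= s[i - 1] and f[i] < s[i]:
            signals[i] = -1
    return signals
```

In practice these series would be computed per ticker from daily OHLCV data and passed to the technical submodel as features.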

Integrating Machine Learning and AI into the Quantitative Model

We employ a multi-modal machine learning pipeline with separate feature streams and a fusion model.

Key Models Used:

Task	Model	Purpose
Macro & Fundamentals	Gradient Boosted Trees (e.g., XGBoost)	Handles tabular structured data; captures nonlinearities
Sentiment Signals	LSTM or Transformer-based NLP models (e.g., PhoBERT for Vietnamese)	Extracts sentiment trend; detects tone shifts
Technical Indicators	Random Forest or CNN for pattern recognition	Identifies chart patterns and high-frequency signals
Fusion Layer	Meta Learner (e.g., LightGBM or stacking ensemble)	Aggregates predictions from all submodels
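The fusion step can be pictured with a small stacking sketch. Here a logistic-regression meta-learner, fitted by plain gradient descent, stands in for the LightGBM or stacking ensemble named in the table; it takes one probability per submodel (macro/fundamentals, sentiment, technical) and learns how much weight each deserves. The function names, learning rate, and toy inputs are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(sub_preds, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on out-of-fold submodel probabilities.

    sub_preds: rows of [p_macro, p_sentiment, p_technical] (hypothetical layout).
    labels:    1 if the stock subsequently rose, else 0.
    """
    n, n_features = len(sub_preds), len(sub_preds[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, y in zip(sub_preds, labels):
            err = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        # Average-gradient step on the log-loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(weights, bias, row):
    """Blend one row of submodel probabilities into a single probability of an up-move."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


# Toy data, invented for the example: the first and third submodels are informative.
toy_preds = [[0.9, 0.5, 0.6], [0.8, 0.4, 0.7], [0.2, 0.6, 0.4],
             [0.1, 0.5, 0.3], [0.7, 0.5, 0.8], [0.3, 0.4, 0.2]]
toy_labels = [1, 1, 0, 0, 1, 0]
meta_w, meta_b = fit_meta_learner(toy_preds, toy_labels)
```

To avoid leakage, the submodel probabilities used to train the meta-learner should be out-of-fold predictions from earlier time windows, not in-sample fits.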

AI Enhancements:

Feature importance with SHAP for interpretability
Transfer learning to adapt global models to Vietnamese context
Bayesian optimization for hyperparameter tuning

Sentiment Analysis

Sources Used:

Local News Sites: Cafef.vn, Vietstock.vn, VnExpress (finance section)
Company Reports & Earnings Calls: PDF scrapers + transcript parsing
Social Media: Facebook investor groups, forums (e.g., F319), Twitter (Vietnamese investors)

Preprocessing:

Clean, tokenize, and normalize Vietnamese text
Apply custom Vietnamese sentiment lexicon + ML-based classifiers
Score sentiment from −1 (very negative) to +1 (very positive)
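A minimal sketch of the lexicon half of this preprocessing, assuming a tiny hand-made lexicon: the text is Unicode-normalized, lowercased, and tokenized, then matched against single words and two-word phrases, and the average weight is clipped to the −1 to +1 range. The lexicon entries and weights below are hypothetical placeholders, and the ML-based classifier is not shown.

```python
import re
import unicodedata

# Hypothetical mini-lexicon; a production lexicon would be finance-specific and far larger.
LEXICON = {
    "tăng": 1.0,       # increase
    "kỷ lục": 1.0,     # record (high)
    "giảm": -1.0,      # decrease
    "bán tháo": -1.0,  # sell-off
    "lỗ": -0.8,        # loss
}


def tokenize(text):
    """NFC-normalize (so diacritics are single code points), lowercase, split into words."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)


def lexicon_score(text, lexicon=LEXICON, max_ngram=2):
    """Average weight of matched lexicon terms, clipped to [-1, 1]; 0.0 if nothing matches."""
    tokens = tokenize(text)
    normalized = {" ".join(tokenize(term)): w for term, w in lexicon.items()}
    weights = []
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i : i + n])
            if gram in normalized:
                weights.append(normalized[gram])
    if not weights:
        return 0.0
    return max(-1.0, min(1.0, sum(weights) / len(weights)))
```

Matching on whole tokens rather than substrings matters in Vietnamese, where a short syllable can appear inside unrelated compound terms.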

Data Quality Controls:

Deduplication of news to avoid echo bias
Relevance filtering via keyword tagging (company name, sector context)
Outlier removal for abnormal sentiment spikes with no news basis
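Two of these controls can be sketched in a few lines. The first function drops headlines that are identical after case and punctuation normalization (true near-duplicate detection would need fuzzier matching); the second flags a daily sentiment score as an unsupported spike when it sits far outside its trailing window while no relevant news item was recorded that day. The window length, z-score threshold, and function names are assumptions for the example.

```python
import re
from statistics import mean, pstdev


def dedupe_headlines(headlines):
    """Keep the first occurrence of each headline after case/punctuation normalization."""
    seen, unique = set(), []
    for headline in headlines:
        key = " ".join(re.findall(r"\w+", headline.lower()))
        if key not in seen:
            seen.add(key)
            unique.append(headline)
    return unique


def flag_unsupported_spikes(scores, news_counts, window=5, z_thresh=3.0):
    """True where a score is a z-score outlier vs. the trailing window AND has no news behind it."""
    flags = []
    for i, score in enumerate(scores):
        history = scores[max(0, i - window) : i]
        if len(history) < window:
            flags.append(False)  # not enough history to judge
            continue
        mu, sigma = mean(history), pstdev(history)
        is_spike = sigma > 0 and abs(score - mu) / sigma > z_thresh
        flags.append(is_spike and news_counts[i] == 0)
    return flags
```

Flagged observations can then be dropped or down-weighted before the sentiment series reaches the model.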

Model Validation Frameworks

Time Series Cross-Validation: Rolling window backtests to avoid lookahead bias
Walk-forward analysis: Retrain and retest across moving windows
Out-of-sample testing: Hold out 12–24 months for real-world testing
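The rolling-window idea can be made concrete with a small split generator. This is a sketch under assumed parameter names (train_size, test_size, step), not the exact scheme used here; the key property is that every training window ends before its test window begins, which is what prevents lookahead bias.

```python
def walk_forward_splits(n_obs, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) ranges that roll forward through time.

    Each test window starts immediately after its training window, so the
    model never sees data from the period it is evaluated on.
    """
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


# Example: 10 periods, train on 4, test on the next 2, roll forward by 2.
example_splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, 4, 2)]
```

A final block of the most recent 12–24 months would be excluded from n_obs entirely and scored only once, after all model choices are frozen.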

Performance Metrics:

Type	Metric	Purpose
Prediction Accuracy	RMSE, MAE, Directional Accuracy (%)	Gauges forecast precision and directionality
Portfolio Return	CAGR, Sharpe Ratio, Max Drawdown	Measures trading effectiveness of the signals
Risk Control	Hit Rate (share of winning trades), Value at Risk (VaR)	Helps refine sizing and timing of trades
Explainability	SHAP values, correlation heatmaps	Ensures trust and transparency in outputs
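For reference, several of the metrics in the table can be computed from a series of periodic returns as sketched below. The function names and the 252-period annualization default are assumptions for the example; returns are expressed as decimals (0.01 = 1%).

```python
import math
from statistics import mean, stdev


def rmse(predicted, actual):
    """Root mean squared error between forecasts and realized values."""
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


def directional_accuracy(predicted, actual):
    """Share of periods where the forecast and the realized return have the same sign."""
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(actual)


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio; risk_free is the per-period risk-free rate."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    return 0.0 if sd == 0 else mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (a negative number)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


def cagr(returns, periods_per_year=252):
    """Compound annual growth rate implied by a series of periodic returns."""
    total = math.prod(1 + r for r in returns)
    return total ** (periods_per_year / len(returns)) - 1
```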

Key Success Factors Observed:

Macro factors worked best as trend filters, especially for illiquid mid-caps that are sensitive to market-wide shifts.
Sentiment was highly predictive around earnings and regulatory events, but needed heavy local language filtering.
Company fundamentals helped with value/momentum pair selection — especially ROE growth and net cash positions.
ML model ensembles consistently outperformed single-model approaches, especially when blending macro, sentiment, and technical layers.
Models were most effective when embedded into the investment process, not treated as black-box tools — i.e., they influenced position sizing, stop-loss logic, and entry/exit timing.

Factoring in Lower Liquidity, Higher Volatility, Limited Data Availability, Information Asymmetry, and Regulatory Changes in the Vietnam Market

The key issues with Vietnam’s market structure are price slippage on execution, susceptibility to large trades and rumors, and difficulty implementing signal-based strategies at scale.

Model Adjustments:

Challenge	Mitigation Strategy
Low Liquidity	Add liquidity filters: average daily volume (ADV) thresholds, bid-ask spread constraints, or turnover ratio minimums
Volatility Spikes	Use GARCH or EWMA volatility models to adjust signal confidence or position sizing
Signal Noise	Smooth signals with moving averages, Kalman filters, or by requiring multi-factor confirmation before taking a position
Execution Friction	Integrate transaction cost models (slippage, spread, impact) into alpha calculation to get net alpha

Example: Our model would reduce or eliminate signals in stocks whose 30-day ADV falls below VND 2 billion or whose bid-ask spread is wider than 1.5% of mid-price.
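The liquidity rule in this example translates almost directly into a filter. The sketch below assumes ADV is measured as average daily traded value in VND and that a current bid and ask are available; the function and constant names are hypothetical.

```python
from statistics import mean

MIN_ADV_VND = 2_000_000_000  # VND 2 billion, from the example above
MAX_SPREAD = 0.015           # 1.5% of mid-price


def passes_liquidity_filter(daily_value_vnd, bid, ask, window=30,
                            min_adv=MIN_ADV_VND, max_spread=MAX_SPREAD):
    """True if the trailing ADV and the relative bid-ask spread both clear their thresholds.

    daily_value_vnd: daily traded value in VND, oldest first (assumed input format).
    """
    recent = daily_value_vnd[-window:]
    if len(recent) < window:
        return False  # too little history to trust the ADV estimate
    adv = mean(recent)
    mid = (bid + ask) / 2
    spread = (ask - bid) / mid
    return adv >= min_adv and spread <= max_spread
```

Signals for names that fail the filter can be zeroed out, or scaled down if a softer rule is preferred.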

The Vietnam market specifics include:

Retail-driven behavior → price moves on rumors/news, not just fundamentals
Regulatory shocks (e.g., foreign ownership limits, sudden policy changes)
Inconsistent disclosure practices

Tactical Adjustments:

Issue	Technique
Information Asymmetry	Incorporate proxy variables: e.g., insider trading activity, volume surges, unexplained price gaps as “smart money” signals
Regulatory Risk	Use event flags in the model: e.g., MOF or SBV policy news or sector-specific changes trigger regime shifts
Retail Herding	Model herding behavior using correlation of trades across retail favorite names (e.g., FLC, HAG group historically)
Opaque Earnings Timing	Use NLP models to scan for early media leaks or statements from company insiders in forums and news comments

Example: After sudden capital control announcements, our models that adjusted weights away from foreign-heavy stocks (e.g., VNM, MWG) outperformed due to ownership squeeze dynamics.

Compensating for Limited or Noisy Data

The common data limitations in Vietnam include:

Inconsistent sentiment sources
Sparse earnings data for smaller caps
Delayed or infrequent disclosures
Lack of structured datasets in Vietnamese language

Our Solutions:

Problem	Solution
Sentiment Gaps	Augment Vietnamese sentiment signals using (1) transfer learning from global models fine-tuned on local-language text and (2) custom Vietnamese sentiment lexicons built with domain tagging (finance-specific terms)
Missing Fundamentals	Impute missing values using sector medians or rolling averages; fill data gaps with regression-based estimation
Sparse History	Use panel data pooling across similar firms/sectors to train robust factors (especially helpful in financials, industrials)
No Structured Analyst Forecasts	Build internal “soft consensus” models using scraped media statements and sentiment-weighted company guidance
Unstructured Data	Apply OCR + NLP to PDF earnings calls, investor presentations, and even TV interviews transcribed by voice-to-text
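As one concrete case, sector-median imputation of a missing fundamental might look like the sketch below. The record layout (dicts with "ticker", "sector", and a numeric field) and the tickers are assumptions for illustration. Imputed values are flagged so that downstream models can treat them with lower confidence, and a sector with no observed values is left unfilled rather than guessed.

```python
from statistics import median


def impute_sector_median(records, field):
    """Fill missing `field` values with the median of the same sector.

    Returns new dicts; each carries a `<field>_imputed` flag.
    """
    observed = {}
    for rec in records:
        if rec[field] is not None:
            observed.setdefault(rec["sector"], []).append(rec[field])
    medians = {sector: median(vals) for sector, vals in observed.items()}

    filled = []
    for rec in records:
        row = dict(rec)
        can_fill = row[field] is None and row["sector"] in medians
        if can_fill:
            row[field] = medians[row["sector"]]
        row[field + "_imputed"] = can_fill
        filled.append(row)
    return filled


# Hypothetical tickers and values, for illustration only.
sample = [
    {"ticker": "AAA", "sector": "banks", "roe": 0.10},
    {"ticker": "BBB", "sector": "banks", "roe": 0.20},
    {"ticker": "CCC", "sector": "banks", "roe": None},
    {"ticker": "DDD", "sector": "ports", "roe": None},
]
imputed = impute_sector_median(sample, "roe")
```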

Example: For many mid-caps (e.g., KBC, TCH), sentiment data was sparse, so our models weighted macro and technical signals more heavily and relied on volume-surge detection as a proxy for investor attention.

Techniques to Avoid Overfitting to Noisy EM Data:

Regularization (L1/L2) to penalize over-complexity
Feature selection with domain knowledge (avoid overloading models with correlated or spurious variables)
Rolling window retraining to reflect changing market regimes
Stress-testing under market shocks (e.g., COVID selloff, 2022 liquidity crunch)
Model ensembling: blending rule-based systems with AI/ML forecasts

Portfolio & Execution Adjustments

To account for structural risks, our models incorporated:

Liquidity-adjusted position sizing (smaller weights in thinly traded names)
Volatility-weighted exposure (reduce size during volatile periods)
Blacklist or watchlist filters: exclude stocks under regulatory review or known manipulation zones
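The first two adjustments can be sketched as simple sizing rules. Below, an EWMA volatility estimate (RiskMetrics-style, with an assumed decay of 0.94) scales a base weight toward a target volatility, and a liquidity cap limits the position to a fraction of average daily traded value. All parameter names and default values are assumptions for illustration.

```python
import math


def ewma_volatility(returns, lam=0.94):
    """EWMA of squared returns, seeded with the first squared return; returns a per-period vol."""
    var = returns[0] ** 2
    for r in returns[1:]:
        var = lam * var + (1 - lam) * r ** 2
    return math.sqrt(var)


def vol_scaled_weight(base_weight, current_vol, target_vol, max_weight=0.10):
    """Shrink the weight when volatility is above target, grow it (up to a cap) when below."""
    if current_vol <= 0:
        return min(base_weight, max_weight)
    return min(max_weight, base_weight * target_vol / current_vol)


def liquidity_cap_weight(weight, portfolio_value_vnd, adv_vnd, max_adv_share=0.10):
    """Cap the position so its value does not exceed a share of average daily traded value."""
    return min(weight, max_adv_share * adv_vnd / portfolio_value_vnd)
```

For example, a 5% base weight in a stock running at twice the target volatility would be cut to 2.5%, and then cut further if that position would exceed the chosen share of daily traded value.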

Prescriptive Models: Beyond Forecasting/Generating Price Targets

Instead of just forecasting that a stock price might go from VND 20,000 to VND 22,000, the prescriptive models answer:

“Should I buy this stock, in what size, with what risk management, and when should I exit?”

Prescriptive Models Provide:

Actionable signals (Buy/Sell/Hold)
Conviction levels (low/high confidence)
Position sizing rules
Timing recommendations (entry/exit windows)
Scenario-based responses (e.g., “if interest rates spike, rotate into X sector”)
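To show how a forecast becomes a prescription, the sketch below maps a fused probability of an up-move and a net-of-cost expected return to an action, a conviction level, and a trade size. The thresholds, the Decision structure, and the linear sizing rule are hypothetical choices made for the example, not the rules used in the models described here.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str      # "Buy", "Sell", or "Hold"
    conviction: str  # "low" or "high"
    weight: float    # size of the trade as a fraction of the portfolio


def prescribe(prob_up, net_expected_return, max_weight=0.05,
              buy_thresh=0.60, sell_thresh=0.40, high_edge=0.25):
    """Turn a forecast into an action, a conviction level, and a trade size.

    prob_up:             fused model probability that the stock rises.
    net_expected_return: expected return after transaction costs.
    """
    if prob_up >= buy_thresh and net_expected_return > 0:
        action, edge = "Buy", prob_up - 0.5
    elif prob_up <= sell_thresh:
        action, edge = "Sell", 0.5 - prob_up
    else:
        return Decision("Hold", "low", 0.0)
    conviction = "high" if edge >= high_edge else "low"
    # Size grows linearly with the edge over a coin flip, up to max_weight.
    weight = round(max_weight * min(1.0, edge / 0.5), 4)
    return Decision(action, conviction, weight)
```

Note that a bullish probability alone is not enough for a Buy here: if expected return net of costs is not positive, the rule falls back to Hold.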

Our prescriptive models support actionable decision-making:

Decision Area	How Model Contributed
Portfolio Construction	Used forecasted risk-adjusted returns to tilt weights (e.g., overweight digital consumption stocks like FPT or MWG during COVID)
Trade Execution	Paired sentiment surges with technical breakouts to recommend intraday or T+3 entry points
Risk Management	Flagged rising macro uncertainty → reduced exposure in beta-sensitive names (e.g., real estate) or rotated into cash-like instruments
Earnings Strategy	Models lowered exposure in stocks with negative earnings momentum and sentiment divergence ahead of earnings (e.g., VNM when volume dropped ahead of soft report)

To make the model more decision-oriented:

Enhancement	Purpose
Confidence Scoring on Predictions	Helps investors size positions or delay trades based on signal strength
Explainability Layer (e.g., SHAP)	Shows why a stock is rated Buy/Sell → improves trust & clarity
Scenario Simulation Tools	“What happens if VND weakens 5%?” or “What if earnings miss by 10%?”
Strategy Backtester Interface	Let PMs test new allocation rules based on model signals (e.g., factor tilts, sector rotations)
Dynamic Alerts/Triggers	Alert PMs when model detects high-impact changes (e.g., earnings guidance + sentiment reversal)
