The goal of our quantitative model is to predict short-to-medium-term stock price direction and risk-adjusted return potential. We combine top-down macro views with bottom-up stock selection, enhanced by data-driven and AI-augmented insights.
Input Features & Data Modalities
| Data Layer | Description | Data Frequency | Notes |
| Macro/Market-wide | GDP growth, inflation, interest rates, PMIs, VN-Index trends | Monthly/Quarterly | Adjusted for lag effects; sourced from SBV, GSO, IMF |
| Company Fundamentals | Earnings growth, ROE, leverage, margins, cash flows | Quarterly | Normalized across sectors; sourced from HOSE/HNX filings |
| Public Sentiment | News headlines, earnings calls, social media chatter | Daily | Vietnamese NLP required; news APIs + web scraping |
| Technical Analysis | RSI, MACD, MA crossovers, Bollinger Bands, volume | Daily | Extracted from OHLCV data using TA-Lib/Python |
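As an illustration of the technical layer, the sketch below derives a few of the listed indicators from a series of daily closes using only the Python standard library. It is a simplified stand-in for TA-Lib, not the production code: the function names are hypothetical, the RSI uses plain averages of gains and losses rather than Wilder's smoothing, and the window lengths are common defaults assumed for the example.

```python
from statistics import mean, pstdev


def sma(values, window):
    """Simple moving average; None until a full window is available."""
    return [
        mean(values[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(values))
    ]


def rsi(closes, period=14):
    """RSI from simple averages of gains and losses (not Wilder's smoothing)."""
    out = [None] * len(closes)
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    for i in range(period, len(closes)):
        window = changes[i - period : i]  # the last `period` daily changes
        avg_gain = sum(c for c in window if c > 0) / period
        avg_loss = sum(-c for c in window if c < 0) / period
        if avg_loss == 0:
            out[i] = 100.0  # no losing days in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


def bollinger(closes, window=20, k=2.0):
    """(lower, middle, upper) bands; None until a full window is available."""
    bands = []
    for i in range(len(closes)):
        if i < window - 1:
            bands.append(None)
            continue
        chunk = closes[i - window + 1 : i + 1]
        mid, sd = mean(chunk), pstdev(chunk)
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands


def ma_crossover_signal(closes, fast=5, slow=20):
    """+1 if the fast MA is above the slow MA on the last bar, -1 if below, else 0."""
    f, s = sma(closes, fast)[-1], sma(closes, slow)[-1]
    if f is None or s is None or f == s:
        return 0
    return 1 if f > s else -1
```

In practice these values would be computed per ticker from OHLCV history and passed to the technical submodel as features.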
Integrating Machine Learning and AI into the Quantitative Model
We employ a multi-modal machine learning pipeline with separate feature streams and a fusion model.
Key Models Used:
| Task | Model | Purpose |
| Macro & Fundamentals | Gradient Boosted Trees (e.g., XGBoost) | Handles tabular structured data; captures nonlinearities |
| Sentiment Signals | LSTM or Transformer-based NLP models (e.g., PhoBERT for Vietnamese) | Extracts sentiment trend; detects tone shifts |
| Technical Indicators | Random Forest or CNN for pattern recognition | Identifies chart patterns and high-frequency signals |
| Fusion Layer | Meta Learner (e.g., LightGBM or stacking ensemble) | Aggregates predictions from all submodels |
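The fusion step can be pictured as stacking: each submodel outputs a probability that the stock rises, and a meta learner is trained on those probabilities. The sketch below is a toy illustration only. It swaps in a small logistic regression written in plain Python in place of LightGBM, the function names are hypothetical, and it assumes the submodel probabilities were produced out-of-fold so that the meta learner never sees in-sample predictions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(submodel_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on submodel outputs by gradient descent.

    submodel_probs: one row per observation, e.g.
        [p_macro_fundamental, p_sentiment, p_technical]
    labels: 1 if the stock subsequently rose, else 0
    """
    n_features = len(submodel_probs[0])
    weights, bias = [0.0] * n_features, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(submodel_probs, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_up_probability(weights, bias, x):
    """Blend one row of submodel probabilities into a single probability."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

A tree-based meta learner can additionally capture interactions between the streams (for example, sentiment mattering more when technicals agree), which a linear blend like this one cannot.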
AI Enhancements:
- Feature importance with SHAP for interpretability
- Transfer learning to adapt global models to Vietnamese context
- Bayesian optimization for hyperparameter tuning
Sentiment Analysis
Sources Used:
- Local News Sites: Cafef.vn, Vietstock.vn, VnExpress (finance section)
- Company Reports & Earnings Calls: PDF scrapers + transcript parsing
- Social Media: Facebook investor groups, forums (e.g., F319), Twitter (Vietnamese investors)
Preprocessing:
- Clean, tokenize, and normalize Vietnamese text
- Apply custom Vietnamese sentiment lexicon + ML-based classifiers
- Score sentiment from −1 (very negative) to +1 (very positive)
Data Quality Controls:
- Deduplication of news to avoid echo bias
- Relevance filtering via keyword tagging (company name, sector context)
- Outlier removal for abnormal sentiment spikes with no news basis
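A minimal sketch of the deduplication, relevance-filtering, and scoring steps follows. Everything in it is an assumption made for illustration: the four-word lexicon is a toy, the function names are hypothetical, and whitespace splitting stands in for proper Vietnamese word segmentation, which a real pipeline would need. Outlier removal is not shown.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, apply Unicode NFC, strip punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


# Toy finance lexicon (hypothetical weights): rise, profit, fall, loss.
LEXICON = {
    normalize(word): weight
    for word, weight in {"tăng": 1.0, "lãi": 1.0, "giảm": -1.0, "lỗ": -1.0}.items()
}


def deduplicate(headlines):
    """Keep the first copy of each headline after normalization."""
    seen, unique = set(), []
    for headline in headlines:
        key = normalize(headline)
        if key not in seen:
            seen.add(key)
            unique.append(headline)
    return unique


def is_relevant(headline, keywords):
    """True if any keyword (ticker, company name, sector term) appears as a token."""
    tokens = set(normalize(headline).split())
    return any(normalize(k) in tokens for k in keywords)


def sentiment_score(headline):
    """Mean lexicon weight of matched tokens, in [-1, +1]; 0.0 if nothing matches."""
    hits = [LEXICON[t] for t in normalize(headline).split() if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

Because every lexicon weight lies in [−1, +1], the averaged score stays on the same scale as the one described above. An ML classifier's output could be blended with it on that scale.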
Model Validation Frameworks
- Time Series Cross-Validation: Rolling window backtests to avoid lookahead bias
- Walk-forward analysis: Retrain and retest across moving windows
- Out-of-sample testing: Hold out 12–24 months for real-world testing
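The rolling and walk-forward schemes above share one rule: every training window must end before its test window begins. The generator below sketches that split logic. The function name, parameters, and the `expanding` option are assumptions for illustration, and indices are taken to refer to observations sorted by date.

```python
def walk_forward_splits(n_obs, train_size, test_size, step=None, expanding=False):
    """Yield (train_indices, test_indices) ranges in chronological order.

    expanding=False gives a rolling window of fixed length;
    expanding=True keeps all history from the first observation.
    """
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_obs:
        train_start = 0 if expanding else start
        train_end = start + train_size  # exclusive
        yield range(train_start, train_end), range(train_end, train_end + test_size)
        start += step
```

Each iteration would retrain the model on the training range and score it on the test range. The final 12–24 month hold-out is kept outside `n_obs` entirely, so it is never touched during tuning.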
Performance Metrics:
| Type | Metric | Purpose |
| Prediction Accuracy | RMSE, MAE, Directional Accuracy (%) | Gauges forecast precision and directionality |
| Portfolio Return | CAGR, Sharpe Ratio, Max Drawdown | Measures trading effectiveness of the signals |
| Risk Control | Hit Rate (share of winning trades), Win/Loss Ratio, Value at Risk (VaR) | Helps refine sizing and timing of trades |
| Explainability | SHAP values, correlation heatmaps | Ensures trust and transparency in outputs |
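For reference, several of these metrics can be computed from a series of periodic returns as sketched below. The function names are hypothetical, the 252-day annualization factor is an assumption for daily data, and the Sharpe calculation assumes the returns are not all identical (otherwise the standard deviation is zero).

```python
import math
from statistics import mean, stdev


def directional_accuracy(predicted, realized):
    """Share of periods where the predicted and realized returns have the same sign."""
    hits = sum(1 for p, r in zip(predicted, realized) if (p > 0) == (r > 0))
    return hits / len(realized)


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized mean excess return divided by annualized volatility."""
    excess = [r - risk_free_per_period for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve (a negative number)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def cagr(returns, periods_per_year=252):
    """Compound annual growth rate implied by a series of periodic returns."""
    total_growth = math.prod(1.0 + r for r in returns)
    return total_growth ** (periods_per_year / len(returns)) - 1.0
```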
Key Success Factors Observed:
- Macro factors worked best for trend filters, especially in illiquid mid-caps with beta to market shifts.
- Sentiment was highly predictive around earnings and regulatory events, but needed heavy local language filtering.
- Company fundamentals helped with value/momentum pair selection, especially ROE growth and net cash positions.
- ML model ensembles consistently outperformed single-model approaches, especially when blending macro, sentiment, and technical layers.
- Models were most effective when embedded into the investment process rather than treated as black-box tools; that is, they influenced position sizing, stop-loss logic, and entry/exit timing.
Factoring in Lower Liquidity, Higher Volatility, Limited Data Availability, Information Asymmetry, and Regulatory Changes in the Vietnam Market
The key issues with Vietnam's market structure are price slippage on execution, susceptibility to large trades and rumors, and difficulty in implementing signal-based strategies at scale.
Model Adjustments:
| Challenge | Mitigation Strategy |
| Low Liquidity | Add liquidity filters: average daily volume (ADV) thresholds, bid-ask spread constraints, or turnover ratio minimums |
| Volatility Spikes | Use GARCH or EWMA volatility models to adjust signal confidence or position sizing |
| Signal Noise | Smooth signals with moving averages, Kalman filters, or by requiring multi-factor confirmation before taking a position |
| Execution Friction | Integrate transaction cost models (slippage, spread, impact) into alpha calculation to get net alpha |
Example: Our model would reduce or eliminate signals in stocks whose 30-day ADV falls below VND 2 billion or whose bid-ask spread is wider than 1.5% of the mid-price.
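The liquidity gate in this example can be sketched as follows. The two thresholds come from the example above. The function name, the use of traded value in VND as the volume measure, and the choice to drop (rather than scale down) failing signals are assumptions for illustration.

```python
from statistics import mean

ADV_FLOOR_VND = 2_000_000_000  # VND 2 billion, 30-day average daily traded value
MAX_SPREAD = 0.015             # 1.5% of mid-price


def passes_liquidity_filter(daily_value_vnd, bid, ask, window=30):
    """True if the stock clears both the ADV floor and the spread ceiling.

    daily_value_vnd: daily traded value in VND, oldest first.
    """
    adv = mean(daily_value_vnd[-window:])
    mid = (bid + ask) / 2.0
    relative_spread = (ask - bid) / mid
    return adv >= ADV_FLOOR_VND and relative_spread <= MAX_SPREAD
```

A softer variant would scale the signal down as ADV approaches the floor instead of cutting it off outright.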
The Vietnam market specifics include:
- Retail-driven behavior → price moves on rumors/news, not just fundamentals
- Regulatory shocks (e.g., foreign ownership limits, sudden policy changes)
- Inconsistent disclosure practices
Tactical Adjustments:
| Issue | Technique |
| Information Asymmetry | Incorporate proxy variables: e.g., insider trading activity, volume surges, unexplained price gaps as “smart money” signals |
| Regulatory Risk | Use event flags in the model: e.g., MOF/SBV (NHNN) policy news or sector-specific changes trigger regime shifts |
| Retail Herding | Model herding behavior using correlation of trades across retail favorite names (e.g., FLC, HAG group historically) |
| Opaque Earnings Timing | Use NLP models to scan for early media leaks or statements from company insiders in forums and news comments |
Example: After sudden capital control announcements, our models that adjusted weights away from foreign-heavy stocks (e.g., VNM, MWG) outperformed due to ownership squeeze dynamics.
Compensating for Limited or Noisy Data
The common data limitations in Vietnam include:
- Inconsistent sentiment sources
- Sparse earnings data for smaller caps
- Delayed or infrequent disclosures
- Lack of structured datasets in Vietnamese language
Our Solutions:
| Problem | Solution |
| Sentiment Gaps | Augment Vietnamese sentiment signals using transfer learning from global models (fine-tuned on local language) and custom Vietnamese sentiment lexicons built with domain tagging (finance-specific terms) |
| Missing Fundamentals | Impute missing values using sector medians or rolling averages; fill data gaps with regression-based estimation |
| Sparse History | Use panel data pooling across similar firms/sectors to train robust factors (especially helpful in financials, industrials) |
| No Structured Analyst Forecasts | Build internal “soft consensus” models using scraped media statements and sentiment-weighted company guidance |
| Unstructured Data | Apply OCR + NLP to PDF earnings calls, investor presentations, and even TV interviews transcribed by voice-to-text |
Example: For many mid-caps (e.g., KBC, TCH), sentiment data was sparse, so our models weighted macro and technicals more heavily and relied on volume-surge detection as a proxy for investor attention.
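As one illustration of the imputation approach for missing fundamentals, the sketch below fills a missing field with the median of the stock's sector and flags the filled value so downstream models can discount it. The function name, the row layout (dictionaries with a `sector` key), and the `_imputed` flag are assumptions for the example.

```python
from statistics import median


def impute_with_sector_median(rows, field):
    """Return copies of rows with missing `field` values filled by sector median.

    Rows whose sector has no observed values are left as None.
    """
    observed = {}
    for row in rows:
        if row[field] is not None:
            observed.setdefault(row["sector"], []).append(row[field])
    sector_median = {sector: median(vals) for sector, vals in observed.items()}

    filled_rows = []
    for row in rows:
        filled = dict(row)
        can_fill = row[field] is None and row["sector"] in sector_median
        if can_fill:
            filled[field] = sector_median[row["sector"]]
        filled[f"{field}_imputed"] = can_fill  # lets the model discount filled values
        filled_rows.append(filled)
    return filled_rows
```

To avoid lookahead bias, the medians should be computed only from data available as of each rebalancing date.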
Techniques to Avoid Overfitting to Noisy EM Data:
- Regularization (L1/L2) to penalize over-complexity
- Feature selection with domain knowledge (avoid overloading models with correlated or spurious variables)
- Rolling window retraining to reflect changing market regimes
- Stress-testing under market shocks (e.g., COVID selloff, 2022 liquidity crunch)
- Model ensembling: blending rule-based systems with AI/ML forecasts
Portfolio & Execution Adjustments
To account for structural risks, our models incorporated:
- Liquidity-adjusted position sizing (smaller weights in thinly traded names)
- Volatility-weighted exposure (reduce size during volatile periods)
- Blacklist or watchlist filters: exclude stocks under regulatory review or known manipulation zones
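The first two adjustments can be combined into a single sizing rule, sketched below under stated assumptions. The EWMA decay of 0.94 is the conventional RiskMetrics value for daily data. The function names, the volatility target, and the 10% cap on ADV participation are hypothetical parameters chosen for illustration.

```python
import math


def ewma_volatility(returns, lam=0.94):
    """EWMA volatility: var_t = lam * var_{t-1} + (1 - lam) * r_t^2."""
    var = returns[0] ** 2
    for r in returns[1:]:
        var = lam * var + (1.0 - lam) * r ** 2
    return math.sqrt(var)


def position_weight(base_weight, daily_vol, target_daily_vol,
                    adv_vnd, portfolio_vnd, max_adv_participation=0.10):
    """Shrink a base weight for volatility, then cap it by liquidity.

    The volatility scale never exceeds 1, so calm markets do not add leverage.
    The liquidity cap keeps the position below a fraction of one day's ADV.
    """
    vol_scale = 1.0 if daily_vol == 0 else min(1.0, target_daily_vol / daily_vol)
    liquidity_cap = max_adv_participation * adv_vnd / portfolio_vnd
    return min(base_weight * vol_scale, liquidity_cap)
```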
Prescriptive Models: Beyond Forecasting/Generating Price Targets
Instead of just forecasting that a stock price might go from VND 20,000 to VND 22,000, the prescriptive models answer:
“Should I buy this stock, in what size, with what risk management, and when should I exit?”
Prescriptive Models Provide:
- Actionable signals (Buy/Sell/Hold)
- Conviction levels (low/high confidence)
- Position sizing rules
- Timing recommendations (entry/exit windows)
- Scenario-based responses (e.g., “if interest rates spike, rotate into X sector”)
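To make the distinction concrete, the sketch below turns a forecast into the outputs listed above: an action, a conviction level, a target weight, and a stop-loss price. All thresholds (minimum edge, confidence cut-offs, maximum weight, stop distance) and all names are hypothetical, and Sell is treated as an exit to zero weight on the assumption of a long-only book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    action: str                       # "Buy", "Sell", or "Hold"
    conviction: str                   # "low" or "high"
    weight: float                     # target portfolio weight
    stop_loss_price: Optional[float]  # None when no position is opened


def recommend(price, expected_return, confidence,
              max_weight=0.05, min_edge=0.03, stop_loss_pct=0.07):
    """Map a return forecast and its confidence to a sized, risk-managed action."""
    # Weak or uncertain forecasts do not justify trading costs.
    if confidence < 0.55 or abs(expected_return) < min_edge:
        return Recommendation("Hold", "low", 0.0, None)
    conviction = "high" if confidence >= 0.70 else "low"
    if expected_return < 0:
        return Recommendation("Sell", conviction, 0.0, None)
    weight = max_weight if conviction == "high" else max_weight / 2.0
    return Recommendation("Buy", conviction, weight, price * (1.0 - stop_loss_pct))


# The forecast from the text: VND 20,000 -> VND 22,000 is a +10% expected return.
rec = recommend(price=20_000, expected_return=0.10, confidence=0.75)
```

With these illustrative settings, the +10% forecast at 75% confidence becomes a high-conviction Buy at the full 5% weight with a stop roughly 7% below the entry price.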
Our prescriptive models support actionable decision-making:
| Decision Area | How Model Contributed |
| Portfolio Construction | Used forecasted risk-adjusted returns to tilt weights (e.g., overweight digital consumption stocks like FPT or MWG during COVID) |
| Trade Execution | Paired sentiment surges with technical breakouts to recommend intraday or T+3 entry points |
| Risk Management | Flagged rising macro uncertainty → reduced exposure in beta-sensitive names (e.g., real estate) or rotated into cash-like instruments |
| Earnings Strategy | Models lowered exposure in stocks with negative earnings momentum and sentiment divergence ahead of earnings (e.g., VNM when volume dropped ahead of a soft report) |
To make the model more decision-oriented:
| Enhancement | Purpose |
| Confidence Scoring on Predictions | Helps investors size positions or delay trades based on signal strength |
| Explainability Layer (e.g., SHAP) | Shows why a stock is rated Buy/Sell → improves trust & clarity |
| Scenario Simulation Tools | “What happens if VND weakens 5%?” or “What if earnings miss by 10%?” |
| Strategy Backtester Interface | Let PMs test new allocation rules based on model signals (e.g., factor tilts, sector rotations) |
| Dynamic Alerts/Triggers | Alert PMs when model detects high-impact changes (e.g., earnings guidance + sentiment reversal) |
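A scenario tool of the kind listed above can start from a first-order approximation: the portfolio's return under a shock is the sum of weight × sensitivity × shock size across holdings. The sketch below shows that arithmetic. The function name, placeholder holdings, and sensitivity values are invented for illustration and are not estimates for any real stock.

```python
def scenario_return(weights, sensitivities, shocks):
    """First-order portfolio return under a set of factor shocks.

    weights:       {holding: portfolio weight}
    sensitivities: {holding: {factor: return per unit of factor move}}
    shocks:        {factor: size of the move, e.g. 0.05 for 5%}
    """
    total = 0.0
    for holding, weight in weights.items():
        for factor, shock in shocks.items():
            total += weight * sensitivities[holding].get(factor, 0.0) * shock
    return total


# "What happens if VND weakens 5%?" with placeholder holdings and sensitivities.
weights = {"EXPORTER": 0.5, "IMPORTER": 0.5}
sensitivities = {
    "EXPORTER": {"vnd_depreciation": 0.4},   # benefits from a weaker VND
    "IMPORTER": {"vnd_depreciation": -0.6},  # hurt by a weaker VND
}
impact = scenario_return(weights, sensitivities, {"vnd_depreciation": 0.05})
```

In this toy setup the importer's larger negative sensitivity outweighs the exporter's gain, so the estimated portfolio impact is slightly negative. A fuller tool would re-run the forecasting models under the shocked inputs rather than rely on fixed linear sensitivities.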
