AI-Powered Stock Forecasting Model in Vietnam

The goal of our quantitative model is to predict short-to-medium-term stock price direction and risk-adjusted return potential. We combine top-down macro views with bottom-up stock selection, enhanced by data-driven and AI-augmented insights.

Input Features & Data Modalities

Data Layer | Description | Data Frequency | Notes
Macro/Market-wide | GDP growth, inflation, interest rates, PMIs, VN-Index trends | Monthly/Quarterly | Adjusted for lag effects; sourced from SBV, GSO, IMF
Company Fundamentals | Earnings growth, ROE, leverage, margins, cash flows | Quarterly | Normalized across sectors; sourced from HOSE/HNX filings
Public Sentiment | News headlines, earnings calls, social media chatter | Daily | Vietnamese NLP required; news APIs + web scraping
Technical Analysis | RSI, MACD, MA crossovers, Bollinger Bands, volume | Daily | Extracted from OHLCV data using TA-Lib/Python
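
As a rough illustration of the technical-analysis layer, the sketch below computes a simple moving average and a Wilder-smoothed RSI from a list of closing prices in plain Python. In practice a library such as TA-Lib would do this work; the function names `sma` and `rsi` and the 14-period default are illustrative assumptions, not the production code.

```python
def sma(values, window):
    """Simple moving average; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def _rsi_value(avg_gain, avg_loss):
    # With no losses in the window, RSI is conventionally reported as 100.
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi(closes, period=14):
    """Relative Strength Index with Wilder smoothing; None until enough history."""
    if len(closes) <= period:
        return [None] * len(closes)
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Seed with the plain average of the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out = [None] * period
    out.append(_rsi_value(avg_gain, avg_loss))
    # Wilder smoothing for every later observation.
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
        out.append(_rsi_value(avg_gain, avg_loss))
    return out
```

Both functions return a list the same length as the input, so the indicator values line up with the original daily bars.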

Integrating Machine Learning and AI into the Quantitative Model

We employ a multi-modal machine learning pipeline with separate feature streams and a fusion model.

Key Models Used:

Task | Model | Purpose
Macro & Fundamentals | Gradient Boosted Trees (e.g., XGBoost) | Handles tabular structured data; captures nonlinearities
Sentiment Signals | LSTM or Transformer-based NLP models (e.g., PhoBERT for Vietnamese) | Extracts sentiment trend; detects tone shifts
Technical Indicators | Random Forest or CNN for pattern recognition | Identifies chart patterns and high-frequency signals
Fusion Layer | Meta Learner (e.g., LightGBM or stacking ensemble) | Aggregates predictions from all submodels
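
To make the fusion step concrete, here is a minimal stacking sketch in which the meta learner is a small logistic regression fitted by gradient descent on the submodels' output probabilities. This swaps in logistic regression purely for illustration (the table above names LightGBM or a stacking ensemble), and the names `fit_meta_learner` and `fuse` are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(sub_preds, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta learner on submodel outputs.

    sub_preds: rows of submodel probabilities, e.g. [macro_p, sentiment_p].
    labels:    1 if the stock subsequently rose, else 0.
    Rows should be out-of-sample submodel predictions to avoid leakage.
    """
    n = len(sub_preds)
    n_features = len(sub_preds[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(sub_preds, labels):
            p = _sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        # Batch gradient-descent update on the mean log-loss gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(row, weights, bias):
    """Combined probability that the stock moves up."""
    return _sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
```

The learned weights show how much the fused forecast leans on each submodel, which is one simple way to see which data layer is driving a signal.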

AI Enhancements:

  • Feature importance with SHAP for interpretability
  • Transfer learning to adapt global models to Vietnamese context
  • Bayesian optimization for hyperparameter tuning

Sentiment Analysis

Sources Used:

  • Local News Sites: Cafef.vn, Vietstock.vn, VnExpress (finance section)
  • Company Reports & Earnings Calls: PDF scrapers + transcript parsing
  • Social Media: Facebook investor groups, forums (e.g., F319), Twitter (Vietnamese investors)

Preprocessing:

  • Clean, tokenize, and normalize Vietnamese text
  • Apply custom Vietnamese sentiment lexicon + ML-based classifiers
  • Score sentiment from −1 (very negative) to +1 (very positive)
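
The lexicon part of this scoring can be sketched as follows. The four-word lexicon and its weights are hypothetical placeholders (a real finance lexicon would be far larger and would handle multi-syllable Vietnamese terms through a proper word segmenter), and the ML classifier is not shown.

```python
import re
import unicodedata

# Hypothetical mini-lexicon for illustration only.
LEXICON = {
    "tăng": 1.0,   # increase
    "lãi": 0.8,    # profit
    "giảm": -1.0,  # decrease
    "lỗ": -0.8,    # loss
}


def sentiment_score(text, lexicon=LEXICON):
    """Average lexicon weight of matched tokens, clamped to [-1, +1]."""
    # NFC normalization keeps accented Vietnamese letters as single characters.
    text = unicodedata.normalize("NFC", text).lower()
    tokens = re.findall(r"\w+", text)
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return 0.0  # no lexicon words found: treat as neutral
    score = sum(hits) / len(hits)
    return max(-1.0, min(1.0, score))
```

Whitespace tokenization is a simplification here, since many Vietnamese words span several syllables.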

Data Quality Controls:

  • Deduplication of news to avoid echo bias
  • Relevance filtering via keyword tagging (company name, sector context)
  • Outlier removal for abnormal sentiment spikes with no news basis
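
A minimal sketch of the first two controls, assuming each news item is a dict with a `headline` field (a hypothetical schema). It removes exact duplicates after text normalization and keeps only items that mention a tracked keyword; near-duplicate detection and the outlier check are not shown.

```python
import re
import unicodedata


def normalize_headline(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def filter_news(items, keywords):
    """Keep the first copy of each story that mentions any keyword."""
    seen = set()
    kept = []
    keys = [k.lower() for k in keywords]
    for item in items:
        norm = normalize_headline(item["headline"])
        if norm in seen:
            continue  # duplicate of a story already processed
        seen.add(norm)
        # Plain substring match; a production filter would match whole terms.
        if any(k in norm for k in keys):
            kept.append(item)
    return kept
```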

Model Validation Frameworks

  1. Time Series Cross-Validation: Rolling window backtests to avoid lookahead bias
  2. Walk-forward analysis: Retrain and retest across moving windows
  3. Out-of-sample testing: Hold out 12–24 months for real-world testing
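
The rolling-window logic can be sketched as an index generator in which every test window sits strictly after its training window. The function name and parameters are illustrative assumptions.

```python
def walk_forward_splits(n_obs, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) for rolling windows in time order."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step  # slide the whole window forward
```

For example, `walk_forward_splits(10, train_size=4, test_size=2)` yields three splits, each training on four observations and testing on the next two, so no test observation ever precedes its training data.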

Performance Metrics:

Type | Metric | Purpose
Prediction Accuracy | RMSE, MAE, Directional Accuracy (%) | Gauges forecast precision and directionality
Portfolio Return | CAGR, Sharpe Ratio, Max Drawdown | Measures trading effectiveness of the signals
Risk Control | Hit Rate (share of winning trades), Value at Risk (VaR) | Helps refine sizing and timing of trades
Explainability | SHAP values, correlation heatmaps | Ensures trust and transparency in outputs
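
Three of these metrics are simple enough to sketch directly. The versions below assume per-period simple returns and a default of 252 trading periods per year; both are illustrative assumptions.

```python
import math


def directional_accuracy(predicted, actual):
    """Share of periods where forecast and realized returns share a sign.

    Zero is treated as "not positive".
    """
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(actual)


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns (sample std dev)."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the equity curve, as a negative fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst
```

`sharpe_ratio` needs at least two observations with non-zero variance; a production version would guard against that case.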

Key Success Factors Observed:

  1. Macro factors worked best as trend filters, especially in illiquid mid-caps with high beta to market shifts.
  2. Sentiment was highly predictive around earnings and regulatory events, but needed heavy local language filtering.
  3. Company fundamentals helped with value/momentum pair selection — especially ROE growth and net cash positions.
  4. ML model ensembles consistently outperformed single-model approaches, especially when blending macro, sentiment, and technical layers.
  5. Models were most effective when embedded into the investment process, not treated as black-box tools — i.e., they influenced position sizing, stop-loss logic, and entry/exit timing.

Factoring in Lower Liquidity, Higher Volatility, Limited Data Availability, Information Asymmetry, and Regulatory Changes in the Vietnam Market

The key issues with Vietnam’s market structure are price slippage on execution, susceptibility to large trades and rumors, and difficulty in implementing signal-based strategies at scale.

Model Adjustments:

Challenge | Mitigation Strategy
Low Liquidity | Add liquidity filters: average daily volume (ADV) thresholds, bid-ask spread constraints, or turnover ratio minimums
Volatility Spikes | Use GARCH or EWMA volatility models to adjust signal confidence or position sizing
Signal Noise | Smooth signals with moving averages, Kalman filters, or by requiring multi-factor confirmation before taking a position
Execution Friction | Integrate transaction cost models (slippage, spread, impact) into alpha calculation to get net alpha

Example: Our model would reduce or eliminate signals in stocks where 30-day ADV, measured in traded value, falls below VND 2 billion or where the bid-ask spread is wider than 1.5% of mid-price.
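
The thresholds in this example translate into a simple pass/fail check. The sketch below implements only the hard cutoff (not the gradual reduction) and assumes ADV is measured as traded value in VND; the function name and input layout are hypothetical.

```python
ADV_MIN_VND = 2_000_000_000  # VND 2 billion, from the example above
MAX_SPREAD = 0.015           # 1.5% of mid-price, from the example above


def passes_liquidity_filter(daily_values_vnd, bid, ask, adv_window=30):
    """Return True if the stock is liquid enough to keep its signal.

    daily_values_vnd: traded value per day in VND, most recent last.
    """
    recent = daily_values_vnd[-adv_window:]
    if len(recent) < adv_window:
        return False  # not enough history to trust the ADV estimate
    adv = sum(recent) / len(recent)
    mid = (bid + ask) / 2
    spread = (ask - bid) / mid
    return adv >= ADV_MIN_VND and spread <= MAX_SPREAD
```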

The Vietnam market specifics include:

  • Retail-driven behavior → price moves on rumors/news, not just fundamentals
  • Regulatory shocks (e.g., foreign ownership limits, sudden policy changes)
  • Inconsistent disclosure practices

Tactical Adjustments:

Issue | Technique
Information Asymmetry | Incorporate proxy variables: e.g., insider trading activity, volume surges, unexplained price gaps as “smart money” signals
Regulatory Risk | Use event flags in the model: e.g., MOF or SBV (NHNN) policy news or sector-specific changes trigger regime shifts
Retail Herding | Model herding behavior using correlation of trades across retail-favorite names (e.g., FLC, HAG group historically)
Opaque Earnings Timing | Use NLP models to scan for early media leaks or statements from company insiders in forums and news comments

Example: After sudden capital control announcements, our models that adjusted weights away from foreign-heavy stocks (e.g., VNM, MWG) outperformed due to ownership squeeze dynamics.

Compensating for Limited or Noisy Data

The common data limitations in Vietnam include:

  1. Inconsistent sentiment sources
  2. Sparse earnings data for smaller caps
  3. Delayed or infrequent disclosures
  4. Lack of structured datasets in Vietnamese language

Our Solutions:

Problem | Solution
Sentiment Gaps | Augment Vietnamese sentiment signals using: (1) transfer learning from global models (fine-tuned on local language); (2) custom Vietnamese sentiment lexicons built with domain tagging (finance-specific terms)
Missing Fundamentals | Impute missing values using sector medians or rolling averages; fill data gaps with regression-based estimation
Sparse History | Use panel data pooling across similar firms/sectors to train robust factors (especially helpful in financials, industrials)
No Structured Analyst Forecasts | Build internal “soft consensus” models using scraped media statements and sentiment-weighted company guidance
Unstructured Data | Apply OCR + NLP to PDF earnings calls, investor presentations, and even TV interviews transcribed by voice-to-text

Example: For many mid-caps (e.g., KBC, TCH), sentiment data was sparse, so our models weighted macro and technicals more heavily and relied on volume-surge detection as a proxy for investor attention.
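
For the missing-fundamentals case, sector-median imputation might look like the sketch below. The record layout (dicts with a `sector` key) and the `_imputed` flag are assumptions made for illustration.

```python
from statistics import median


def impute_with_sector_median(rows, field):
    """Fill missing `field` values with the median of the same sector.

    rows: list of dicts with a 'sector' key and a possibly-None `field`.
    Returns new dicts; filled rows are flagged so they can be audited.
    """
    by_sector = {}
    for row in rows:
        if row[field] is not None:
            by_sector.setdefault(row["sector"], []).append(row[field])
    medians = {s: median(v) for s, v in by_sector.items()}

    out = []
    for row in rows:
        filled = dict(row)
        if filled[field] is None and filled["sector"] in medians:
            filled[field] = medians[filled["sector"]]
            filled[field + "_imputed"] = True
        out.append(filled)
    return out
```

Sectors with no observed values are left as missing rather than filled from unrelated data.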

Techniques to Avoid Overfitting to Noisy EM Data:

  1. Regularization (L1/L2) to penalize over-complexity
  2. Feature selection with domain knowledge (avoid overloading models with correlated or spurious variables)
  3. Rolling window retraining to reflect changing market regimes
  4. Stress-testing under market shocks (e.g., COVID selloff, 2022 liquidity crunch)
  5. Model ensembling: blending rule-based systems with AI/ML forecasts

Portfolio & Execution Adjustments

To account for structural risks, our models incorporated:

  1. Liquidity-adjusted position sizing (smaller weights in thinly traded names)
  2. Volatility-weighted exposure (reduce size during volatile periods)
  3. Blacklist or watchlist filters: exclude stocks under regulatory review or known manipulation zones
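
The first two adjustments can be sketched together: an EWMA volatility estimate scales the position down when volatility runs above a target, and a cap ties the position to a share of ADV. The decay factor of 0.94 (the classic RiskMetrics daily setting), the 10% ADV cap, and all function names are illustrative assumptions, not calibrated values.

```python
import math


def ewma_volatility(returns, lam=0.94):
    """EWMA daily volatility, seeded with the first squared return."""
    var = returns[0] ** 2
    for r in returns[1:]:
        var = lam * var + (1 - lam) * r ** 2
    return math.sqrt(var)


def position_weight(base_weight, daily_vol, target_vol,
                    adv_vnd, portfolio_vnd, max_adv_share=0.10):
    """Shrink a weight for high volatility, then cap it by liquidity."""
    # Scale down (never up) when realized vol exceeds the target.
    vol_scale = min(1.0, target_vol / daily_vol) if daily_vol > 0 else 1.0
    weight = base_weight * vol_scale
    # Position value may not exceed a fixed share of average daily traded value.
    liquidity_cap = max_adv_share * adv_vnd / portfolio_vnd
    return min(weight, liquidity_cap)
```

With a VND 100 billion portfolio, a stock trading VND 5 billion a day is capped at a 0.5% weight under these settings, regardless of how strong its signal is.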

Prescriptive Models: Beyond Forecasting/Generating Price Targets

Instead of just forecasting that a stock price might go from VND 20,000 to VND 22,000, a prescriptive model answers:

“Should I buy this stock, in what size, with what risk management, and when should I exit?”

Prescriptive Models Provide:

  1. Actionable signals (Buy/Sell/Hold)
  2. Conviction levels (low/high confidence)
  3. Position sizing rules
  4. Timing recommendations (entry/exit windows)
  5. Scenario-based responses (e.g., “if interest rates spike, rotate into X sector”)
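
A toy version of the first three outputs is sketched below: it maps a forecast return and a confidence score to an action and a position size. All thresholds and the function name `prescribe` are hypothetical; timing and scenario logic are not shown.

```python
def prescribe(expected_return, confidence,
              threshold=0.03, min_confidence=0.6, max_weight=0.05):
    """Turn a forecast into an action and a portfolio weight.

    expected_return: forecast return over the horizon (0.10 = +10%).
    confidence:      model confidence in [0, 1].
    """
    # Weak or uncertain forecasts produce no trade.
    if confidence < min_confidence or abs(expected_return) < threshold:
        return {"action": "Hold", "weight": 0.0}
    action = "Buy" if expected_return > 0 else "Sell"
    # Size scales with conviction, up to the maximum weight.
    return {"action": action, "weight": round(max_weight * confidence, 4)}
```

Under these assumed settings, the VND 20,000 to VND 22,000 forecast (a 10% expected return) at 0.8 confidence becomes a Buy at a 4% weight, while the same forecast at 0.5 confidence becomes a Hold.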

Our prescriptive models support actionable decision-making:

Decision Area | How Model Contributed
Portfolio Construction | Used forecasted risk-adjusted returns to tilt weights (e.g., overweight digital consumption stocks like FPT or MWG during COVID)
Trade Execution | Paired sentiment surges with technical breakouts to recommend intraday or T+3 entry points
Risk Management | Flagged rising macro uncertainty → reduced exposure in beta-sensitive names (e.g., real estate) or rotated into cash-like instruments
Earnings Strategy | Models lowered exposure in stocks with negative earnings momentum and sentiment divergence ahead of earnings (e.g., VNM when volume dropped ahead of soft report)

To make the model more decision-oriented:

Enhancement | Purpose
Confidence Scoring on Predictions | Helps investors size positions or delay trades based on signal strength
Explainability Layer (e.g., SHAP) | Shows why a stock is rated Buy/Sell → improves trust & clarity
Scenario Simulation Tools | Answers questions such as “What happens if VND weakens 5%?” or “What if earnings miss by 10%?”
Strategy Backtester Interface | Lets PMs test new allocation rules based on model signals (e.g., factor tilts, sector rotations)
Dynamic Alerts/Triggers | Alerts PMs when the model detects high-impact changes (e.g., earnings guidance + sentiment reversal)