Publications

Research on multi-agent AI systems, model selection, and machine learning under distribution shift.

2026 · Independent Research · CC BY 4.0
Weighted Multi-Expert Synthesis for High-Stakes Decision Support: A Multi-Agent LLM Framework with Dissent Preservation
David Liu
We present Meta Council, a multi-agent LLM framework in which N expert agents—each with a unique professional persona and analytical framework—analyze queries in parallel, then a weighted synthesis step produces structured decision documents with confidence scores, dissent preservation, and risk matrices. Evaluated across 750+ benchmark runs spanning six domains and five models (3B to frontier-class), we find that weighted synthesis outperforms single-best selection by 29–58% on free-text tasks (p<0.0001, d=2.16), that synthesis amplifies model quality non-linearly, and that the optimal aggregation method is domain-dependent.
Key Results: Synthesis outperforms single-best by 29–58% (p<0.0001) · 80% categorical accuracy vs 50% for single-best · Synthesis amplification: 2.99x for mid-tier models · Domain-dependent: synthesis wins in business (100%), single-best wins in legal (75%)
Multi-Agent Systems LLM Decision Support Dissent Preservation Weighted Synthesis Confidence Calibration
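The aggregation step described above can be illustrated with a minimal sketch. All names here (ExpertOpinion, synthesize, DISSENT_THRESHOLD) and the specific weighting rule (expert weight × self-reported confidence) are illustrative assumptions, not the paper's actual implementation:

```python
# Illustrative sketch: weighted multi-expert synthesis with dissent preservation.
# The weighting scheme and threshold are assumptions, not the published method.
from dataclasses import dataclass

@dataclass
class ExpertOpinion:
    persona: str       # e.g. "risk analyst"
    answer: str        # categorical recommendation
    confidence: float  # self-reported, in [0, 1]
    weight: float      # per-domain weight assigned to this expert

DISSENT_THRESHOLD = 0.5  # assumed cutoff: confident minority views are kept

def synthesize(opinions):
    """Score each answer by sum of (expert weight x confidence); keep dissents."""
    scores = {}
    for op in opinions:
        scores[op.answer] = scores.get(op.answer, 0.0) + op.weight * op.confidence
    total = sum(scores.values())
    consensus, top = max(scores.items(), key=lambda kv: kv[1])
    # Dissent preservation: record confident minority positions rather than
    # discarding them, so the decision document shows where experts disagreed.
    dissents = [(op.persona, op.answer, op.confidence) for op in opinions
                if op.answer != consensus and op.confidence >= DISSENT_THRESHOLD]
    return {"decision": consensus, "confidence": top / total, "dissents": dissents}

opinions = [
    ExpertOpinion("risk analyst",  "delay launch", 0.9, 1.0),
    ExpertOpinion("growth lead",   "launch now",   0.8, 0.8),
    ExpertOpinion("legal counsel", "delay launch", 0.7, 1.2),
]
result = synthesize(opinions)
```

Here "delay launch" wins (weighted score 1.74 vs. 0.64), but the growth lead's confident minority view survives into the output rather than being averaged away, which is the property the dissent-preservation mechanism targets.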
2026 · Independent Research · CC BY 4.0
Stability Bonus Regularization for Model Selection Under Positive-Class Distribution Shift
David C. Liu
When positive-class training data is a biased subset of the true positive population—common in hiring, medical screening, and credit scoring—standard cross-validation selects models that overfit to the observed cluster. We propose Stability Bonus (SB) regularization, which rewards hyperparameter configurations with small train–validation gaps, favoring wider decision boundaries that generalize to unseen positive subgroups. On hard synthetic benchmarks with controlled distribution shift, SB improves test AUC by +6.9% (p<0.0001, d=3.48) and unseen-subgroup AUC by +7.1%. Class weighting, the standard remedy, is the worst performer under shift.
Key Results: +6.9% test AUC on hard benchmarks (p<0.0001) · +7.1% on unseen positive subgroups · Class weighting is worst under distribution shift · SB selects more regularized models with wider decision boundaries · Does NOT help when signal is strong (honest negative result)
Model Selection Distribution Shift Cross-Validation Class Imbalance Regularization Fairness
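The selection rule can be sketched in a few lines. The gap-penalty form and the weight `lam` below are illustrative assumptions; the paper's exact bonus formula may differ:

```python
# Illustrative sketch of Stability Bonus (SB) model selection: reward
# hyperparameter configs with small train-validation gaps. The penalty
# form and lam=0.5 are assumptions for illustration, not the paper's values.
def sb_score(train_score, val_score, lam=0.5):
    """Validation score minus a penalty on the train-validation gap."""
    gap = max(0.0, train_score - val_score)
    return val_score - lam * gap  # equivalently, a bonus for stability

def select(configs, lam=0.5):
    """configs: list of (name, train AUC, val AUC); return the SB winner."""
    return max(configs, key=lambda c: sb_score(c[1], c[2], lam))[0]

configs = [
    # (hypothetical hyperparameter config, train AUC, validation AUC)
    ("C=10 (sharp boundary)", 0.99, 0.90),  # large gap: overfit to observed cluster
    ("C=0.1 (wide boundary)", 0.91, 0.89),  # small gap: more regularized
]
best = select(configs)
```

Plain validation AUC would pick the sharp-boundary config (0.90 > 0.89); SB instead favors the wider, more stable boundary (0.88 vs. 0.855 after the gap penalty), which is the behavior the abstract credits for better generalization to unseen positive subgroups.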