QVAC Blog

QVAC Genesis II: Expanding the Largest and Highest-Quality Multi-domain Educational Synthetic Dataset for LLM Pre-training

Building upon the success of Genesis I, we introduce QVAC Genesis II, a major expansion that adds new domains and a total of 148 billion tokens.

Introducing QVAC Genesis I: the Largest and Highest-Quality Multi-domain Educational Synthetic Dataset for Pre-training

There is a need for publicly available, large-scale synthetic datasets that are rigorously curated. Genesis I is our first effort in this direction.
