Name: Swallow 70B
Author: Tokyo Institute of Technology

A 70B-parameter, Japanese-enhanced Llama model from Tokyo Institute of Technology. Based on Llama 3.3 and continually pre-trained on roughly 200B Japanese tokens from Swallow Corpus v2, Japanese and English Wikipedia, and math and code content. Its vocabulary is expanded with Japanese characters and subwords, so Japanese text is encoded in fewer tokens, which speeds up inference. Evaluated on 10 Japanese benchmarks (including JCommonsenseQA, JEMHopQA, NIILC, and JSQuAD) and 10 English benchmarks (including OpenBookQA, TriviaQA, SQuAD 2.0, XWINO, and HellaSwag). Reported as the best-performing 70B-class model for Japanese as of Dec 2023. Llama 3.1 variants are also available.

Strengths

Caveats

Capabilities

Resources
