
[Language Model] KR-BERT

데이터 세상 (Data World), 2021. 4. 13. 11:23

KR-BERT

A Small-Scale Korean-Specific Language Model

GitHub

github.com/snunlp/KR-BERT

 

snunlp/KR-BERT: KoRean based BERT pre-trained models (KR-BERT) for Tensorflow and PyTorch.

 

| | Multilingual BERT (Google) | KorBERT (ETRI) | KoBERT (SKT) | KR-BERT character | KR-BERT sub-character |
|---|---|---|---|---|---|
| Vocab size | 119,547 | 30,797 | 8,002 | 16,424 | 12,367 |
| Parameter size | 167,356,416 | 109,973,391 | 92,186,880 | 99,265,066 | 96,145,233 |
| Data size | - (Wikipedia data for 104 languages) | 23GB, 4.7B morphemes | - (25M sentences, 233M words) | 2.47GB, 20M sentences, 233M words | 2.47GB, 20M sentences, 233M words |
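To make the link between vocab size and parameter size more concrete, here is a rough sketch that estimates how many parameters sit in the token embedding table of each model. It assumes a BERT-base hidden size of 768, which the table itself does not state, and the helper names (`token_embedding_params`, `embedding_share`, `vocab_dependent_params`) are made up for this illustration. It also assumes that each vocabulary entry costs one embedding row plus one output bias; that assumption is checked only against the two KR-BERT variants, since the other models may count parameters differently.

```python
# Assumption: all five models use the BERT-base hidden size (not stated in the table).
HIDDEN_SIZE = 768

# (vocab size, total parameter count), copied from the comparison table.
MODELS = {
    "Multilingual BERT (Google)": (119_547, 167_356_416),
    "KorBERT (ETRI)": (30_797, 109_973_391),
    "KoBERT (SKT)": (8_002, 92_186_880),
    "KR-BERT character": (16_424, 99_265_066),
    "KR-BERT sub-character": (12_367, 96_145_233),
}


def token_embedding_params(vocab_size, hidden_size=HIDDEN_SIZE):
    """Parameters in the token embedding table: one row of hidden_size per token."""
    return vocab_size * hidden_size


def embedding_share(vocab_size, total_params, hidden_size=HIDDEN_SIZE):
    """Fraction of the reported parameter count taken by the token embedding table."""
    return token_embedding_params(vocab_size, hidden_size) / total_params


def vocab_dependent_params(vocab_size, hidden_size=HIDDEN_SIZE):
    """Assumed per-token cost: one embedding row plus one output bias."""
    return vocab_size * (hidden_size + 1)


for name, (vocab, total) in MODELS.items():
    emb = token_embedding_params(vocab)
    share = embedding_share(vocab, total)
    print(f"{name}: {emb:,} embedding params ({share:.1%} of {total:,})")

# Compare the two KR-BERT variants: reported gap vs. gap explained by vocab size alone.
reported_gap = MODELS["KR-BERT character"][1] - MODELS["KR-BERT sub-character"][1]
estimated_gap = vocab_dependent_params(16_424) - vocab_dependent_params(12_367)
print(f"reported gap: {reported_gap:,}, estimated gap: {estimated_gap:,}")
```

Under these assumptions, the 3,119,833-parameter gap between the character and sub-character KR-BERT models is fully accounted for by their 4,057-token vocabulary difference, which suggests the rest of the network is the same size in both. The same reading explains why Multilingual BERT, with its 119,547-token vocabulary, is so much larger than the Korean-specific models.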

 
