BLOOMZ & mT0
https://huggingface.co/bigscience/mt0-xxl
Paper: https://arxiv.org/abs/2211.01786
A family of models capable of following human instructions in dozens of languages zero-shot.
BLOOM and mT5, pretrained multilingual language models, are fine-tuned on a crosslingual task mixture (xP3); the resulting models are capable of crosslingual generalization to unseen tasks and languages.
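A quick way to try this zero-shot instruction following is the standard transformers seq2seq API (a minimal sketch based on the model card's usage example; mt0-small is substituted for mt0-xxl so it runs on modest hardware):

```python
# pip install transformers sentencepiece
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"  # same usage applies to bigscience/mt0-xxl

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Zero-shot: no task-specific fine-tuning or in-context examples needed.
inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```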
Datasets
pretraining: mc4
https://huggingface.co/datasets/mc4
108 languages including Korean
finetuning: xP3
https://huggingface.co/datasets/bigscience/xP3
Crosslingual Public Pool of Prompts
A collection of prompts and datasets spanning 46 languages and 16 NLP tasks.
Used to train BLOOMZ and mT0, multilingual language models capable of following human instructions in dozens of languages zero-shot.
| Name | Explanation | Example models |
|---|---|---|
| xP3 | Mixture of 13 training tasks in 46 languages with English prompts | bloomz & mt0-xxl |
| xP3x | Mixture of 17 tasks in 277 languages (including Korean) with English prompts | WIP - Join us at Project Aya @C4AI to help! |
| xP3mt | Mixture of 13 training tasks in 46 languages with prompts in 20 languages (machine-translated from English) | bloomz-mt & mt0-xxl-mt |
| xP3all | xP3 + evaluation datasets adding an additional 3 tasks for a total of 16 tasks in 46 languages with English prompts | |
| xP3megds | Megatron-DeepSpeed processed version of xP3 | bloomz |
| P3 | Repreprocessed version of the English-only P3 with 8 training tasks | bloomz-p3 & mt0-xxl-p3 |

- xP3 does not include Korean (language code: ko, country code: kr); its 46 "languages" count includes 13 programming languages: C, C++, C#, Go, Java, JavaScript, Lua, PHP, Python, Ruby, Rust, Scala, TypeScript.
- xP3x does include Korean (code: kor_Hang): 4,642,468 kilobytes (0.68% of the data), 3,415,920 samples (0.64%).
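Both corpora can be inspected by streaming them from the Hub. Below is a minimal sketch: the mc4 config name "ko" follows the mC4 dataset card, while the xP3 data_files glob and the inputs/targets field names are assumptions based on the repository's per-language JSONL layout.

```python
from datasets import load_dataset

# Stream the Korean split of mC4 (pretraining corpus) without a full download.
mc4_ko = load_dataset("mc4", "ko", split="train", streaming=True)
print(next(iter(mc4_ko))["text"][:200])

# Stream English xP3 records (fine-tuning mixture); each record pairs a
# prompted input with its target completion.
xp3_en = load_dataset("bigscience/xP3", data_files="en/*.jsonl",
                      split="train", streaming=True)
example = next(iter(xp3_en))
print(example["inputs"])   # prompt text fed to the model (assumed field name)
print(example["targets"])  # expected completion (assumed field name)
```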
Architecture
Same as mT5-xxl (mt0-xxl is mT5-xxl fine-tuned on xP3, so the architecture is unchanged)
mT5 (Multilingual T5)
https://github.com/google-research/multilingual-t5
Languages (101)
Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu
- mT5-Small (300 million parameters): gs://t5-data/pretrained_models/mt5/small
- mT5-Base (580 million parameters): gs://t5-data/pretrained_models/mt5/base
- mT5-Large (1.2 billion parameters): gs://t5-data/pretrained_models/mt5/large
- mT5-XL (3.7 billion parameters): gs://t5-data/pretrained_models/mt5/xl
- mT5-XXL (13 billion parameters): gs://t5-data/pretrained_models/mt5/xxl
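The gs:// paths above are the original TensorFlow checkpoints; converted versions are also published on the Hugging Face Hub under google/mt5-*. Note that raw mT5 is pretrained only on a span-corruption objective, which is why it needs instruction fine-tuning (producing mT0) before it can follow prompts. A minimal sketch:

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Raw mT5 only knows the span-corruption objective: it fills <extra_id_*>
# sentinel tokens rather than following instructions.
inputs = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```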
BLOOMZ & mT0 Model Family
| Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Multitask finetuned on xP3. Recommended for prompting in English. ||||||||||||
| Finetuned Model | mt0-small | mt0-base | mt0-large | mt0-xl | mt0-xxl | bloomz-560m | bloomz-1b1 | bloomz-1b7 | bloomz-3b | bloomz-7b1 | bloomz |
| Multitask finetuned on xP3mt. Recommended for prompting in non-English. ||||||||||||
| Finetuned Model | | | | | mt0-xxl-mt | | | | | bloomz-7b1-mt | bloomz-mt |
| Multitask finetuned on P3. Released for research purposes only. Strictly inferior to above models. ||||||||||||
| Finetuned Model | | | | | mt0-xxl-p3 | | | | | bloomz-7b1-p3 | bloomz-p3 |
| Original pretrained checkpoints. Not recommended. ||||||||||||
| Pretrained Model | mt5-small | mt5-base | mt5-large | mt5-xl | mt5-xxl | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-3b | bloom-7b1 | bloom |
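Note that the two families load through different auto classes: BLOOMZ checkpoints are decoder-only causal LMs, while mT0 checkpoints are encoder-decoder models. A minimal sketch, with bloomz-560m standing in for the larger checkpoints:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"  # mt0-* checkpoints use AutoModelForSeq2SeqLM instead

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
# A causal LM echoes the prompt, so strip the prompt tokens before decoding.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```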
Limitations
Prompt engineering:
Performance may vary depending on the prompt.
For the BLOOMZ models, it is recommended to make it very clear where the input stops, so that the model does not try to continue it.
For example, the prompt "Translate to English: Je t'aime" without a period (.) at the end may result in the model trying to continue the French sentence.
Better prompts are, for example:
"Translate to English: Je t'aime.", "Translate to English: Je t'aime. Translation:", "What is "Je t'aime." in English?"
It is also recommended to provide the model with as much context as possible.
For example, if you want it to answer in Telugu, tell the model:
"Explain in a sentence in Telugu what is backpropagation in neural networks."