stands for “Generative Pre-trained Transformer Quantization”. It indicates that the model is intended to run on a GPU.
GGML is a tensor library for machine learning, written in C/C++, that lets you run LLMs on the CPU alone or on CPU + GPU. It also defines a binary format for distributing large language models (LLMs). GGML uses a technique called quantization, which lets large language models run on consumer hardware.
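GPT-Generated Model Language (a commonly cited expansion of GGML; the name is also attributed to the initials of its author, Georgi Gerganov, plus "ML"). It indicates that the model can run on the CPU. This format is now obsolete.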
GPT-Generated Unified Format (GGUF). This is the newest format for large models that can run on the CPU, and it replaces GGML.
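As a small illustration of what "binary format" means in practice, the sketch below checks whether a file looks like GGUF by reading its first 8 bytes. It assumes the header layout of a 4-byte magic `GGUF` followed by a little-endian 32-bit version number; the function name `read_gguf_version` is made up for this example, and big-endian files are not handled.

```python
import struct

GGUF_MAGIC = b"GGUF"  # assumed 4-byte magic at the start of a GGUF file


def read_gguf_version(path):
    """Return the format version if the file starts with the GGUF magic,
    otherwise None. Assumes a little-endian uint32 version field."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A file in the older GGML format would fail the magic check and return None, which is one quick way to tell the two apart before loading a model.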
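GGML supports a number of different quantization strategies (e.g. 4-bit, 5-bit, and 8-bit quantization), each of which offers a different trade-off between efficiency (model size and inference speed) and output quality: fewer bits give a smaller, faster model but lose more precision.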
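To make the bit-width trade-off concrete, here is a minimal sketch of symmetric round-to-nearest quantization applied to one small block of weights. It is not GGML's actual block layout or algorithm; the function names and the sample weights are invented for illustration. It only shows that fewer bits mean a coarser grid and a larger rounding error.

```python
def quantize_block(values, bits):
    """Quantize a block of floats to signed integers of the given bit width.

    Returns (scale, codes). Illustrative only, not GGML's real scheme.
    """
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 15 for 5-bit, 127 for 8-bit
    amax = max(abs(v) for v in values)    # largest magnitude in the block
    scale = amax / qmax if amax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Map integer codes back to approximate float values."""
    return [scale * c for c in codes]


def max_abs_error(values, bits):
    """Largest absolute difference between original and restored values."""
    scale, codes = quantize_block(values, bits)
    restored = dequantize_block(scale, codes)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical sample weights, chosen only for the demonstration.
weights = [0.12, -0.98, 0.45, 0.03, -0.27, 0.80, -0.51, 0.66]
for bits in (4, 5, 8):
    print(bits, "bits: max abs error =", max_abs_error(weights, bits))
```

Each code needs only `bits` bits of storage plus one shared scale per block, which is where the memory savings come from.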
Retrieval-augmented generation (RAG)
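Retrieval-augmented generation is a form of generative question answering: relevant documents are retrieved first and passed to the model as context for generating the answer.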
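The retrieve-then-generate flow can be sketched roughly as follows. This is a toy version under stated assumptions: retrieval is plain word overlap rather than a real embedding search, the documents and function names are invented for the example, and the final call to a language model is left out, so the code only builds the prompt that would be sent to it.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, k=2):
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Combine the retrieved context and the question into one prompt."""
    context = "\n".join("- " + d for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n"
        "Context:\n" + context + "\n"
        "Question: " + question
    )


# Hypothetical mini knowledge base for the demonstration.
docs = [
    "GGUF replaces the older GGML file format.",
    "GPTQ is a quantization method aimed at GPU inference.",
    "Quantization stores weights with fewer bits.",
]
print(build_prompt("Which format replaces GGML?", docs, k=1))
```

In a real system the overlap score would typically be replaced by a vector similarity search, but the shape of the pipeline (retrieve, assemble context, generate) stays the same.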
pretrained models and fine-tuned models