Apple Intelligence Foundation Language Models
Today's paper presents the foundation language models that power Apple Intelligence features across iOS, iPadOS, and macOS. It describes two models, a roughly 3 billion parameter on-device model and a larger server-based model, both designed to perform a wide range of tasks efficiently, accurately, and responsibly.
Method Overview
The two models are AFM-on-device (~3 billion parameters) and the larger AFM-server. Both are dense decoder-only Transformer models with several architectural optimizations for efficiency.
Pre-training proceeds in three stages:
Core pre-training on a diverse, high-quality dataset of 6.3 trillion tokens. This includes web pages, licensed datasets, code, math content, and public datasets - all carefully filtered and processed.
Continued pre-training for 1 trillion tokens at longer sequence lengths, with an adjusted data mixture.
Context lengthening for 100 billion tokens at 32k sequence length.
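To make the relative size of the three stages concrete, here is a minimal sketch that tallies the token budgets listed above. The `Stage` class and its field names are hypothetical, the figures are taken from this summary at face value, and reading "32k" as 32,768 tokens is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One pre-training stage (hypothetical structure, for illustration only)."""
    name: str
    tokens: int             # training tokens consumed in this stage
    seq_len: Optional[int]  # None where this summary gives no exact figure


STAGES = [
    Stage("core", 6_300_000_000_000, None),
    Stage("continued", 1_000_000_000_000, None),
    # Assumption: "32k" is read as 32 * 1024 tokens.
    Stage("context_lengthening", 100_000_000_000, 32 * 1024),
]


def total_tokens(stages):
    """Sum the token budgets across all stages."""
    return sum(s.tokens for s in stages)


def token_share(stages):
    """Fraction of the total token budget spent in each stage."""
    total = total_tokens(stages)
    return {s.name: s.tokens / total for s in stages}


if __name__ == "__main__":
    print(f"total: {total_tokens(STAGES) / 1e12:.1f}T tokens")
    for name, frac in token_share(STAGES).items():
        print(f"{name}: {frac:.1%}")
```

Under these figures, core pre-training accounts for the large majority of the tokens, and context lengthening is a comparatively short final stage.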
For post-training, the authors use supervised fine-tuning and reinforcement learning from human feedback (RLHF) to align the models with Apple's values and improve capabilities. This includes two new techniques: the iterative teaching committee (iTeC) and mirror descent with leave-one-out estimation (MDLOO), an RLHF algorithm that uses a leave-one-out estimate of each response's advantage.
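The leave-one-out idea can be illustrated in a few lines. The sketch below covers only the general advantage estimate: for several responses sampled from the same prompt, each response's baseline is the mean reward of the other responses. It does not implement the mirror descent policy update and is not the authors' code; the function name and the reward values are made up for illustration.

```python
def leave_one_out_advantages(rewards):
    """Return one advantage per sampled response.

    The advantage of response i is its reward minus the mean reward of
    the other responses sampled for the same prompt.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("need at least two responses per prompt")
    total = sum(rewards)
    # (total - r) / (k - 1) is the mean reward over all responses except this one.
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    # Hypothetical reward-model scores for four responses to one prompt.
    print(leave_one_out_advantages([1.0, 2.0, 3.0, 6.0]))
```

Because each baseline excludes the response being scored, the baseline does not depend on that response's own reward, and the advantages for one prompt always sum to zero.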
The models are then specialized for specific tasks using LoRA adapters that can be dynamically loaded. Extensive optimizations like quantization and pruning are applied to make the models efficient for on-device use.
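As a rough illustration of how dynamically loaded LoRA adapters work in general, the sketch below keeps one frozen base weight matrix and adds a low-rank update for whichever adapter is active. The class, method names, and task name are hypothetical, and plain Python lists stand in for tensors. It shows the standard LoRA formulation (output = W x + (alpha / rank) * B A x), not Apple's implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class AdaptedLinear:
    """A frozen linear layer with swappable low-rank adapters (illustrative)."""

    def __init__(self, weight):
        self.weight = weight  # frozen base weight, shape (out, in)
        self.adapters = {}    # task name -> (A, B, scale)
        self.active = None    # name of the adapter in use, or None

    def register(self, name, a, b, alpha):
        """Store an adapter: A has shape (rank, in), B has shape (out, rank)."""
        rank = len(a)
        self.adapters[name] = (a, b, alpha / rank)

    def activate(self, name):
        """Switch adapters without touching the base weight; None disables."""
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.active is None:
            return y
        a, b, scale = self.adapters[self.active]
        delta = matvec(b, matvec(a, x))  # low-rank update B (A x)
        return [yi + scale * di for yi, di in zip(y, delta)]


if __name__ == "__main__":
    layer = AdaptedLinear([[1.0, 0.0], [0.0, 1.0]])
    layer.register("summarize", a=[[1.0, 1.0]], b=[[1.0], [0.0]], alpha=2.0)
    layer.activate("summarize")
    print(layer.forward([1.0, 2.0]))
    layer.activate(None)
    print(layer.forward([1.0, 2.0]))
```

Because the base weight is never modified, switching tasks only means selecting a different small pair of matrices. This is what makes swapping adapters at runtime cheap compared with loading a separate full model per task.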
Results
The paper reports strong performance for both models across a range of benchmarks:
AFM-on-device outperforms larger open-source models like Mistral-7B on instruction following and math tasks.
AFM-server achieves competitive performance against GPT-3.5 on general capabilities.
Both models show superior performance on safety evaluations compared to open-source and commercial models.
Conclusion
The paper presents the foundation language models that power Apple Intelligence features across Apple devices. The models are designed to be fast, efficient, and highly capable. For more information, please consult the full paper.
Congrats to the authors for their work!
Apple. "Apple Intelligence Foundation Language Models." arXiv, 29 Jul. 2024, arxiv.org/abs/2407.21075.







