Date: 2023-08-08

Artificial intelligence (AI) is experiencing a surge in popularity, with AI algorithms being used in a wide range of products and services. That popularity has brought increased scrutiny of the computational and environmental costs of AI, particularly in deep learning.

The costs of deep learning are influenced by several factors, including the size and structure of the models, the processors used, and the numerical representation of the data. Over the years, state-of-the-art models have been growing in size, which has led to increasing compute requirements. While processor compute power has also increased, it has not kept up with the growing costs of AI models.

To address this issue, researchers have been exploring numerical representation as a way to reduce the cost of AI. One widely used method is quantization, which reduces the number of bits used to represent a network's weights and, often, its activations. This not only improves computational efficiency but also reduces power consumption.
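As a rough illustration of the idea, the sketch below quantizes a handful of floating-point weights to 8-bit integers with a single shared scale factor, then maps them back. The function names, the symmetric per-tensor scheme, and the sample weights are assumptions chosen for brevity; real frameworks offer many variants.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]
    using one scale factor shared by the whole list (hypothetical helper)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float values."""
    return [v * scale for v in q]


# Made-up example weights, not taken from any real model.
weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

for w, qi, r in zip(weights, q, restored):
    print(f"original={w:+.4f}  int8={qi:+4d}  restored={r:+.4f}")
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most half the scale factor per value.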

AI models are typically trained using the 32-bit floating-point (FP32) data type. However, not all 32 bits are always necessary to maintain accuracy. Models have been trained successfully with 16-bit floating point (FP16), and even with lower-precision data types such as 8-bit floating point (FP8) and 8-bit integer (INT8). The goal is to find the smallest number of bits that still maintains accuracy.
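To get a feel for what is lost when bits are dropped, the following sketch rounds a value to FP32 and to FP16 using Python's standard `struct` module, which supports the IEEE 754 half-precision format through the `e` format character. The helper names are illustrative only.

```python
import struct


def to_fp32(x):
    """Round a Python float (64-bit) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x):
    """Round a Python float (64-bit) to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


value = 0.1
err32 = abs(to_fp32(value) - value)
err16 = abs(to_fp16(value) - value)

print(f"FP32 stores {to_fp32(value)!r}, error {err32:.3e}")
print(f"FP16 stores {to_fp16(value)!r}, error {err16:.3e}")
```

The FP16 error is several orders of magnitude larger than the FP32 error, yet for many models an error of this size has little effect on accuracy, which is what makes lower precision attractive.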

There are two primary approaches to quantizing a neural network: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). PTQ converts the model’s weights and activations to lower-precision formats after training, while QAT incorporates quantization during training itself.
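The structural difference between the two approaches can be sketched with a toy one-weight model. In the PTQ path the weight is trained in full precision and rounded afterwards. In the QAT path the forward pass already sees the rounded weight, and the gradient is applied to the full-precision copy (a common trick often called the straight-through estimator). Everything here, including the function names, the data, and the deliberately coarse scale of 0.1, is an assumption for illustration, and the toy says nothing about which approach is more accurate in practice.

```python
def fake_quant(w, scale, qmax=127):
    """Quantize then dequantize: the result is a float on the integer grid."""
    q = max(-qmax, min(qmax, round(w / scale)))
    return q * scale


def train(xs, ys, steps=200, lr=0.05, scale=None):
    """Fit y = w * x by gradient descent on mean squared error.

    If `scale` is given, the forward pass uses the fake-quantized weight
    (QAT-style) and the gradient updates the full-precision weight.
    """
    w = 0.0
    for _ in range(steps):
        w_fwd = fake_quant(w, scale) if scale is not None else w
        grad = sum(2 * (w_fwd * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


# Made-up data generated from y = 0.73 * x.
xs = [1.0, 2.0, 3.0]
ys = [0.73 * x for x in xs]
scale = 0.1  # coarse grid so the rounding is easy to see

# PTQ: train in full precision, then quantize once at the end.
w_ptq = fake_quant(train(xs, ys), scale)

# QAT: quantization is part of every training step.
w_qat = fake_quant(train(xs, ys, scale=scale), scale)

print(f"PTQ weight: {w_ptq:.2f}")
print(f"QAT weight: {w_qat:.2f}")
```

In both cases the deployed weight lies on the quantized grid; the difference is whether the training loop was exposed to the rounding error while the weight was still being learned.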

Currently, there is a debate in the AI industry regarding the preferred quantized data types. INT8 and FP8 are two popular choices, with hardware vendors taking different sides. However, the best data type depends on the specific AI processors and model architectures.

Floating-point and integer data types differ in how they represent and store numerical values. Floating-point types approximate real numbers using a sign, an exponent, and a mantissa, while integer types represent whole numbers with no fractional part. At the same bit width, floating-point numbers cover a wider dynamic range, with finer spacing near zero and coarser spacing at large magnitudes, while integers cover a smaller range with uniform spacing between values.
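The contrast between the two kinds of types can be made concrete by comparing the gap between neighbouring representable values. The sketch below uses FP16 (5 exponent bits, 10 mantissa bits) and INT8 as examples; the helper `fp16_spacing` is a hypothetical name and only handles values in FP16's normal range.

```python
import math

INT8_MIN, INT8_MAX = -128, 127   # signed 8-bit integer range
INT8_SPACING = 1                 # integers are evenly spaced

FP16_MAX = 65504.0               # largest finite FP16 value
FP16_MIN_NORMAL = 2.0 ** -14     # smallest positive normal FP16 value


def fp16_spacing(x):
    """Gap between adjacent FP16 values near x (normal range only)."""
    _, exp = math.frexp(x)       # x = m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp - 1 - 10) # 10 explicit mantissa bits


print(f"INT8 range [{INT8_MIN}, {INT8_MAX}], spacing {INT8_SPACING} everywhere")
print(f"FP16 range up to {FP16_MAX}, smallest normal {FP16_MIN_NORMAL:.3e}")
for x in (0.001, 1.0, 1024.0):
    print(f"FP16 spacing near {x}: {fp16_spacing(x):.3e}")
```

The integer type offers the same resolution across its whole range, whereas the floating-point type trades resolution at large magnitudes for the ability to represent very small and very large values.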

In deep learning, the numerical representation requirements vary between the training and inference phases. During training, higher dynamic range is needed to accurately propagate gradients and ensure the convergence of the learning process. Therefore, floating-point representations like FP32, FP16, or even FP8 are commonly used.
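A small experiment shows why dynamic range matters for gradients. The sketch below takes a tiny gradient value and rounds it to FP32 and to FP16 with Python's standard `struct` module. The value `1e-8` is an arbitrary example, not a measurement from a real training run.

```python
import struct


def round_to(fmt, x):
    """Round x to the format given by a struct code ('<f' = FP32, '<e' = FP16)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


tiny_gradient = 1e-8  # arbitrary small value for illustration

as_fp32 = round_to("<f", tiny_gradient)
as_fp16 = round_to("<e", tiny_gradient)

print(f"FP32 keeps the gradient: {as_fp32:.3e}")
print(f"FP16 stores it as:       {as_fp16:.3e}")
```

The value is below FP16's smallest representable positive number, so it is flushed to zero, and a weight that only ever receives gradients of this size would stop learning. Formats with a wider exponent range avoid this, which is one reason training places higher demands on the number format.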

Overall, the trend in AI is towards lower precision computing, which allows for faster calculations and reduces costs. However, the choice of data type depends on the specific requirements of the AI model and the hardware being used.