Generative AI: LLMs: LoRA fine tuning 1.4

Posted on July 19, 2023 by Aritra Sen

In the last post we discussed two feature-based approaches to fine-tuning. These options may not always be efficient in terms of computational complexity or time. Full fine-tuning of an LLM needs to stitch together the steps mentioned below (a minimal PyTorch sketch follows the list):

  1. Load dataset in memory
  2. Load pretrained model in the memory
  3. Forward pass through the network
  4. Loss calculation and gradient calculations
  5. Optimize the weights
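
To make these steps concrete, here is a minimal sketch of what a full fine-tuning loop looks like in PyTorch with Hugging Face transformers. The checkpoint name and train_dataset are placeholders for illustration, not part of the original post:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load the (already tokenized) dataset into memory -- train_dataset is a placeholder
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)

# 2. Load the pretrained model into memory ("gpt2" is just a small placeholder checkpoint)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # every weight is trainable

for batch in train_loader:
    # 3. Forward pass through the network
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["input_ids"])
    # 4. Loss calculation and gradient calculation
    loss = outputs.loss
    loss.backward()
    # 5. Optimize (update) the weights
    optimizer.step()
    optimizer.zero_grad()
```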

Combining all these steps can produce a lot of challenges in terms of –

  1. Memory requirements
  2. Computational requirements (e.g., more GPUs)
  3. Training time and associated cost
  4. Inference time

In a paper published by Microsoft, it has been shown that there exists an approach called Parameter-Efficient Fine-Tuning which can help tackle the above-mentioned problems. In this paper a technique called LoRA (Low-Rank Adaptation) was introduced. In principle, the idea revolves around matrix decomposition into lower ranks. Full fine-tuning of an LLM mainly goes through two separate steps to generate the embeddings – 1. Forward pass through the network 2. Weight updates – and in the end we get the final embeddings as shown below –

Full finetuning of LLMs (Image: Author)

In the case of LoRA it has been shown that the pretrained model has a low intrinsic dimension; in other words, there exists a low-dimensional reparameterization that is as effective as doing the full-parameter fine-tuning. The pretrained weights can be decomposed into low-rank matrices (the rank of a matrix is the number of linearly independent rows or columns) as shown below –

Low Rank representation (Image: Author)

For example, imagine that W is the pretrained weight matrix with dimensions 512 x 64. If we want to fully fine-tune these weights, the total number of parameters would be 512 x 64 = 32,768, which is a lot of parameters to train. However, if we use two low-rank matrices of rank 4, these two matrices A and B can be represented (the low-dimensional reparameterization) as follows –

A: 4 x 64 and B: 512 x 4.

So the total number of parameters would be (4 x 64 + 512 x 4) = 2,304, which is a lot less compared to the roughly 32k parameters. During training we keep the pre-trained model parameters frozen and only train these two low-rank matrices. During inference we multiply these two matrices and add the result back to the pre-trained model weights as shown below –
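
A quick sanity check of this arithmetic in PyTorch (the shapes follow the example above; initialising B to zeros and A randomly matches the LoRA paper):

```python
import torch

W = torch.randn(512, 64)   # frozen pretrained weight: 512 x 64 = 32768 parameters
B = torch.zeros(512, 4)    # trainable, initialised to zeros
A = torch.randn(4, 64)     # trainable, random (Gaussian) initialisation

print(W.numel())              # 32768 -> parameters touched by full fine-tuning
print(B.numel() + A.numel())  # 2048 + 256 = 2304 -> parameters trained by LoRA

# At inference time the low-rank product is added back to the frozen weight,
# so the merged weight has the same shape and the same inference cost as before.
W_merged = W + B @ A          # (512 x 4) @ (4 x 64) -> 512 x 64
```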

LoRA (Image: Huggingface)

We can also train these low-rank matrices for specific tasks, and during inference we can add the task-specific LoRA weights back to the pretrained weights as shown below –
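
A small illustrative sketch of that idea: the frozen weight W is shared, and switching tasks only means adding a different (B, A) pair back in (the task names here are hypothetical):

```python
import torch

W = torch.randn(512, 64)   # frozen pretrained weight, shared across every task

# One small (B, A) pair per downstream task, shapes as in the example above.
adapters = {
    "task_a": (torch.zeros(512, 4), torch.randn(4, 64)),
    "task_b": (torch.zeros(512, 4), torch.randn(4, 64)),
}

def merged_weight(task: str) -> torch.Tensor:
    B, A = adapters[task]
    return W + B @ A       # add the task-specific LoRA update back at inference time

W_task_a = merged_weight("task_a")   # still 512 x 64, no extra inference latency
```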

Task Specific LoRA (Image: Author)

In the previously mentioned paper it has been shown that model performance similar to full fine-tuning can be achieved with LoRA, as shown below –

LoRA performance comparison (Image: LoRA Paper)

In the LoRA paper the low-rank adaptation was applied to different attention weight matrices such as Q and V. The study in the paper was limited to adapting only the attention weights for downstream tasks while freezing the MLP modules (so they are not trained on downstream tasks), both for simplicity and parameter efficiency. Surprisingly, it has been observed that very good performance can be achieved with a rank as low as r = 1 (r is a hyperparameter to tune).
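
To see how this looks mechanically, below is a minimal, hand-rolled sketch of a LoRA-wrapped linear layer applied to the Q and V projections only. The layer names and sizes are illustrative assumptions; libraries such as Hugging Face peft provide a production-ready version of this, which we will use in the next post:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update: W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)   # random init
        self.B = nn.Parameter(torch.zeros(base.out_features, r))         # zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Adapt only the attention Q and V projections; K and the MLP modules stay frozen.
q_proj = LoRALinear(nn.Linear(64, 64), r=1)   # even r = 1 can work well per the paper
v_proj = LoRALinear(nn.Linear(64, 64), r=1)
```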

How to choose the rank? (Image: LoRA paper)

In the next blog post we will implement LoRA in code. Do like and share the post in case you find it useful. Thanks for reading.

Category: Aritra Sen, Machine Learning, Python

