
1.1 – Fine Tune a Transformer Model (1/2)

Posted on November 25, 2021 by Aritra Sen

In the last post, we talked about the Transformer pipeline and the inner workings of the all-important tokenizer module, and at the end we made predictions using existing pre-trained models.

During fine-tuning, we can adjust the weights of the model in the following two ways:

  1. Update the weights of the pre-trained BERT model along with the classification layer.
  2. Update only the weights of the classification layer and not the pre-trained BERT model. This amounts to using the pre-trained BERT model as a feature extractor (a rough sketch of this option is shown below).
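A rough sketch of what the second option looks like in code is given below. This is only an illustration (it is not used in this post), assuming the same DistilBERT classification model that we load later:

# Minimal sketch of approach 2 (not used in this post): freeze the pre-trained
# DistilBERT encoder so that only the classification head gets trained.
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

# Freeze every parameter of the DistilBERT encoder.
for param in model.distilbert.parameters():
    param.requires_grad = False

# Only the classification layers (pre_classifier and classifier) remain trainable.
print([name for name, p in model.named_parameters() if p.requires_grad])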

In this tutorial we will follow the first approach, where we update the weights of the pre-trained BERT model along with the classification layer. The dataset we will use is the Sentiment140 tweet sentiment dataset from Kaggle. Dataset details are given below:

It contains 1,600,000 tweets extracted using the Twitter API. The tweets have been annotated (0 = negative, 4 = positive) and can be used to detect sentiment.

It contains the following 6 fields:

  1. target: the polarity of the tweet (0 = negative, 4 = positive)
  2. ids: The id of the tweet ( 2087)
  3. date: the date of the tweet (Sat May 16 23:58:44 UTC 2009)
  4. flag: The query (lyx). If there is no query, then this value is NO_QUERY.
  5. user: the user that tweeted (robotickilldozr)
  6. text: the text of the tweet (Lyx is cool)

Next we will start writing the code.


# Importing the required Library
import transformers
import torch
import numpy as np
from torch.nn import functional as F
import pandas as pd
import tqdm
# Reading the dataset with no column titles and with Latin-1 encoding
df_raw = pd.read_csv('../input/sentiment140/training.1600000.processed.noemoticon.csv', encoding = "ISO-8859-1", header=None)

# As the data has no column titles, we will add our own (matching the fields described above)
df_raw.columns = ["label", "ids", "date", "flag", "user", "text"]

# Show the first 5 rows of the dataframe.
# You can specify the number of rows to be shown as follows: df_raw.head(10)
df_raw.head()
# Checking the label column distribution; label '4' denotes positive sentiment and '0' denotes negative sentiment
df_raw['label'].value_counts()
0    800000
4    800000
Name: label, dtype: int64
# Keeping only the text and the label, as we won't need any of the other columns.
# .copy() avoids pandas' SettingWithCopyWarning when we modify the label column below.
df = df_raw[['label', 'text']].copy()

# Mapping label 4 (positive) to class 1; label 0 (negative) stays as class 0.
label_dict = {4: 1, 0: 0}
df['label'] = df['label'].map(label_dict)

df.head()
# Train/validation split
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(df['text'].values, df['label'].values, test_size=.2)

# Importing the pre-trained tokenizer
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

Next we will create the train/validation input ids and attention masks and convert them to torch tensors. If you are wondering why we loop through the data and tokenize each text individually, the reason is that tokenizing everything in one go may cause an out-of-memory error (an alternative that tokenizes in chunks is sketched after the two loops below).

# Creating train input ids and attention masks
train_input_ids = []
train_attention_mask = []
for text in tqdm.tqdm(train_texts):
    encoding = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=64,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt')
    train_input_ids.append(encoding['input_ids'])
    train_attention_mask.append(encoding['attention_mask'])
train_input_ids = torch.cat(train_input_ids,dim=0)
train_attention_mask = torch.cat(train_attention_mask,dim=0)
100%|██████████| 1280000/1280000 [04:32<00:00, 4693.16it/s]
val_input_ids = []
val_attention_mask = []
for text in tqdm.tqdm(val_texts):
    encoding = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=64,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt')
    val_input_ids.append(encoding['input_ids'])
    val_attention_mask.append(encoding['attention_mask'])
val_input_ids = torch.cat(val_input_ids,dim=0)
val_attention_mask = torch.cat(val_attention_mask,dim=0)
100%|██████████| 320000/320000 [01:04<00:00, 4981.57it/s]
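
As an aside, a middle ground between tokenizing one tweet at a time and tokenizing everything at once is to tokenize in moderately sized chunks. Below is a minimal sketch for the training texts, where the chunk size of 10,000 is an arbitrary choice you would tune to your memory budget:

# Alternative sketch: tokenize in chunks rather than one tweet at a time.
# chunk_size is an arbitrary value; pick it based on available memory.
chunk_size = 10000
ids_chunks, mask_chunks = [], []
for start in range(0, len(train_texts), chunk_size):
    chunk = list(train_texts[start:start + chunk_size])
    enc = tokenizer(chunk,
                    add_special_tokens=True,
                    max_length=64,
                    padding='max_length',
                    truncation=True,
                    return_tensors='pt')
    ids_chunks.append(enc['input_ids'])
    mask_chunks.append(enc['attention_mask'])
train_input_ids = torch.cat(ids_chunks, dim=0)
train_attention_mask = torch.cat(mask_chunks, dim=0)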

Next we will create train and validation torch TensorDatasets by combining the input ids, attention masks and labels. Keep in mind that the labels have to be of dtype torch.long, which the transformers library expects. After that we will create torch DataLoaders for batch processing.

train_dataset = torch.utils.data.TensorDataset(train_input_ids,
                                               train_attention_mask,
                                               torch.tensor(train_labels,dtype=torch.long))
val_dataset = torch.utils.data.TensorDataset(val_input_ids,
                                             val_attention_mask,
                                             torch.tensor(val_labels,dtype=torch.long))


train_loader = torch.utils.data.DataLoader(train_dataset,shuffle=True,batch_size=32)
val_loader = torch.utils.data.DataLoader(val_dataset,shuffle=False,batch_size=32)

Next we will import the pre-trained DistilBERT model for sequence classification and the AdamW optimizer from the transformers library.

from transformers import DistilBertForSequenceClassification , AdamW

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
model.to(device)

model.train()

optimizer = AdamW(model.parameters(), lr=5e-5)
# Function to print the model performance
from sklearn.metrics import f1_score, accuracy_score
def calculate_model_performance(labels, prediction):
    print('F1 Score:', f1_score(labels, prediction))
    print('Accuracy :', accuracy_score(labels, prediction))

Next comes the training loop, which we run for a single epoch:

batch_labels = []
batch_prediction = []
for batch in tqdm.tqdm(train_loader):
    optimizer.zero_grad()
    input_ids = batch[0].to(device)
    attention_mask = batch[1].to(device)
    labels = batch[2].to(device)
    outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
    loss = outputs[0]
    preds = torch.argmax(outputs['logits'],dim=1)
    loss.backward()
    optimizer.step()
    batch_labels.extend(labels.cpu().numpy())
    batch_prediction.extend(preds.cpu().numpy())
100%|██████████| 40000/40000 [1:09:36<00:00,  9.58it/s]
# Check the model performance on the training dataset
calculate_model_performance(batch_labels, batch_prediction)
F1 Score: 0.8456842439708853
Accuracy : 0.84594453125
# Validation loop performance

batch_labels = []
batch_prediction = []
model.eval()
for batch in tqdm.tqdm(val_loader):
    input_ids = batch[0].to(device)
    attention_mask = batch[1].to(device)
    labels = batch[2].to(device)
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
    preds = torch.argmax(outputs['logits'],dim=1)
    batch_labels.extend(labels.cpu().numpy())
    batch_prediction.extend(preds.cpu().numpy())
100%|██████████| 10000/10000 [05:13<00:00, 31.90it/s]
calculate_model_performance(batch_labels, batch_prediction)
F1 Score: 0.8497619467466055
Accuracy : 0.856225
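
With the fine-tuned model in hand, scoring a new piece of text works the same way as in the validation loop. Below is a minimal sketch, where the sample tweet is made up for illustration:

# Quick inference sketch on a single, made-up tweet using the fine-tuned model.
model.eval()
sample_text = "just finished the tutorial and it worked perfectly, great day!"
encoding = tokenizer(sample_text, max_length=64, padding='max_length',
                     truncation=True, return_tensors='pt').to(device)
with torch.no_grad():
    logits = model(**encoding).logits
prediction = torch.argmax(logits, dim=1).item()
print('positive' if prediction == 1 else 'negative')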

In the next post we will talk about the transformers library's built-in process to fine-tune a model, and about how to update only the weights of the classification layer and not the pre-trained BERT model, which amounts to using the pre-trained BERT model as a feature extractor. Note also that running the training loop for more than one epoch can improve the performance on the validation set; a rough sketch of how to extend the loop is given below.
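
A rough sketch of a multi-epoch run, reusing the objects defined above (the value of num_epochs is an arbitrary choice):

# Rough sketch: repeat the training loop for several epochs and validate after each one.
num_epochs = 3  # arbitrary choice
for epoch in range(num_epochs):
    model.train()
    for batch in tqdm.tqdm(train_loader):
        optimizer.zero_grad()
        input_ids, attention_mask, labels = [t.to(device) for t in batch]
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        outputs.loss.backward()
        optimizer.step()

    # Evaluate on the validation set after every epoch.
    model.eval()
    epoch_labels, epoch_preds = [], []
    with torch.no_grad():
        for batch in val_loader:
            input_ids, attention_mask, labels = [t.to(device) for t in batch]
            logits = model(input_ids, attention_mask=attention_mask).logits
            epoch_preds.extend(torch.argmax(logits, dim=1).cpu().numpy())
            epoch_labels.extend(labels.cpu().numpy())
    print('Epoch', epoch + 1)
    calculate_model_performance(epoch_labels, epoch_preds)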

Thanks for reading and please comment if you have any questions.
