
NLP: Toxic comment classification with TorchText & Pytorch Lightning

Posted on January 8, 2023 (updated July 11, 2023) by Aritra Sen

In this blog post we will build a multilabel (6 target classes) NLP classifier for toxic comments, based on a past Kaggle competition. Here is the competition overview from the Kaggle page:

Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments.

The Conversation AI team, a research initiative founded by Jigsaw and Google (both a part of Alphabet), is working on tools to help improve online conversation. One area of focus is the study of negative online behaviors, like toxic comments (i.e. comments that are rude, disrespectful or otherwise likely to make someone leave a discussion). So far they’ve built a range of publicly available models served through the Perspective API, including toxicity. But the current models still make errors, and they don’t allow users to select which types of toxicity they’re interested in finding (e.g. some platforms may be fine with profanity, but not with other types of toxic content).

In this competition, you’re challenged to build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate better than Perspective’s current models. You’ll be using a dataset of comments from Wikipedia’s talk page edits. Improvements to the current model will hopefully help online discussion become more productive and respectful.

Disclaimer: the dataset for this competition contains text that may be considered profane, vulgar, or offensive.

Toxic Comment Classification Challenge | Kaggle

In the code walkthrough below, we combine an embedding layer, an LSTM layer, and a fully connected layer into a neural network, and train it with PyTorch Lightning, which makes the training steps much easier and cleaner. We also use the TorchText module, which ships with PyTorch and makes vocabulary creation easier. Please follow the comments provided in the notebook itself to understand more about the code.
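The full notebook is not reproduced in this post, but the sketch below shows the general shape of such a setup: a TorchText vocabulary built from the training comments, and a LightningModule wiring the embedding, LSTM, and fully connected layers together, with BCEWithLogitsLoss for the 6 independent labels. The name train_texts, the class name ToxicClassifier, and the hyperparameter values are illustrative assumptions, not the exact choices from the notebook.

import torch
import torch.nn as nn
import pytorch_lightning as pl
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

# Build a vocabulary from the training comments with TorchText.
# `train_texts` (an iterable of raw comment strings) is assumed here.
tokenizer = get_tokenizer("basic_english")

def yield_tokens(texts):
    for text in texts:
        yield tokenizer(text)

vocab = build_vocab_from_iterator(
    yield_tokens(train_texts),
    specials=["<unk>", "<pad>"],
    max_tokens=20000,
)
vocab.set_default_index(vocab["<unk>"])  # map out-of-vocabulary tokens to <unk>

class ToxicClassifier(pl.LightningModule):
    """Embedding -> LSTM -> fully connected head for the 6 toxicity labels."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128,
                 num_labels=6, pad_idx=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_labels)
        # Multilabel targets: an independent sigmoid per class, not a softmax.
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)   # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])             # logits: (batch, num_labels)

    def training_step(self, batch, batch_idx):
        token_ids, labels = batch
        loss = self.loss_fn(self(token_ids), labels)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)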

Please note that this model architecture is not the optimal one; it is for tutorial purposes only. Experiment with different architectures, more epochs, and different batch sizes to get better results.
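As a rough usage sketch (reusing the assumed names from the model above, plus a hypothetical train_data holding (comment, label-vector) pairs), training comes down to a DataLoader with a padding collate function and a Lightning Trainer; batch_size and max_epochs are exactly the knobs worth experimenting with:

import torch
import pytorch_lightning as pl
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

PAD_IDX = vocab["<pad>"]

def collate_batch(batch):
    # `batch` is a list of (comment_text, 6-dim label vector) pairs.
    token_ids = [torch.tensor(vocab(tokenizer(text)), dtype=torch.long)
                 for text, _ in batch]
    labels = torch.tensor([label for _, label in batch], dtype=torch.float)
    # Pad every comment in the batch to the length of the longest one.
    return pad_sequence(token_ids, batch_first=True, padding_value=PAD_IDX), labels

train_loader = DataLoader(train_data, batch_size=64, shuffle=True,
                          collate_fn=collate_batch)

model = ToxicClassifier(vocab_size=len(vocab), pad_idx=PAD_IDX)
trainer = pl.Trainer(max_epochs=5)
trainer.fit(model, train_loader)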

Do like, share and comment if you have any questions.

Category: Machine Learning, Python
