Get Labels

Split the data into training and test sets
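
As a minimal sketch of this step, the split could be done with scikit-learn's train_test_split. The toy corpus, the REAL/FAKE label values, the 25% test size, and the random seed below are all assumptions made for illustration; the text does not specify them.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus standing in for the real documents and their labels.
texts = [
    "senate approves annual budget after long debate",
    "miracle pill cures every disease overnight",
    "city council opens public library downtown",
    "aliens secretly control the stock market",
    "central bank holds interest rates steady",
    "celebrity clone spotted at secret moon base",
    "university publishes climate research findings",
    "drinking bleach makes you immortal says insider",
]
labels = ["REAL", "FAKE", "REAL", "FAKE", "REAL", "FAKE", "REAL", "FAKE"]

# Hold out 25% of the documents for testing (assumed ratio).
# stratify=labels keeps the class balance the same in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=7, stratify=labels
)

print(len(x_train), "training documents,", len(x_test), "test documents")
```

Fixing random_state makes the split reproducible, which helps when comparing models later.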

Use Random Search Cross Validation to develop and tune the model
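
One way this could look is sketched below with scikit-learn's RandomizedSearchCV. The text does not name the classifier or the search space, so the LogisticRegression model, the candidate values, the number of iterations, the number of folds, and the toy data are all assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy training data (12 documents, 6 per class).
x_train = [
    "senate approves annual budget after long debate",
    "miracle pill cures every disease overnight",
    "city council opens public library downtown",
    "aliens secretly control the stock market",
    "central bank holds interest rates steady",
    "celebrity clone spotted at secret moon base",
    "university publishes climate research findings",
    "drinking bleach makes you immortal says insider",
    "court rules on regional water dispute",
    "shocking trick doctors hate melts fat instantly",
    "transport ministry announces railway upgrade",
    "lizard people replaced the entire parliament",
]
y_train = ["REAL", "FAKE"] * 6

# Vectorizer and classifier in one pipeline, so each CV fold fits
# TF-IDF on its own training portion only.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),  # assumed classifier
])

# Assumed search space: 3 x 3 = 9 combinations, of which 5 are sampled.
param_distributions = {
    "tfidf__max_df": [0.5, 0.7, 0.9],
    "clf__C": [0.1, 1.0, 10.0],
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=7,
)
search.fit(x_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

Unlike an exhaustive grid search, random search evaluates only n_iter sampled combinations, which keeps the cost bounded as the search space grows.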

Let’s initialize a TfidfVectorizer with stop words from the English language and a maximum document frequency of 0.7 (terms with a higher document frequency will be discarded).

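Stop words are the most common words in a language (such as "the", "is", and "and"); they carry little meaning on their own and are filtered out before the text is processed. A TfidfVectorizer turns a collection of raw documents into a matrix of TF-IDF features, where each row is a document and each column is a term weighted by how often it occurs in that document relative to how common it is across the whole collection.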
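
The initialization described above might look like the following sketch. The stop_words and max_df settings come from the text; the four training sentences, the test sentence, and the variable names are hypothetical and chosen so that the effect of both settings is visible.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents. "budget" appears in 3 of 4 training
# documents (75%), which is above the 0.7 threshold.
x_train = [
    "the senate passed the budget",
    "the budget vote was delayed",
    "aliens endorsed the budget",
    "the mayor opened a library",
]
x_test = ["the senate opened a library"]

# English stop words are removed; terms found in more than 70% of
# the training documents are discarded.
tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)

# Learn the vocabulary and IDF weights from the training set only,
# then reuse them to transform the test set.
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

vocabulary = tfidf_vectorizer.vocabulary_
print(sorted(vocabulary))
print(tfidf_train.shape, tfidf_test.shape)
```

In this toy run, "the" is dropped as a stop word and "budget" is dropped by max_df, while terms such as "senate" and "library" remain in the vocabulary. Fitting on the training set only avoids leaking information from the test set into the features.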

Predict on the Test Set
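
A minimal sketch of this step follows. The classifier is not named in the text, so LogisticRegression is an assumption here, as are the toy documents and the choice of accuracy as the metric. In practice the model tuned during the random search would take the place of the plain pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy training and test data.
x_train = [
    "senate approves annual budget after long debate",
    "miracle pill cures every disease overnight",
    "city council opens public library downtown",
    "aliens secretly control the stock market",
    "central bank holds interest rates steady",
    "celebrity clone spotted at secret moon base",
]
y_train = ["REAL", "FAKE", "REAL", "FAKE", "REAL", "FAKE"]
x_test = [
    "council debate on library budget",
    "secret pill makes aliens immortal",
]
y_test = ["REAL", "FAKE"]

# TF-IDF features feed an assumed classifier.
model = make_pipeline(
    TfidfVectorizer(stop_words="english", max_df=0.7),
    LogisticRegression(max_iter=1000),
)
model.fit(x_train, y_train)

# Predict labels for the unseen documents and compare with the truth.
y_pred = model.predict(x_test)
score = accuracy_score(y_test, y_pred)
print(f"Accuracy: {score:.2%}")
```

Because the pipeline applies the fitted vectorizer automatically, the test documents can be passed to predict as raw text.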