Analyzing and Investigating Netflix Movies and Guest Stars in The Office

Abhishek Biswas
Aug 18, 2022

About Netflix.

Netflix is a subscription-based streaming service that offers a huge selection of TV series, films, anime, documentaries and many other types of content, which can be watched on numerous internet-connected devices.

Netflix began as a DVD rental-by-mail business in 1997. As of January 2021, it had more than 200 million subscribers and was the largest entertainment/media company in terms of market valuation.

Motivation.

The motivation for this article is to show beginners how to handle a data frame, how to analyse its tables, and how to plot its features so that they can be understood better visually.

Dataset.

The dataset has 7,787 rows and 11 columns and contains information about movies, TV shows, documentaries and other titles from 2011 to 2021. It can be downloaded from https://mega.nz/file/hKQyRJJR#e-Y_0-F6Ck7UA4Bd8vUpabC_KPoWXwve1EccxClVPR0

Steps.

  1. Importing the Required Libraries.
  2. Finding Information About the Netflix Dataset.
  3. Filtering Movies from the Data Frame.
  4. Creating a Scatter Plot and Defining a Colour List.
  5. Plotting with Colour!

1. Importing the Required Libraries.

Before starting, make sure numpy, pandas and matplotlib are installed on your computer. If any of them is missing, install it with pip from the terminal or command prompt using the following syntax:

pip install <package_name>
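A quick way to see which of these packages are missing before running pip is sketched below. It uses only the standard-library function importlib.util.find_spec(), which returns None when a module cannot be found; the helper name missing_packages is hypothetical and not part of any library.

```python
import importlib.util

# Packages used in this article.
REQUIRED_PACKAGES = ["numpy", "pandas", "matplotlib"]


def missing_packages(names):
    """Return the package names that cannot be found in the current environment (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    # Print the pip command needed to install whatever is missing.
    print("pip install " + " ".join(missing))
else:
    print("numpy, pandas and matplotlib are all installed.")
```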

After installing the packages, import the modules. Then load the dataset with the pandas read_csv() function, which reads a CSV (comma-separated values) file into a data frame so that it can be analysed in Python.

Importing the necessary libraries in Python

The head() function shows the first five rows of the dataset.

Loading the dataset with pandas and viewing the first five rows of the data frame
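The loading step can be sketched as follows. Since the article's code appears only as an image, this is not the author's exact code: the file name netflix_data.csv and the column names are assumptions, and a few made-up rows held in an io.StringIO buffer stand in for the real file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real CSV (not actual Netflix data).
# Column names are assumed; check them against your copy of the file.
SAMPLE_CSV = """show_id,type,title,country,release_year,duration,genre
s1,Movie,Example Film A,United States,2019,94,Dramas
s2,TV Show,Example Series B,India,2020,2,International TV
s3,Movie,Example Film C,France,2015,110,Comedies
s4,Movie,Example Film D,Japan,2018,88,Children
s5,Movie,Example Film E,Brazil,2021,102,Documentaries
s6,Movie,Example Film F,Canada,2012,75,Stand-Up
"""

# With the real file you would pass its path instead, for example:
# netflix_df = pd.read_csv("netflix_data.csv")  # hypothetical file name
netflix_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# head() returns the first five rows by default; head(n) returns the first n.
print(netflix_df.head())
```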

2. Finding Information About the Netflix Dataset.

After loading the dataset with pandas, first check its size with the shape attribute, which returns a tuple containing the number of rows and columns. Next, find the data type of each column with the dtypes attribute. Finally, check whether the dataset contains any null values with .isnull().sum(), which counts the missing values in each column of the data frame.

Checking the shape of the dataset and the number of null values in each column
Checking the data type of each column of the dataset
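A minimal sketch of these three checks, run on a small made-up data frame (assumed column names, one deliberately missing value) rather than on the real dataset:

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for netflix_df; one country is deliberately missing.
netflix_df = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie"],
        "title": ["Example Film A", "Example Series B", "Example Film C"],
        "country": ["United States", np.nan, "France"],
        "release_year": [2019, 2020, 2015],
    }
)

# shape is an attribute (no parentheses) holding a (rows, columns) tuple.
print(netflix_df.shape)  # (3, 4)

# dtypes is also an attribute: one data type per column.
print(netflix_df.dtypes)

# isnull() flags each missing cell as True; sum() then counts the flags per column.
print(netflix_df.isnull().sum())
```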

3. Filtering Movies from the Data Frame.

With the basic information in hand, the next step is to filter the movies out of the dataset. This is done with subsetting: indexing the data frame with a condition keeps only the rows that are movies. The same subsetting technique is then used to extract the columns of interest (title, country, genre, release_year and duration) from the new movies data frame; passing a list of column names inside the brackets (hence the double square brackets) returns a new data frame rather than a single column. Again, head() shows the first five rows of the new dataset, which contains only movies.

Filtering the movies from the dataset, extracting the required columns and viewing the first five rows of the new dataset
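The filtering and column selection might look like the sketch below. It assumes the dataset has a type column whose movie rows are labelled "Movie"; the name netflix_movies_col_subset is illustrative, and the rows are made up.

```python
import pandas as pd

# Made-up rows standing in for the full netflix_df.
netflix_df = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie", "Movie"],
        "title": ["Example Film A", "Example Series B", "Example Film C", "Example Film D"],
        "director": ["Director A", "Director B", "Director C", "Director D"],
        "country": ["United States", "India", "France", "Japan"],
        "genre": ["Dramas", "International TV", "Comedies", "Children"],
        "release_year": [2019, 2020, 2015, 2018],
        "duration": [94, 2, 110, 88],
    }
)

# Row filter: the boolean mask is True only for movies.
netflix_movies = netflix_df[netflix_df["type"] == "Movie"]

# Column subset: a list of names inside the brackets (hence the double brackets)
# returns a new DataFrame containing just those columns.
columns_of_interest = ["title", "country", "genre", "release_year", "duration"]
netflix_movies_col_subset = netflix_movies[columns_of_interest]

# By contrast, a single name in single brackets returns a Series.
print(type(netflix_movies["title"]).__name__)    # Series
print(type(netflix_movies[["title"]]).__name__)  # DataFrame

print(netflix_movies_col_subset.head())
```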

4. Creating a Scatter Plot and Defining a Colour List.

The scatter plot is created from the new movies dataset with plt.scatter(), plotting movie duration against release year to see whether movie durations changed over the years. At first the plot is hard to read, because all the points overlap and share the same colour, so some formatting is needed to make it clearer. The formatting starts by assigning each point a colour based on the movie's genre.

Code for the scatter plot of duration against release year
Scatter plot of duration against release year
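The uncoloured scatter plot can be sketched with plt.scatter() as below. The made-up rows stand in for the filtered movies data frame, and the figure size is an arbitrary choice rather than the author's setting.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up movie rows standing in for the filtered movies data frame.
movies = pd.DataFrame(
    {
        "release_year": [2012, 2015, 2018, 2019, 2021],
        "duration": [75, 110, 88, 94, 102],
    }
)

# One point per movie: release year on the x-axis, duration on the y-axis.
plt.figure(figsize=(12, 8))
points = plt.scatter(movies["release_year"], movies["duration"])

plt.show()
```

With the real data, every point in this plot has the same default colour, which is why the next step assigns colours by genre.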

The colouring is done by creating an empty list and then iterating over the rows of the data frame with a Python for loop and the pandas iterrows() function, appending a different colour for each genre in the movies data frame.

Defining a colour for each movie genre using a loop and the iterrows() function
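A sketch of the loop described above. The genre names (Children, Documentaries, Stand-Up) and their colours are assumptions chosen for illustration; every other genre falls back to black. A loop-free equivalent using Series.map() is included for comparison.

```python
import pandas as pd

# Made-up rows standing in for the movies data frame.
movies = pd.DataFrame(
    {
        "title": ["Example Film A", "Example Film B", "Example Film C", "Example Film D"],
        "genre": ["Children", "Documentaries", "Stand-Up", "Dramas"],
    }
)

# Start with an empty list and append one colour per row, in row order.
colours = []
for _, row in movies.iterrows():
    if row["genre"] == "Children":
        colours.append("red")
    elif row["genre"] == "Documentaries":
        colours.append("blue")
    elif row["genre"] == "Stand-Up":
        colours.append("green")
    else:
        colours.append("black")

print(colours)  # ['red', 'blue', 'green', 'black']

# Loop-free equivalent: map known genres to colours, default everything else to black.
genre_colours = {"Children": "red", "Documentaries": "blue", "Stand-Up": "green"}
colours_mapped = movies["genre"].map(genre_colours).fillna("black").tolist()
```

Both versions produce the same list. On large data frames the map() version is usually faster, because iterrows() builds a new Series for every row.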

5. Plotting with Colour!

The scatter plot is then drawn again, this time with colour, which makes it much easier to read. To make it clearer still, a title, an x-axis label and a y-axis label are added. The plot is created with plt.scatter() from matplotlib.pyplot, the colours are passed through the c=<colours_list> argument of plt.scatter(), and the figure is displayed with plt.show().

Code for the coloured scatter plot of duration against release year
Coloured scatter plot of duration against release year
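Putting it together, a sketch of the coloured plot with a title and axis labels. The title and label wording, the genre-to-colour mapping and the sample rows are all assumptions; duration is assumed to be in minutes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up movie rows standing in for the filtered movies data frame.
movies = pd.DataFrame(
    {
        "release_year": [2012, 2015, 2018, 2019, 2021],
        "duration": [75, 110, 88, 94, 102],
        "genre": ["Stand-Up", "Comedies", "Children", "Dramas", "Documentaries"],
    }
)

# Hypothetical genre-to-colour mapping; unlisted genres are drawn in black.
genre_colours = {"Children": "red", "Documentaries": "blue", "Stand-Up": "green"}
colours = [genre_colours.get(genre, "black") for genre in movies["genre"]]

plt.figure(figsize=(12, 8))
# c= accepts one colour per point, in the same order as the x and y values.
points = plt.scatter(movies["release_year"], movies["duration"], c=colours)
plt.title("Movie duration by year of release")
plt.xlabel("Release year")
plt.ylabel("Duration (min)")
plt.show()
```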

Resources / References.

  1. https://numpy.org/doc/
  2. https://pandas.pydata.org/docs/
  3. https://matplotlib.org/stable/contents.html

Thank you for reading my article.
