Data preprocessing is the first step to complete before analysis and model building begin, and one of the most important. A large dataset raises several issues to consider, such as outlier handling, missing value treatment, handling categorical variables, identifying and dropping unnecessary columns, and variable transformation. Data scientists commonly spend more than half of their time on these data cleaning tasks.
Let’s look at the contents of the dataset below, clean the data, and build a model.
Streamist is a streaming company that streams web series and movies for a worldwide audience. Every title on their portal is rated by viewers, and the portal also provides other information about each title, such as the number of people who have watched it, the number of people who want to watch it, the number of episodes, and the duration of an episode.
They are currently focusing on the anime available on their portal and want to identify the most important factors involved in rating an anime. As a data scientist at Streamist, you are tasked with identifying these factors and building a predictive model for the rating of an anime.
The objective is to preprocess the raw data, analyze it, and build a linear regression model to predict the ratings of anime.
Is there a good predictive model for the rating of an anime? What does the performance assessment look like for such a model?
Each record in the database provides a description of an anime. A detailed data dictionary can be found below.
Data Dictionary
In [1]:
# this will help in making the Python code more structured automatically (good coding practice)
%load_ext nb_black

# Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)

# to split the data into train and test
from sklearn.model_selection import train_test_split

# to build linear regression_model
from sklearn.linear_model import LinearRegression

# to check model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
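To see how the modelling imports above fit together before touching the real data, here is a minimal sketch on synthetic data. The feature matrix, the coefficients, and the noise level are made up for illustration and have nothing to do with the anime dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative only): y depends linearly on two features plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Fit on the training rows only, then predict the held-out rows
model = LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Performance on the held-out rows
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}, R-squared: {r2:.3f}")
```

Reporting the metrics on rows the model never saw during fitting is what makes them a fair performance assessment.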
In [2]:
# loading the dataset
data = pd.read_csv("anime_data_raw.csv")
In [3]:
# checking the shape of the data
print(f"There are {data.shape[0]} rows and {data.shape[1]} columns.")  # f-string
There are 14578 rows and 48 columns.
In [4]:
# let's view a sample of the data
# setting the random_state will ensure we get the same results every time
data.sample(10, random_state=2)
Out[4]:
| | title | mediaType | eps | duration | ongoing | startYr | finishYr | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13764 | Spy Penguin (2013): White Christmas | Web | 1.0 | 2min | False | 2013.0 | 2013.0 | NaN | NaN | [‘Next Media Animation’] | 0 | 8.0 | 0 | 10 | 0 | NaN | NaN | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3782 | A Little Snow Fairy Sugar Summer Specials | TV Special | 2.0 | NaN | False | 2003.0 | 2003.0 | NaN | One day, when Saga finds an old princess costu… | [‘J.C. Staff’] | 0 | 1056.0 | 24 | 576 | 16 | 3.449 | 571.0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2289 | Umineko: When They Cry | TV | 26.0 | NaN | False | 2009.0 | 2009.0 | Summer | In the year 1986, eighteen members of the Ushi… | [‘Studio Deen’] | 1 | 10896.0 | 1451 | 8480 | 1236 | 3.787 | 9463.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5081 | Unbreakable Machine-Doll Specials | DVD Special | 6.0 | 5min | False | 2013.0 | 2014.0 | NaN | NaN | [‘Lerche’] | 1 | 1957.0 | 201 | 756 | 50 | 3.169 | 1312.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9639 | Hanako Oku: Hanabi | TV | 1.0 | 6min | False | 2015.0 | 2015.0 | NaN | NaN | [] | 0 | 46.0 | 1 | 54 | 1 | 2.166 | 33.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
12608 | Tamagotchi Honto no Hanashi | Movie | 1.0 | 20min | False | 1997.0 | 1997.0 | NaN | NaN | [] | 0 | 11.0 | 2 | 18 | 0 | NaN | NaN | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6735 | Violinist of Hamelin Movie | Movie | 1.0 | 30min | False | 1996.0 | 1996.0 | NaN | While on their quest to stop the Demon King, t… | [‘Nippon Animation’] | 0 | 247.0 | 6 | 167 | 8 | 2.826 | 152.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
12846 | Neko Kikaku | Movie | 1.0 | 37min | False | 2018.0 | 2018.0 | NaN | Nyagoya City is a trendy town where cats live…. | [‘Speed Inc.’] | 0 | 12.0 | 3 | 102 | 2 | NaN | NaN | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
884 | Saint Young Men Movie | Movie | 1.0 | 1hr 30min | False | 2013.0 | 2013.0 | NaN | Jesus and Buddha are enjoying their vacation i… | [‘A-1 Pictures’] | 0 | 2726.0 | 68 | 2074 | 37 | 4.156 | 1962.0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10524 | Delinquent Hamsters / papalion ft. Piso Studio | Web | 1.0 | 2min | False | 2017.0 | 2017.0 | Fall | NaN | [‘Piso Studio’] | 0 | 18.0 | 0 | 18 | 0 | 1.927 | 10.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The duration column has values in hours and minutes.
The studios column has a list of values.
In [5]:
# creating a copy of the data so that original data remains unchanged
df = data.copy()
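Both observations about the sample point to columns that need parsing before they can be used numerically. The following is a minimal sketch of one way to do that, not necessarily the approach taken later in this analysis. The helper names `duration_to_minutes` and `parse_studios` are hypothetical, and the sketch assumes that durations look like `1hr 30min` or `5min` and that the studios column is stored as text such as `['J.C. Staff']`.

```python
import ast
import re


def duration_to_minutes(value):
    """Convert strings such as '1hr 30min' or '5min' to total minutes.

    Returns None for missing or unrecognised values.
    """
    if not isinstance(value, str):
        return None  # NaN arrives as a float, not a str
    match = re.fullmatch(r"\s*(?:(\d+)\s*hr)?\s*(?:(\d+)\s*min)?\s*", value)
    if match is None or not any(match.groups()):
        return None
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


def parse_studios(value):
    """Turn the text "['A', 'B']" into the Python list ['A', 'B'].

    Assumes the column holds list literals stored as strings.
    """
    return ast.literal_eval(value)


print(duration_to_minutes("1hr 30min"))
print(duration_to_minutes("5min"))
print(parse_studios("['TMS Entertainment', '8 Pan']"))
print(parse_studios("[]"))
```

On the real data, a call along the lines of `df["duration"].apply(duration_to_minutes)` would apply the first helper row by row, provided the values follow the assumed format.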
In [6]:
# checking for duplicate values in the data
df.duplicated().sum()
Out[6]:
0
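Called without arguments, `duplicated()` only flags rows that match an earlier row in every column. A small sketch on a made-up frame shows the difference between that check and one restricted to a single column through `subset`; the toy titles and ratings are invented.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "title": ["A", "B", "B", "C", "C"],
        "rating": [3.1, 4.0, 4.0, 2.5, 2.9],
    }
)

# Full-row duplicates: only the second "B" row repeats an earlier row exactly
full_dupes = toy.duplicated().sum()

# Duplicates by title only: the second "B" and the second "C" are flagged
title_dupes = toy.duplicated(subset=["title"]).sum()

print(full_dupes, title_dupes)
```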
In [7]:
# checking the names of the columns in the data
print(df.columns)
Index(['title', 'mediaType', 'eps', 'duration', 'ongoing', 'startYr', 'finishYr', 'sznOfRelease', 'description', 'studios', 'contentWarn', 'watched', 'watching', 'wantWatch', 'dropped', 'rating', 'votes', 'tag_'Comedy'', 'tag_'Based on a Manga'', 'tag_'Action'', 'tag_'Fantasy'', 'tag_'Sci Fi'', 'tag_'Shounen'', 'tag_'Family Friendly'', 'tag_'Original Work'', 'tag_'Non-Human Protagonists'', 'tag_'Adventure'', 'tag_'Short Episodes'', 'tag_'Drama'', 'tag_'Shorts'', 'tag_'Romance'', 'tag_'School Life'', 'tag_'Slice of Life'', 'tag_'Animal Protagonists'', 'tag_'Seinen'', 'tag_'Supernatural'', 'tag_'Magic'', 'tag_'CG Animation'', 'tag_'Mecha'', 'tag_'Ecchi'', 'tag_'Based on a Light Novel'', 'tag_'Anthropomorphic'', 'tag_'Superpowers'', 'tag_'Promotional'', 'tag_'Sports'', 'tag_'Historical'', 'tag_'Vocaloid'', 'tag_Others'], dtype='object')
In [8]:
# checking column datatypes and number of non-null values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14578 entries, 0 to 14577
Data columns (total 48 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   title                         14578 non-null  object
 1   mediaType                     14510 non-null  object
 2   eps                           14219 non-null  float64
 3   duration                      9137 non-null   object
 4   ongoing                       14578 non-null  bool
 5   startYr                       14356 non-null  float64
 6   finishYr                      14134 non-null  float64
 7   sznOfRelease                  3767 non-null   object
 8   description                   8173 non-null   object
 9   studios                       14578 non-null  object
 10  contentWarn                   14578 non-null  int64
 11  watched                       14356 non-null  float64
 12  watching                      14578 non-null  int64
 13  wantWatch                     14578 non-null  int64
 14  dropped                       14578 non-null  int64
 15  rating                        12107 non-null  float64
 16  votes                         12119 non-null  float64
 17  tag_'Comedy'                  14578 non-null  int64
 18  tag_'Based on a Manga'        14578 non-null  int64
 19  tag_'Action'                  14578 non-null  int64
 20  tag_'Fantasy'                 14578 non-null  int64
 21  tag_'Sci Fi'                  14578 non-null  int64
 22  tag_'Shounen'                 14578 non-null  int64
 23  tag_'Family Friendly'         14578 non-null  int64
 24  tag_'Original Work'           14578 non-null  int64
 25  tag_'Non-Human Protagonists'  14578 non-null  int64
 26  tag_'Adventure'               14578 non-null  int64
 27  tag_'Short Episodes'          14578 non-null  int64
 28  tag_'Drama'                   14578 non-null  int64
 29  tag_'Shorts'                  14578 non-null  int64
 30  tag_'Romance'                 14578 non-null  int64
 31  tag_'School Life'             14578 non-null  int64
 32  tag_'Slice of Life'           14578 non-null  int64
 33  tag_'Animal Protagonists'     14578 non-null  int64
 34  tag_'Seinen'                  14578 non-null  int64
 35  tag_'Supernatural'            14578 non-null  int64
 36  tag_'Magic'                   14578 non-null  int64
 37  tag_'CG Animation'            14578 non-null  int64
 38  tag_'Mecha'                   14578 non-null  int64
 39  tag_'Ecchi'                   14578 non-null  int64
 40  tag_'Based on a Light Novel'  14578 non-null  int64
 41  tag_'Anthropomorphic'         14578 non-null  int64
 42  tag_'Superpowers'             14578 non-null  int64
 43  tag_'Promotional'             14578 non-null  int64
 44  tag_'Sports'                  14578 non-null  int64
 45  tag_'Historical'              14578 non-null  int64
 46  tag_'Vocaloid'                14578 non-null  int64
 47  tag_Others                    14578 non-null  int64
dtypes: bool(1), float64(6), int64(35), object(6)
memory usage: 5.2+ MB
The ongoing column is of bool type.
In [9]:
# checking for missing values in the data.
df.isnull().sum()
Out[9]:
title                         0
mediaType                     68
eps                           359
duration                      5441
ongoing                       0
startYr                       222
finishYr                      444
sznOfRelease                  10811
description                   6405
studios                       0
contentWarn                   0
watched                       222
watching                      0
wantWatch                     0
dropped                       0
rating                        2471
votes                         2459
tag_'Comedy'                  0
tag_'Based on a Manga'        0
tag_'Action'                  0
tag_'Fantasy'                 0
tag_'Sci Fi'                  0
tag_'Shounen'                 0
tag_'Family Friendly'         0
tag_'Original Work'           0
tag_'Non-Human Protagonists'  0
tag_'Adventure'               0
tag_'Short Episodes'          0
tag_'Drama'                   0
tag_'Shorts'                  0
tag_'Romance'                 0
tag_'School Life'             0
tag_'Slice of Life'           0
tag_'Animal Protagonists'     0
tag_'Seinen'                  0
tag_'Supernatural'            0
tag_'Magic'                   0
tag_'CG Animation'            0
tag_'Mecha'                   0
tag_'Ecchi'                   0
tag_'Based on a Light Novel'  0
tag_'Anthropomorphic'         0
tag_'Superpowers'             0
tag_'Promotional'             0
tag_'Sports'                  0
tag_'Historical'              0
tag_'Vocaloid'                0
tag_Others                    0
dtype: int64
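Raw counts are easier to judge as a share of all rows: in the output above, sznOfRelease is missing in 10811 of 14578 rows (about 74%) and duration in 5441 (about 37%). A minimal sketch of that calculation follows, run on an invented toy frame so that it stands alone.

```python
import numpy as np
import pandas as pd

# Invented toy frame: two columns with gaps, one complete column
toy = pd.DataFrame(
    {
        "duration": ["5min", np.nan, np.nan, "1hr"],
        "rating": [3.2, np.nan, 4.1, 2.8],
        "votes": [10, 20, 30, 40],
    }
)

# isnull().mean() gives the fraction of missing rows per column
missing_pct = (toy.isnull().mean() * 100).round(1)

# Keep only columns that have missing values, largest share first
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```

The same two lines applied to `df` in place of `toy` would rank the real columns by how much of their data is missing.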
In [10]:
# Let's look at the statistical summary of the data
df.describe(include="all").T
Out[10]:
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|---|---|---|
title | 14578 | 14578 | Fullmetal Alchemist: Brotherhood | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
mediaType | 14510 | 8 | TV | 4510 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
eps | 14219.0 | NaN | NaN | NaN | 13.501231 | 62.262185 | 1.0 | 1.0 | 1.0 | 12.0 | 2527.0 |
duration | 9137 | 147 | 4min | 964 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
ongoing | 14578 | 2 | False | 14356 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
startYr | 14356.0 | NaN | NaN | NaN | 2005.457788 | 14.707105 | 1907.0 | 2000.0 | 2010.0 | 2016.0 | 2026.0 |
finishYr | 14134.0 | NaN | NaN | NaN | 2005.515919 | 14.656509 | 1907.0 | 2000.0 | 2010.0 | 2016.0 | 2026.0 |
sznOfRelease | 3767 | 4 | Spring | 1202 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
description | 8173 | 8108 | In 19th century Belgium, in the Flanders count… | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
studios | 14578 | 864 | [] | 4808 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
contentWarn | 14578.0 | NaN | NaN | NaN | 0.098024 | 0.297358 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
watched | 14356.0 | NaN | NaN | NaN | 2408.043396 | 7168.368428 | 0.0 | 25.0 | 165.0 | 1469.5 | 161567.0 |
watching | 14578.0 | NaN | NaN | NaN | 213.026684 | 1261.70764 | 0.0 | 1.0 | 7.0 | 63.0 | 74537.0 |
wantWatch | 14578.0 | NaN | NaN | NaN | 1021.729112 | 2145.010604 | 0.0 | 24.0 | 175.0 | 980.0 | 28541.0 |
dropped | 14578.0 | NaN | NaN | NaN | 125.963026 | 453.577348 | 0.0 | 1.0 | 7.0 | 40.0 | 19481.0 |
rating | 12107.0 | NaN | NaN | NaN | 2.948697 | 0.827642 | 0.844 | 2.3035 | 2.965 | 3.6155 | 4.702 |
votes | 12119.0 | NaN | NaN | NaN | 2085.787771 | 5946.283685 | 10.0 | 34.0 | 218.0 | 1412.5 | 131067.0 |
tag_’Comedy’ | 14578.0 | NaN | NaN | NaN | 0.262999 | 0.440277 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Based on a Manga’ | 14578.0 | NaN | NaN | NaN | 0.262176 | 0.439833 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Action’ | 14578.0 | NaN | NaN | NaN | 0.211003 | 0.408034 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Fantasy’ | 14578.0 | NaN | NaN | NaN | 0.175881 | 0.380732 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Sci Fi’ | 14578.0 | NaN | NaN | NaN | 0.153931 | 0.360895 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Shounen’ | 14578.0 | NaN | NaN | NaN | 0.128001 | 0.334102 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Family Friendly’ | 14578.0 | NaN | NaN | NaN | 0.126835 | 0.332799 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Original Work’ | 14578.0 | NaN | NaN | NaN | 0.126149 | 0.332029 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Non-Human Protagonists’ | 14578.0 | NaN | NaN | NaN | 0.120593 | 0.325664 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Adventure’ | 14578.0 | NaN | NaN | NaN | 0.10557 | 0.307297 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Short Episodes’ | 14578.0 | NaN | NaN | NaN | 0.104267 | 0.305617 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Drama’ | 14578.0 | NaN | NaN | NaN | 0.102552 | 0.303383 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Shorts’ | 14578.0 | NaN | NaN | NaN | 0.092811 | 0.290177 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Romance’ | 14578.0 | NaN | NaN | NaN | 0.082453 | 0.275063 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’School Life’ | 14578.0 | NaN | NaN | NaN | 0.080327 | 0.271807 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Slice of Life’ | 14578.0 | NaN | NaN | NaN | 0.076691 | 0.26611 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Animal Protagonists’ | 14578.0 | NaN | NaN | NaN | 0.072781 | 0.259785 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Seinen’ | 14578.0 | NaN | NaN | NaN | 0.067979 | 0.251719 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Supernatural’ | 14578.0 | NaN | NaN | NaN | 0.064824 | 0.246223 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Magic’ | 14578.0 | NaN | NaN | NaN | 0.057004 | 0.231858 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’CG Animation’ | 14578.0 | NaN | NaN | NaN | 0.055632 | 0.229217 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Mecha’ | 14578.0 | NaN | NaN | NaN | 0.049664 | 0.217257 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Ecchi’ | 14578.0 | NaN | NaN | NaN | 0.048909 | 0.215686 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Based on a Light Novel’ | 14578.0 | NaN | NaN | NaN | 0.048018 | 0.213811 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Anthropomorphic’ | 14578.0 | NaN | NaN | NaN | 0.04397 | 0.205036 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Superpowers’ | 14578.0 | NaN | NaN | NaN | 0.039374 | 0.194491 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Promotional’ | 14578.0 | NaN | NaN | NaN | 0.03814 | 0.19154 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Sports’ | 14578.0 | NaN | NaN | NaN | 0.036974 | 0.188703 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Historical’ | 14578.0 | NaN | NaN | NaN | 0.03615 | 0.186671 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Vocaloid’ | 14578.0 | NaN | NaN | NaN | 0.034916 | 0.183572 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_Others | 14578.0 | NaN | NaN | NaN | 0.080601 | 0.272231 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
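In the summary above, the count columns are strongly right-skewed: watched has a mean of about 2408 but a median of 165, and votes has a mean of about 2086 against a median of 218. One common option for such variables, offered here as a sketch and not as the step this analysis necessarily takes, is a `log1p` transform. The sample values below are invented.

```python
import numpy as np
import pandas as pd

# Invented right-skewed counts: most values are small, one is huge
watched = pd.Series([0, 25, 40, 165, 300, 1469, 161567], dtype=float)

# log1p computes log(1 + x), so zeros are handled without error
watched_log = np.log1p(watched)

# Compare the sample skewness before and after the transform
print(watched.skew(), watched_log.skew())
```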
We will drop the rows with missing values in the rating column, as it is the target variable.
In [11]:
df.dropna(subset=["rating"], inplace=True)
In [12]:
# let us reset the dataframe index
df.reset_index(inplace=True, drop=True)
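Rows without a target value cannot be used to fit or evaluate a supervised model, which is why they are dropped instead of imputed; here that leaves 14578 - 2471 = 12107 rows. A small sketch on an invented frame shows what the two cells above do, including why the index is reset afterwards.

```python
import numpy as np
import pandas as pd

# Invented toy frame with two missing target values
toy = pd.DataFrame(
    {"title": ["A", "B", "C", "D"], "rating": [3.4, np.nan, 2.1, np.nan]}
)

# Drop rows where the target is missing; other columns are not considered
cleaned = toy.dropna(subset=["rating"])
print(cleaned.index.tolist())  # the index keeps gaps from the dropped rows

# drop=True discards the old index instead of adding it as a column
cleaned = cleaned.reset_index(drop=True)
print(cleaned.index.tolist())
```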
In [13]:
# checking missing values in rest of the data
df.isnull().sum()
Out[13]:
title                         0
mediaType                     63
eps                           0
duration                      4636
ongoing                       0
startYr                       6
finishYr                      121
sznOfRelease                  8560
description                   4474
studios                       0
contentWarn                   0
watched                       115
watching                      0
wantWatch                     0
dropped                       0
rating                        0
votes                         0
tag_'Comedy'                  0
tag_'Based on a Manga'        0
tag_'Action'                  0
tag_'Fantasy'                 0
tag_'Sci Fi'                  0
tag_'Shounen'                 0
tag_'Family Friendly'         0
tag_'Original Work'           0
tag_'Non-Human Protagonists'  0
tag_'Adventure'               0
tag_'Short Episodes'          0
tag_'Drama'                   0
tag_'Shorts'                  0
tag_'Romance'                 0
tag_'School Life'             0
tag_'Slice of Life'           0
tag_'Animal Protagonists'     0
tag_'Seinen'                  0
tag_'Supernatural'            0
tag_'Magic'                   0
tag_'CG Animation'            0
tag_'Mecha'                   0
tag_'Ecchi'                   0
tag_'Based on a Light Novel'  0
tag_'Anthropomorphic'         0
tag_'Superpowers'             0
tag_'Promotional'             0
tag_'Sports'                  0
tag_'Historical'              0
tag_'Vocaloid'                0
tag_Others                    0
dtype: int64
In [14]:
df[df.startYr.isnull()]
Out[14]:
| | title | mediaType | eps | duration | ongoing | startYr | finishYr | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1405 | Unbelievable Space Love | Web | 10.0 | 1min | False | NaN | NaN | NaN | NaN | [] | 0 | 90.0 | 16 | 343 | 0 | 4.012 | 54.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5222 | Manbo-P: Irokoizata wa Subete Sakuzu de Kaiket… | Music Video | 1.0 | 5min | False | NaN | NaN | NaN | NaN | [] | 0 | 41.0 | 0 | 25 | 0 | 3.139 | 20.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
9813 | Mameshiba: Mamerry Christmas | Other | 1.0 | 1min | False | NaN | NaN | NaN | NaN | [] | 0 | 57.0 | 1 | 17 | 0 | 2.119 | 35.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10258 | Meow no Hoshi | Other | 1.0 | 5min | False | NaN | NaN | NaN | NaN | [] | 0 | 40.0 | 0 | 25 | 0 | 1.999 | 25.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11970 | Landmark | Web | 1.0 | 4min | False | NaN | NaN | NaN | NaN | [] | 0 | 34.0 | 0 | 9 | 0 | 1.256 | 21.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
12077 | Burutabu-chan | Other | 3.0 | 1min | False | NaN | NaN | NaN | NaN | [] | 0 | 46.0 | 1 | 10 | 1 | 1.046 | 33.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
In [15]:
df.dropna(subset=["startYr"], inplace=True)
# let us reset the dataframe index
df.reset_index(inplace=True, drop=True)
In [16]:
# checking missing values in rest of the data
df.isnull().sum()
Out[16]:
title                         0
mediaType                     63
eps                           0
duration                      4636
ongoing                       0
startYr                       0
finishYr                      115
sznOfRelease                  8554
description                   4468
studios                       0
contentWarn                   0
watched                       115
watching                      0
wantWatch                     0
dropped                       0
rating                        0
votes                         0
tag_'Comedy'                  0
tag_'Based on a Manga'        0
tag_'Action'                  0
tag_'Fantasy'                 0
tag_'Sci Fi'                  0
tag_'Shounen'                 0
tag_'Family Friendly'         0
tag_'Original Work'           0
tag_'Non-Human Protagonists'  0
tag_'Adventure'               0
tag_'Short Episodes'          0
tag_'Drama'                   0
tag_'Shorts'                  0
tag_'Romance'                 0
tag_'School Life'             0
tag_'Slice of Life'           0
tag_'Animal Protagonists'     0
tag_'Seinen'                  0
tag_'Supernatural'            0
tag_'Magic'                   0
tag_'CG Animation'            0
tag_'Mecha'                   0
tag_'Ecchi'                   0
tag_'Based on a Light Novel'  0
tag_'Anthropomorphic'         0
tag_'Superpowers'             0
tag_'Promotional'             0
tag_'Sports'                  0
tag_'Historical'              0
tag_'Vocaloid'                0
tag_Others                    0
dtype: int64
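In the output above, finishYr and watched each have 115 missing values. Equal counts do not by themselves prove that the gaps sit in the same rows, so a cross-tabulation of the two missingness flags is a quick way to check. The frame below is invented for illustration, and the names `finishYr_missing` and `watched_missing` are labels chosen for this sketch.

```python
import numpy as np
import pandas as pd

# Invented toy frame: the two unfinished titles lack both values
toy = pd.DataFrame(
    {
        "ongoing": [True, True, False, False, False],
        "finishYr": [np.nan, np.nan, 2013.0, 2009.0, 1997.0],
        "watched": [np.nan, np.nan, 8.0, 10896.0, 11.0],
    }
)

# Rows: is finishYr missing? Columns: is watched missing?
overlap = pd.crosstab(
    toy["finishYr"].isnull().rename("finishYr_missing"),
    toy["watched"].isnull().rename("watched_missing"),
)
print(overlap)

# True when both columns are missing in exactly the same rows
same_rows = (toy["finishYr"].isnull() == toy["watched"].isnull()).all()
print(same_rows)
```

If the off-diagonal cells of the table are zero, the two columns are missing together, which suggests a shared cause worth inspecting before choosing how to fill them.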
In [17]:
df[df.finishYr.isnull()]
Out[17]:
| | title | mediaType | eps | duration | ongoing | startYr | finishYr | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13 | Kaguya-sama: Love Is War? | TV | 10.0 | NaN | True | 2020.0 | NaN | Spring | The battle between love and pride continues! N… | [‘A-1 Pictures’] | 0 | NaN | 6368 | 5747 | 96 | 4.617 | 2359.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
46 | Douluo Dalu 2 | Web | 82.0 | 22min | True | 2018.0 | NaN | NaN | Second season of Douluo Dalu. | [] | 0 | NaN | 1167 | 990 | 32 | 4.540 | 549.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
70 | Fruits Basket 2nd Season | TV | 10.0 | NaN | True | 2020.0 | NaN | Spring | Second season of Fruits Basket. | [‘TMS Entertainment’, ‘8 Pan’] | 0 | NaN | 4160 | 4427 | 55 | 4.527 | 1194.0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
111 | Ascendance of a Bookworm: Part II | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | With her baptism ceremony complete, Myne begin… | [‘Ajia-do’] | 0 | NaN | 3183 | 1916 | 29 | 4.483 | 1139.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
115 | Rakshasa Street 2nd Season | Web | 5.0 | NaN | True | 2019.0 | NaN | NaN | NaN | [] | 0 | NaN | 47 | 102 | 0 | 4.482 | 10.0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
121 | Kingdom 3 | TV | 4.0 | NaN | True | 2020.0 | NaN | Spring | Third season of Kingdom. | [‘Studio Pierrot’, ‘St. Signpost’] | 0 | NaN | 515 | 740 | 14 | 4.476 | 202.0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
239 | One Piece | TV | 929.0 | NaN | True | 1999.0 | NaN | Fall | Long ago the infamous Gol D. Roger was the str… | [‘Toei Animation’] | 0 | NaN | 74537 | 16987 | 12445 | 4.402 | 59737.0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
262 | Tower of God | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Fame. Glory. Power. Anything in your wildest d… | [‘Telecom Animation Film’] | 1 | NaN | 9568 | 5085 | 187 | 4.391 | 3387.0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
314 | Wu Geng Ji 3rd Season | Web | 21.0 | NaN | True | 2019.0 | NaN | NaN | Third season of Wu Geng Ji. | [] | 0 | NaN | 50 | 140 | 1 | 4.366 | 19.0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
324 | A Certain Scientific Railgun T | TV | 15.0 | NaN | True | 2020.0 | NaN | Winter | Mikoto Misaka and her friends prepare for the … | [‘J.C. Staff’] | 0 | NaN | 1825 | 2939 | 43 | 4.365 | 638.0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
350 | My Next Life as a Villainess: All Routes Lead … | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Wealthy heiress Katarina Claes is hit in the h… | [‘SILVER LINK’] | 0 | NaN | 5971 | 4107 | 144 | 4.348 | 2126.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
400 | Ling Jian Zun 3rd Season | Web | 40.0 | 10min | True | 2019.0 | NaN | NaN | Third season of Ling Jian Zun. | [] | 0 | NaN | 56 | 31 | 1 | 4.332 | 18.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
406 | Major 2nd: Second Season | TV | 7.0 | NaN | True | 2020.0 | NaN | Spring | Second season of Major 2nd. | [‘OLM’] | 0 | NaN | 307 | 306 | 8 | 4.329 | 102.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
418 | IDOLiSH7: Second BEAT! | TV | 4.0 | NaN | True | 2020.0 | NaN | Spring | Second season of IDOLiSH7. | [‘TROYCA’] | 0 | NaN | 344 | 764 | 13 | 4.325 | 106.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
546 | Food Wars! The Fifth Plate | TV | 2.0 | NaN | True | 2020.0 | NaN | Spring | NaN | [‘J.C. Staff’] | 0 | NaN | 3354 | 3673 | 34 | 4.276 | 987.0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
597 | Seitokai Yakuindomo* OVA | OVA | 9.0 | NaN | True | 2014.0 | NaN | NaN | NaN | [‘GoHands’] | 1 | NaN | 1866 | 1390 | 79 | 4.252 | 1271.0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
604 | Touhou Gensou Mangekyou: The Memories of Phantasm | Other | 14.0 | 15min | True | 2011.0 | NaN | NaN | Marisa, an ordinary magician, suspects a youka… | [] | 0 | NaN | 729 | 1134 | 71 | 4.248 | 558.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
660 | Yaoshenji 4th Season | Web | 13.0 | NaN | True | 2020.0 | NaN | NaN | Fourth season of Yaoshenji. | [] | 0 | NaN | 106 | 80 | 3 | 4.226 | 42.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
678 | The Millionaire Detective – Balance:Unlimited | TV | 2.0 | NaN | True | 2020.0 | NaN | Spring | Detective Daisuke Kambe has no problems using … | [‘CloverWorks’] | 0 | NaN | 3133 | 2990 | 89 | 4.223 | 1028.0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
767 | Quanzhi Fashi 4th Season | Web | 3.0 | 18min | True | 2020.0 | NaN | NaN | Fourth season of Quanzhi Fashi. | [‘Foch’] | 0 | NaN | 94 | 431 | 2 | 3.795 | 25.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
772 | That Time I Got Reincarnated as a Slime OVA | OVA | 3.0 | 23min | True | 2019.0 | NaN | NaN | In the midst of his everyday life, Rimuru sudd… | [‘8-Bit’] | 0 | NaN | 3859 | 3225 | 70 | 4.188 | 1849.0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
842 | Huyao Xiao Hongniang 2 | Web | 80.0 | NaN | True | 2017.0 | NaN | NaN | Continuation of Huyao Xiao Hongniang (Fox Spir… | [‘Haoliners Animation League’] | 0 | NaN | 217 | 497 | 18 | 4.170 | 87.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
906 | Feng Ling Yu Xiu | Web | 4.0 | NaN | True | 2017.0 | NaN | NaN | NaN | [] | 0 | NaN | 28 | 121 | 5 | 4.151 | 13.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
930 | Sing “Yesterday” for Me | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Rikuo has graduated from college, but has zero… | [‘Doga Kobo’] | 0 | NaN | 2756 | 3119 | 137 | 4.136 | 827.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
953 | Kakushigoto | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Single father Kakushi Goto has a secret. He’s … | [‘Ajia-do’] | 0 | NaN | 2735 | 3093 | 164 | 4.143 | 891.0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
994 | Detective Conan | TV | 974.0 | NaN | True | 1996.0 | NaN | Winter | Shinichi Kudo is a famous teenage detective wh… | [‘TMS Entertainment’, ‘V1 Studio’] | 1 | NaN | 14928 | 5035 | 4771 | 4.126 | 14422.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1133 | Pokemon: Twilight Wings | Web | 5.0 | 6min | True | 2020.0 | NaN | Winter | Set in the Galar region, where Pokémon battles… | [‘Studio Colorido’] | 0 | NaN | 735 | 482 | 20 | 4.083 | 243.0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1305 | Ling Long: Incarnation | Web | 6.0 | 30min | True | 2019.0 | NaN | NaN | NaN | [] | 0 | NaN | 39 | 78 | 5 | 4.034 | 22.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1314 | Katarina Nounai Kaigi | Web | 26.0 | 1min | True | 2020.0 | NaN | Winter | NaN | [] | 0 | NaN | 58 | 146 | 2 | 4.032 | 18.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1317 | Great Pretender | TV | 10.0 | NaN | True | 2020.0 | NaN | Summer | Makoto Edamura is a con man who is said to be … | [‘Wit Studio’] | 0 | NaN | 231 | 934 | 6 | 4.124 | 78.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1323 | Black Clover | TV | 132.0 | NaN | True | 2017.0 | NaN | Fall | In a world where magic is everything, Asta and… | [‘Studio Pierrot’] | 0 | NaN | 32313 | 12295 | 3424 | 4.031 | 17866.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1473 | Ahiru no Sora | TV | 35.0 | NaN | True | 2019.0 | NaN | Fall | He may be shorter in stature, but Sora Kurumat… | [‘diomedea’] | 0 | NaN | 3067 | 1813 | 302 | 3.989 | 1305.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
1550 | Pokemon Journeys | TV | 24.0 | NaN | True | 2019.0 | NaN | Fall | Pokémon Trainer Ash Ketchum has a new plan: se… | [‘OLM’] | 0 | NaN | 821 | 773 | 68 | 3.962 | 314.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1551 | Arte | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | 16th century Firenze, Italy. One girl, One ART… | [‘Seven Arcs’] | 0 | NaN | 1746 | 1920 | 130 | 3.966 | 578.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
1683 | Wan Jie Shen Zhu | Web | 92.0 | 9min | True | 2019.0 | NaN | NaN | Time traveling from 21st century to the Nanzho… | [] | 0 | NaN | 122 | 89 | 10 | 3.930 | 53.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1839 | Black Clover: Petit Clover Advance | DVD Special | 4.0 | 6min | True | 2018.0 | NaN | NaN | NaN | [] | 0 | NaN | 63 | 105 | 5 | 3.894 | 31.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1903 | Soukyuu no Fafner: The Beyond | OVA | 6.0 | 30min | True | 2019.0 | NaN | NaN | NaN | [‘XEBEC Zwei’] | 0 | NaN | 45 | 498 | 5 | 3.878 | 18.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1958 | Appare-Ranman! | TV | 3.0 | NaN | True | 2020.0 | NaN | Spring | The socially awkward yet genius engineer, Appa… | [‘P.A. Works’] | 0 | NaN | 1111 | 1431 | 65 | 3.864 | 325.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1972 | Hifuu Katsudou Kiroku: The Sealed Esoteric His… | Other | 2.0 | NaN | True | 2015.0 | NaN | NaN | NaN | [] | 0 | NaN | 32 | 129 | 7 | 3.862 | 17.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2154 | Digimon Adventure Memorial Story Project | Web | 1.0 | 6min | True | 2020.0 | NaN | Winter | Over ten years have passed since that initial … | [‘TYO Animations’] | 0 | NaN | 141 | 323 | 5 | 3.817 | 42.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2171 | Doraemon (2005) | TV | 607.0 | NaN | True | 2005.0 | NaN | Spring | Robotic cat Doraemon is sent back in time from… | [‘Shin-Ei Animation’] | 0 | NaN | 746 | 390 | 222 | 3.815 | 557.0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2263 | Digimon Adventure: | TV | 3.0 | NaN | True | 2020.0 | NaN | Spring | It’s the year 2020. The Network has become som… | [‘Toei Animation’] | 0 | NaN | 961 | 671 | 44 | 3.797 | 306.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2286 | Gleipnir | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Shuichi Kagaya an ordinary high school kid in … | [‘Pine Jam’] | 1 | NaN | 3821 | 3311 | 270 | 3.792 | 1424.0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2294 | Wave, Listen to Me! | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | The stage is Sapporo, Hokkaido. One night, our… | [‘Sunrise’] | 0 | NaN | 857 | 776 | 115 | 3.785 | 298.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2359 | Tsugumomo 2 | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Thanks to his unique condition that causes him… | [‘Zero-G’] | 0 | NaN | 739 | 1112 | 31 | 3.769 | 227.0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2378 | Strike the Blood IV | OVA | 2.0 | 24min | True | 2020.0 | NaN | NaN | NaN | [‘CONNECT’] | 1 | NaN | 394 | 1263 | 10 | 3.800 | 115.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2400 | Diary of Our Days at the Breakwater | TV | 3.0 | NaN | True | 2020.0 | NaN | Spring | Hina Tsurugi is a first-year student who moves… | [‘Doga Kobo’] | 0 | NaN | 532 | 513 | 45 | 3.763 | 182.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2430 | Gundam Build Divers Re:RISE 2nd Season | Web | 5.0 | 25min | True | 2020.0 | NaN | Spring | Two years have passed since the legendary forc… | [‘Sunrise Beyond’] | 0 | NaN | 143 | 171 | 3 | 3.752 | 49.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2475 | Prince of Tennis: Best Games!! | OVA | 2.0 | NaN | True | 2018.0 | NaN | NaN | The new anime will depict previously untold st… | [‘M.S.C.’] | 0 | NaN | 62 | 325 | 4 | 3.741 | 21.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
2550 | Letters from Hibakusha | TV Special | 10.0 | 5min | True | 2017.0 | NaN | NaN | The program will consist of three anime shorts… | [] | 0 | NaN | 16 | 63 | 1 | 3.722 | 10.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2865 | Sing “Yesterday” for Me Extra | Web | 4.0 | 2min | True | 2020.0 | NaN | Spring | NaN | [‘Doga Kobo’] | 0 | NaN | 198 | 237 | 1 | 3.638 | 61.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3041 | Crayon Shin-chan | TV | 1034.0 | NaN | True | 1992.0 | NaN | NaN | Shinnosuke Nohara is a crude and rude five-yea… | [‘Shin-Ei Animation’] | 1 | NaN | 5103 | 1363 | 2770 | 3.611 | 5600.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3063 | A3! Season Spring & Summer | TV | 10.0 | NaN | True | 2020.0 | NaN | Winter | Mankai Company is a far cry from its glory day… | [‘Studio 3Hz’, ‘P.A. Works’] | 0 | NaN | 470 | 962 | 74 | 3.571 | 175.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3091 | Ojarumaru | TV | 1738.0 | 10min | True | 1998.0 | NaN | Fall | In the Heian era, around 1000 years ago, a you… | [‘Gallop’] | 0 | NaN | 25 | 24 | 4 | 3.600 | 11.0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3174 | Nintama Rantarou | TV | 1888.0 | 10min | True | 1993.0 | NaN | NaN | Rantarou, Shinbei and Kirimaru are ninja appre… | [‘Ajia-do’] | 0 | NaN | 92 | 135 | 43 | 3.582 | 64.0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3480 | Beyblade: Burst Super King | TV | 8.0 | NaN | True | 2020.0 | NaN | Spring | NaN | [‘OLM’] | 0 | NaN | 57 | 103 | 1 | 3.515 | 17.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3520 | Ryoutei no Aji | Other | 8.0 | 2min | True | 2014.0 | NaN | NaN | NaN | [‘Studio Colorido’] | 0 | NaN | 19 | 27 | 3 | 3.507 | 24.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
3529 | Boruto: Naruto Next Generations | TV | 154.0 | NaN | True | 2017.0 | NaN | Spring | The life of the shinobi is beginning to change… | [‘Studio Pierrot’] | 0 | NaN | 22446 | 8480 | 2889 | 3.504 | 11722.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3594 | Argonavis from BanG Dream! Animation | TV | 10.0 | NaN | True | 2020.0 | NaN | Spring | NaN | [‘SANZIGEN’] | 0 | NaN | 137 | 324 | 12 | 3.491 | 47.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
3736 | Healin’ Good Pretty Cure | TV | 12.0 | NaN | True | 2020.0 | NaN | Winter | The Healing Garden, a secret world that treats… | [‘Toei Animation’] | 0 | NaN | 126 | 190 | 8 | 3.488 | 46.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3848 | BanG Dream! Girls Band Party! Pico: Ohmori | Web | 6.0 | 3min | True | 2020.0 | NaN | Spring | NaN | [‘W-Toon Studio’, ‘SANZIGEN’] | 0 | NaN | 55 | 99 | 0 | 3.433 | 11.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3864 | Chibi Maruko-chan (1995) | TV | 1208.0 | NaN | True | 1995.0 | NaN | Winter | NaN | [‘Nippon Animation’] | 0 | NaN | 82 | 86 | 26 | 3.430 | 50.0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3936 | Disney Tsum Tsum | Web | 37.0 | 2min | True | 2014.0 | NaN | NaN | NaN | [‘Polygon Pictures’] | 0 | NaN | 80 | 172 | 26 | 3.414 | 63.0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
3938 | The 8th Son? Are You Kidding Me? | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Shingo Ichinomiya, a 25-year-old man working a… | [‘Synergy SP’, ‘Shin-Ei Animation’] | 0 | NaN | 4514 | 2895 | 267 | 3.428 | 1670.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4000 | Queen’s Blade: Unlimited | OVA | 2.0 | NaN | True | 2018.0 | NaN | NaN | NaN | [‘Fortes’] | 0 | NaN | 121 | 793 | 18 | 3.376 | 51.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4022 | Move to the Future | Other | 2.0 | 2min | True | 2019.0 | NaN | NaN | The animated short tells the heartwarming stor… | [‘Signal.MD’] | 0 | NaN | 18 | 28 | 1 | 3.394 | 11.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
4055 | Aikatsu on Parade! (2020) | Web | 4.0 | 12min | True | 2020.0 | NaN | Spring | NaN | [‘BN Pictures’] | 0 | NaN | 51 | 123 | 3 | 3.388 | 17.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4061 | Touhou Niji Sousaku Doujin Anime: Musou Kakyou | OVA | 4.0 | 23min | True | 2008.0 | NaN | NaN | NaN | [] | 0 | NaN | 1074 | 1045 | 91 | 3.387 | 934.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4066 | Gudetama | TV | 951.0 | 1min | True | 2014.0 | NaN | Spring | Gudetama, an egg that is dead to the world and… | [‘Gathering’] | 0 | NaN | 193 | 411 | 65 | 3.385 | 129.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4125 | Tenchi Muyo! Ryo-Ohki 5 | OVA | 2.0 | NaN | True | 2020.0 | NaN | NaN | NaN | [‘AIC’] | 0 | NaN | 76 | 232 | 3 | 3.371 | 29.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4351 | Komatta Jii-san | TV | 10.0 | 1min | True | 2020.0 | NaN | Spring | An old man who pulls stereotypical ikemen (han… | [‘Kachidoki Studio’] | 0 | NaN | 89 | 133 | 8 | 3.337 | 31.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4588 | Princess Connect! Re: Dive | TV | 10.0 | NaN | True | 2020.0 | NaN | Spring | In the beautiful land of Astraea where a gentl… | [‘CygamesPictures’] | 0 | NaN | 1804 | 1386 | 162 | 3.274 | 594.0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4797 | Plunderer | TV | 22.0 | NaN | True | 2020.0 | NaN | Winter | Every human inhabiting the world of Alcia is b… | [‘Geek Toys’] | 1 | NaN | 5312 | 3960 | 871 | 3.228 | 2507.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4884 | Zoids Wild Zero | TV | 31.0 | NaN | True | 2019.0 | NaN | Fall | NaN | [‘OLM’] | 0 | NaN | 52 | 351 | 8 | 3.211 | 17.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5255 | 6HP – Six Hearts Princess | TV Special | 7.0 | NaN | True | 2016.0 | NaN | NaN | Haruka Hani is a second-year junior high schoo… | [‘Poncotan’] | 0 | NaN | 50 | 584 | 14 | 3.132 | 27.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5261 | Extra Olympia Kyklos | TV | 4.0 | 5min | True | 2020.0 | NaN | Spring | Demetrios was a young man in Ancient Greece wh… | [‘Gosay Studio’] | 0 | NaN | 184 | 171 | 23 | 3.134 | 70.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
5438 | Dorohedoro: Ma no Omake | DVD Special | 1.0 | 5min | True | 2020.0 | NaN | NaN | NaN | [‘Mappa’] | 0 | NaN | 43 | 277 | 1 | 3.094 | 13.0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
5529 | Sore Ike! Anpanman | TV | 1527.0 | NaN | True | 1988.0 | NaN | NaN | One night, a Star of Life falls down the chimn… | [‘TMS Entertainment’] | 0 | NaN | 36 | 49 | 21 | 3.076 | 32.0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
5636 | Bonobono (2016) | TV | 212.0 | 6min | True | 2016.0 | NaN | Spring | Bonobono, a young sea otter, bonds with Chipmu… | [‘Eiken’] | 0 | NaN | 69 | 83 | 37 | 3.053 | 46.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5824 | Woodpecker Detective’s Office | TV | 9.0 | NaN | True | 2020.0 | NaN | Spring | It is the end of the Meiji Era. The genius poe… | [‘LIDEN FILMS’] | 1 | NaN | 530 | 917 | 93 | 3.014 | 171.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
6374 | Hana Kappa | TV | 390.0 | 10min | True | 2010.0 | NaN | Spring | The life of a Kappa is never dull, especially … | [‘XEBEC’, ‘OLM’, ‘Group TAC’] | 0 | NaN | 41 | 52 | 11 | 2.899 | 15.0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6422 | Symphogear XV Specials | DVD Special | 3.0 | 7min | True | 2019.0 | NaN | NaN | NaN | [] | 0 | NaN | 18 | 28 | 1 | 2.890 | 16.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6726 | Super Shiro | TV | 32.0 | 6min | True | 2019.0 | NaN | Fall | The Nohara family dog Shiro becomes a superher… | [‘Science SARU’] | 0 | NaN | 44 | 84 | 13 | 2.825 | 24.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6999 | Shironeko Project: Zero Chronicle | TV | 10.0 | NaN | True | 2020.0 | NaN | Spring | There are two kingdoms in this world – the Kin… | [‘Project No. 9’] | 1 | NaN | 1078 | 1067 | 129 | 2.783 | 386.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7143 | Sazae-san | TV | 2527.0 | NaN | True | 1969.0 | NaN | NaN | Sazae Fuguta, married to Masuo and mother of T… | [‘Eiken’] | 0 | NaN | 233 | 216 | 76 | 2.737 | 139.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7175 | Super Dragon Ball Heroes | Web | 22.0 | 9min | True | 2018.0 | NaN | Summer | NaN | [‘Toei Animation’] | 0 | NaN | 2061 | 1233 | 343 | 2.731 | 1193.0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
7403 | Zenonzard The Animation | Web | 4.0 | 15min | True | 2020.0 | NaN | Winter | NaN | [‘8-Bit’] | 0 | NaN | 142 | 292 | 15 | 2.682 | 56.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7617 | Puzzle & Dragons | TV | 104.0 | NaN | True | 2018.0 | NaN | Spring | Taiga Akaishi is a passionate boy who is aimin… | [‘Studio Pierrot’] | 0 | NaN | 35 | 136 | 9 | 2.636 | 13.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7876 | Chokotto Anime Kemono Friends 3 | Web | 17.0 | 1min | True | 2019.0 | NaN | NaN | NaN | [] | 0 | NaN | 28 | 65 | 4 | 2.579 | 13.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
7898 | Jakusansei Million Arthur | Web | 83.0 | 2min | True | 2015.0 | NaN | Fall | NaN | [‘Gathering’] | 0 | NaN | 50 | 193 | 7 | 2.574 | 33.0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8103 | Listeners | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | In a world where the entire idea of music vani… | [‘Mappa’] | 0 | NaN | 767 | 934 | 262 | 2.527 | 316.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8347 | TAMAYOMI: The Baseball Girls | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Even with her “miracle ball,” junior high stud… | [‘Studio A-Cat’] | 0 | NaN | 499 | 653 | 141 | 2.450 | 194.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
8350 | SD Gundam World: Sangoku Souketsuden | Web | 7.0 | NaN | True | 2019.0 | NaN | Summer | A mysterious virus called the “Yellow Zombie V… | [‘Sunrise’] | 0 | NaN | 60 | 117 | 15 | 2.474 | 27.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8610 | Yu-Gi-Oh! Sevens | TV | 6.0 | NaN | True | 2020.0 | NaN | Spring | In the future in the town of Gouha, Yuga Ohdo … | [‘Bridge’] | 0 | NaN | 125 | 222 | 20 | 2.414 | 54.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9283 | Gal & Dino | TV | 7.0 | NaN | True | 2020.0 | NaN | Spring | After a night of drinking, Kaede wakes up real… | [‘Kamikaze Douga’] | 0 | NaN | 284 | 493 | 161 | 2.228 | 158.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9305 | Asatir: Mirai no Mukashibanashi | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | In 2050 Riyadh, Asma transports her grandchild… | [‘Toei Animation’] | 1 | NaN | 67 | 152 | 38 | 2.249 | 34.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9315 | OBSOLETE | Web | 6.0 | 12min | True | 2019.0 | NaN | Fall | In 2014, aliens suddenly appear on earth and p… | [‘Buemon’] | 0 | NaN | 118 | 454 | 28 | 2.246 | 95.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9336 | Donbei x Kemurikusa | Web | 1.0 | 1min | True | 2019.0 | NaN | NaN | NaN | [] | 0 | NaN | 17 | 26 | 2 | 2.241 | 12.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
9367 | Mewkledreamy | TV | 7.0 | NaN | True | 2020.0 | NaN | Spring | A middle school girl named Yume sees something… | [‘J.C. Staff’] | 0 | NaN | 78 | 200 | 16 | 2.199 | 30.0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9429 | Deluxe Da yo! Kaishain | Web | 19.0 | 2min | True | 2019.0 | NaN | Summer | The new web series follows Kamoyama, a human, … | [‘DLE’] | 0 | NaN | 22 | 29 | 5 | 2.219 | 12.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
9495 | Bungo and Alchemist: Gears of Judgement | TV | 7.0 | NaN | True | 2020.0 | NaN | Spring | The series follows a group of historic writers… | [‘OLM’] | 1 | NaN | 650 | 1053 | 233 | 2.177 | 246.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9698 | Chicken Ramen x Gudetama | Web | 3.0 | 1min | True | 2019.0 | NaN | NaN | NaN | [] | 0 | NaN | 16 | 30 | 0 | 2.148 | 10.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
9798 | Shadowverse | TV | 10.0 | NaN | True | 2020.0 | NaN | Spring | While attending Tensei Academy, Hiiro Ryugasak… | [‘Zexcs’] | 0 | NaN | 222 | 232 | 69 | 2.122 | 99.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
9884 | Wacky TV Na Na Na: Chase the Kraken Monster! | TV | 9.0 | 3min | True | 2020.0 | NaN | Spring | Third season of Wacky TV Na Na Na. | [‘Studio Crocodile’] | 0 | NaN | 33 | 48 | 2 | 2.102 | 10.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
10099 | Sakura Wars the Animation | TV | 11.0 | NaN | True | 2020.0 | NaN | Spring | Set in 1940, it’s been 10 years since the grea… | [‘SANZIGEN’] | 0 | NaN | 252 | 594 | 125 | 2.044 | 123.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
10210 | Yeastken | Web | 5.0 | 1min | True | 2018.0 | NaN | NaN | NaN | [] | 0 | NaN | 30 | 41 | 3 | 2.013 | 11.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10345 | Shachibato! President, It’s Time for Battle! | TV | 10.0 | NaN | True | 2020.0 | NaN | Spring | In the developed city of Gatepia close to the … | [‘C2C’] | 0 | NaN | 553 | 512 | 172 | 1.979 | 239.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10693 | Rebirth | TV | 16.0 | 4min | True | 2020.0 | NaN | Winter | NaN | [‘LIDEN FILMS Osaka Studio’] | 0 | NaN | 61 | 111 | 20 | 1.868 | 30.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10972 | The House Spirit Tatami-chan | Web | 10.0 | 4min | True | 2020.0 | NaN | Spring | Tatami-chan is a sardonic ghost from Iwate Pre… | [‘Zero-G’] | 0 | NaN | 162 | 178 | 59 | 1.761 | 82.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11346 | Yodel no Onna | Web | 4.0 | 1min | True | 2017.0 | NaN | NaN | NaN | [‘DLE’] | 0 | NaN | 29 | 26 | 13 | 1.645 | 32.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11377 | Hatachi no Ryouma with Kurofune-kun! | Web | 5.0 | 2min | True | 2018.0 | NaN | NaN | NaN | [‘DLE’] | 0 | NaN | 14 | 10 | 3 | 1.632 | 14.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
11630 | GJ8 Man | Web | 40.0 | 6min | True | 2016.0 | NaN | Fall | Joe Gorou lives a carefree life in the small h… | [] | 0 | NaN | 21 | 41 | 6 | 1.518 | 12.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
12093 | Knyacki! | TV | 48.0 | 5min | True | 1995.0 | NaN | Spring | NaN | [] | 0 | NaN | 10 | 20 | 3 | 2.562 | 10.0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
12094 | Da Li Si Ri Zhi | Web | 7.0 | 17min | True | 2020.0 | NaN | NaN | NaN | [] | 0 | NaN | 19 | 64 | 2 | 3.656 | 10.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
12099 | Xing Chen Bian 2nd Season | Web | 3.0 | 24min | True | 2020.0 | NaN | NaN | Second season of Xing Chen Bian. | [] | 0 | NaN | 31 | 22 | 0 | 3.941 | 10.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
In [18]:
# checking the summary of the data with missing values in finishYr
df[df.finishYr.isnull()].describe(include="all").T
Out[18]:
column | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|---|---|---|
title | 115 | 115 | Kaguya-sama: Love Is War? | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
mediaType | 115 | 6 | TV | 64 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
eps | 115.0 | NaN | NaN | NaN | 136.521739 | 408.981219 | 1.0 | 4.5 | 10.0 | 22.0 | 2527.0 |
duration | 50 | 18 | 1min | 8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
ongoing | 115 | 1 | True | 115 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
startYr | 115.0 | NaN | NaN | NaN | 2016.521739 | 8.053928 | 1969.0 | 2018.0 | 2020.0 | 2020.0 | 2020.0 |
finishYr | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
sznOfRelease | 75 | 4 | Spring | 50 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
description | 79 | 79 | The battle between love and pride continues! N… | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
studios | 115 | 66 | [] | 23 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
contentWarn | 115.0 | NaN | NaN | NaN | 0.095652 | 0.295401 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
watched | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
watching | 115.0 | NaN | NaN | NaN | 2101.513043 | 7943.927373 | 10.0 | 50.0 | 142.0 | 909.0 | 74537.0 |
wantWatch | 115.0 | NaN | NaN | NaN | 1164.765217 | 2317.386384 | 10.0 | 104.0 | 324.0 | 1060.0 | 16987.0 |
dropped | 115.0 | NaN | NaN | NaN | 285.730435 | 1316.734606 | 0.0 | 5.0 | 20.0 | 84.0 | 12445.0 |
rating | 115.0 | NaN | NaN | NaN | 3.390748 | 0.801144 | 1.518 | 2.76 | 3.515 | 4.0315 | 4.617 |
votes | 115.0 | NaN | NaN | NaN | 1266.808696 | 6019.774866 | 10.0 | 20.0 | 63.0 | 355.5 | 59737.0 |
tag_’Comedy’ | 115.0 | NaN | NaN | NaN | 0.356522 | 0.481068 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Based on a Manga’ | 115.0 | NaN | NaN | NaN | 0.330435 | 0.472428 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Action’ | 115.0 | NaN | NaN | NaN | 0.295652 | 0.458332 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Fantasy’ | 115.0 | NaN | NaN | NaN | 0.330435 | 0.472428 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Sci Fi’ | 115.0 | NaN | NaN | NaN | 0.121739 | 0.328415 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Shounen’ | 115.0 | NaN | NaN | NaN | 0.130435 | 0.338255 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Family Friendly’ | 115.0 | NaN | NaN | NaN | 0.104348 | 0.307049 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Original Work’ | 115.0 | NaN | NaN | NaN | 0.095652 | 0.295401 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Non-Human Protagonists’ | 115.0 | NaN | NaN | NaN | 0.147826 | 0.356481 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Adventure’ | 115.0 | NaN | NaN | NaN | 0.113043 | 0.318032 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Short Episodes’ | 115.0 | NaN | NaN | NaN | 0.278261 | 0.450104 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Drama’ | 115.0 | NaN | NaN | NaN | 0.13913 | 0.347597 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Shorts’ | 115.0 | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
tag_’Romance’ | 115.0 | NaN | NaN | NaN | 0.052174 | 0.223351 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’School Life’ | 115.0 | NaN | NaN | NaN | 0.052174 | 0.223351 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Slice of Life’ | 115.0 | NaN | NaN | NaN | 0.104348 | 0.307049 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Animal Protagonists’ | 115.0 | NaN | NaN | NaN | 0.06087 | 0.240137 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Seinen’ | 115.0 | NaN | NaN | NaN | 0.104348 | 0.307049 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Supernatural’ | 115.0 | NaN | NaN | NaN | 0.052174 | 0.223351 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Magic’ | 115.0 | NaN | NaN | NaN | 0.095652 | 0.295401 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’CG Animation’ | 115.0 | NaN | NaN | NaN | 0.113043 | 0.318032 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Mecha’ | 115.0 | NaN | NaN | NaN | 0.052174 | 0.223351 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Ecchi’ | 115.0 | NaN | NaN | NaN | 0.069565 | 0.255526 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Based on a Light Novel’ | 115.0 | NaN | NaN | NaN | 0.052174 | 0.223351 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Anthropomorphic’ | 115.0 | NaN | NaN | NaN | 0.069565 | 0.255526 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Superpowers’ | 115.0 | NaN | NaN | NaN | 0.052174 | 0.223351 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Promotional’ | 115.0 | NaN | NaN | NaN | 0.069565 | 0.255526 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Sports’ | 115.0 | NaN | NaN | NaN | 0.043478 | 0.204824 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Historical’ | 115.0 | NaN | NaN | NaN | 0.052174 | 0.223351 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Vocaloid’ | 115.0 | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
tag_Others | 115.0 | NaN | NaN | NaN | 0.034783 | 0.184031 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
All 115 rows with a missing finishYr have ongoing set to True, so we will assume these anime are still airing, and fill the values with 2020 (the year the data was collected).
In [19]:
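# filling missing finishYr values with 2020; assigning back avoids a chained inplace modification
df["finishYr"] = df["finishYr"].fillna(2020)
# checking missing values in rest of the data
df.isnull().sum()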
Out[19]:
title 0
mediaType 63
eps 0
duration 4636
ongoing 0
startYr 0
finishYr 0
sznOfRelease 8554
description 4468
studios 0
contentWarn 0
watched 115
watching 0
wantWatch 0
dropped 0
rating 0
votes 0
tag_'Comedy' 0
tag_'Based on a Manga' 0
tag_'Action' 0
tag_'Fantasy' 0
tag_'Sci Fi' 0
tag_'Shounen' 0
tag_'Family Friendly' 0
tag_'Original Work' 0
tag_'Non-Human Protagonists' 0
tag_'Adventure' 0
tag_'Short Episodes' 0
tag_'Drama' 0
tag_'Shorts' 0
tag_'Romance' 0
tag_'School Life' 0
tag_'Slice of Life' 0
tag_'Animal Protagonists' 0
tag_'Seinen' 0
tag_'Supernatural' 0
tag_'Magic' 0
tag_'CG Animation' 0
tag_'Mecha' 0
tag_'Ecchi' 0
tag_'Based on a Light Novel' 0
tag_'Anthropomorphic' 0
tag_'Superpowers' 0
tag_'Promotional' 0
tag_'Sports' 0
tag_'Historical' 0
tag_'Vocaloid' 0
tag_Others 0
dtype: int64
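The assumption behind this fill can be checked before applying it. The sketch below uses a small toy frame (the titles and values are made up for illustration, not taken from the anime data) to confirm that every row with a missing finishYr is flagged as ongoing, and then fills the gaps by assignment.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the anime data (illustrative values only)
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "ongoing": [False, True, True, False],
        "startYr": [2009.0, 2017.0, 2020.0, 2016.0],
        "finishYr": [2010.0, np.nan, np.nan, 2016.0],
    }
)

# Boolean mask of rows with a missing finish year
missing_finish = toy["finishYr"].isnull()

# Check that every row with a missing finish year is marked as ongoing
all_ongoing = bool(toy.loc[missing_finish, "ongoing"].all())

# Fill the missing finish years with the collection year (2020)
toy["finishYr"] = toy["finishYr"].fillna(2020)
remaining_missing = int(toy["finishYr"].isnull().sum())

print(all_ongoing, remaining_missing)
```

If the check came back False on the real data, some titles would have finished airing without a recorded year, and filling with 2020 would overstate how long they ran.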
The missing values in the startYr and finishYr columns have been dealt with.
We will now create a new variable years_running, which will be calculated as finishYr minus startYr.
We will also drop the startYr and finishYr columns.
In [20]:
df["years_running"] = df["finishYr"] - df["startYr"]
df.drop(["startYr", "finishYr"], axis=1, inplace=True)
df.head()
Out[20]:
index | title | mediaType | eps | duration | ongoing | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | years_running |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Fullmetal Alchemist: Brotherhood | TV | 64.0 | NaN | False | Spring | The foundation of alchemy is based on the law … | [‘Bones’] | 1 | 103707.0 | 14351 | 25810 | 2656 | 4.702 | 86547.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
1 | your name. | Movie | 1.0 | 1hr 47min | False | NaN | Mitsuha and Taki are two total strangers livin… | [‘CoMix Wave Films’] | 0 | 58831.0 | 1453 | 21733 | 124 | 4.663 | 43960.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
2 | A Silent Voice | Movie | 1.0 | 2hr 10min | False | NaN | After transferring into a new school, a deaf g… | [‘Kyoto Animation’] | 1 | 45892.0 | 946 | 17148 | 132 | 4.661 | 33752.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
3 | Haikyuu!! Karasuno High School vs Shiratorizaw… | TV | 10.0 | NaN | False | Fall | Picking up where the second season ended, the … | [‘Production I.G’] | 0 | 25134.0 | 2183 | 8082 | 167 | 4.660 | 17422.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.0 |
4 | Attack on Titan 3rd Season: Part II | TV | 10.0 | NaN | False | Spring | The battle to retake Wall Maria begins now! Wi… | [‘Wit Studio’] | 1 | 21308.0 | 3217 | 7864 | 174 | 4.650 | 15789.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
Let's convert the duration column from string to numeric.
In [21]:
# we define a function to convert the duration column to numeric
def time_to_minutes(var):
    if isinstance(var, str):  # checking if the value is string or not
        if "hr" in var:  # checking for the presence of hours in the duration
            spl = var.split(" ")  # splitting the value by space
            hr = float(spl[0].replace("hr", "")) * 60  # taking numeric part and converting hours to minutes
            # taking numeric part of minutes; an hours-only value such as "2hr" has no minutes part
            mt = float(spl[1].replace("min", "")) if len(spl) > 1 else 0.0
            return hr + mt
        else:
            return float(var.replace("min", ""))  # taking numeric part of minutes
    else:
        return np.nan  # will return NaN if value is not string
In [22]:
# let's apply the function to the duration column and overwrite the column
df["duration"] = df["duration"].apply(time_to_minutes)
df.head()
Out[22]:
title | mediaType | eps | duration | ongoing | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | years_running | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Fullmetal Alchemist: Brotherhood | TV | 64.0 | NaN | False | Spring | The foundation of alchemy is based on the law … | [‘Bones’] | 1 | 103707.0 | 14351 | 25810 | 2656 | 4.702 | 86547.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
1 | your name. | Movie | 1.0 | 107.0 | False | NaN | Mitsuha and Taki are two total strangers livin… | [‘CoMix Wave Films’] | 0 | 58831.0 | 1453 | 21733 | 124 | 4.663 | 43960.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
2 | A Silent Voice | Movie | 1.0 | 130.0 | False | NaN | After transferring into a new school, a deaf g… | [‘Kyoto Animation’] | 1 | 45892.0 | 946 | 17148 | 132 | 4.661 | 33752.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
3 | Haikyuu!! Karasuno High School vs Shiratorizaw… | TV | 10.0 | NaN | False | Fall | Picking up where the second season ended, the … | [‘Production I.G’] | 0 | 25134.0 | 2183 | 8082 | 167 | 4.660 | 17422.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.0 |
4 | Attack on Titan 3rd Season: Part II | TV | 10.0 | NaN | False | Spring | The battle to retake Wall Maria begins now! Wi… | [‘Wit Studio’] | 1 | 21308.0 | 3217 | 7864 | 174 | 4.650 | 15789.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
In [23]:
# let's check the summary of the duration column
df["duration"].describe()
Out[23]:
count    7465.000000
mean       24.230141
std        31.468171
min         1.000000
25%         4.000000
50%         8.000000
75%        30.000000
max       163.000000
Name: duration, dtype: float64
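The same conversion can also be written without a row-wise function. The sketch below is a possible vectorized alternative using a regular expression; the toy values and the hours-only format ("1hr") are assumptions for illustration, not values taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy values in the formats assumed for the raw duration column
raw = pd.Series(["1hr 47min", "2hr 10min", "24min", "1hr", np.nan])

# Named groups pull out the hour and minute parts; a part that is absent becomes NaN
parts = raw.str.extract(r"^(?:(?P<hr>\d+)hr)?\s*(?:(?P<min>\d+)min)?$").astype(float)

# Treat an absent part as 0, but keep rows with no hour and no minute part as NaN
minutes = (parts["hr"].fillna(0) * 60 + parts["min"].fillna(0)).where(
    parts.notna().any(axis=1)
)
print(minutes.tolist())
```

On a column of a few thousand rows either approach is fast enough; the regex version mainly makes the accepted formats explicit in one place.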
We will fill the missing values in the sznOfRelease column with 'is_missing', which will act as a new category.
In [24]:
# assign the result back instead of using inplace=True on a selected column
df["sznOfRelease"] = df["sznOfRelease"].fillna("is_missing")
df.isnull().sum()
Out[24]:
title    0
mediaType    63
eps    0
duration    4636
ongoing    0
sznOfRelease    0
description    4468
studios    0
contentWarn    0
watched    115
watching    0
wantWatch    0
dropped    0
rating    0
votes    0
tag_'Comedy'    0
tag_'Based on a Manga'    0
tag_'Action'    0
tag_'Fantasy'    0
tag_'Sci Fi'    0
tag_'Shounen'    0
tag_'Family Friendly'    0
tag_'Original Work'    0
tag_'Non-Human Protagonists'    0
tag_'Adventure'    0
tag_'Short Episodes'    0
tag_'Drama'    0
tag_'Shorts'    0
tag_'Romance'    0
tag_'School Life'    0
tag_'Slice of Life'    0
tag_'Animal Protagonists'    0
tag_'Seinen'    0
tag_'Supernatural'    0
tag_'Magic'    0
tag_'CG Animation'    0
tag_'Mecha'    0
tag_'Ecchi'    0
tag_'Based on a Light Novel'    0
tag_'Anthropomorphic'    0
tag_'Superpowers'    0
tag_'Promotional'    0
tag_'Sports'    0
tag_'Historical'    0
tag_'Vocaloid'    0
tag_Others    0
years_running    0
dtype: int64
Let’s check the number of unique values and the number of times they occur for the mediaType column.
In [25]:
df.mediaType.value_counts()
Out[25]:
TV             3993
Movie          1928
OVA            1770
Music Video    1290
Web            1170
DVD Special     803
Other           580
TV Special      504
Name: mediaType, dtype: int64
We will fill the missing values in the mediaType column with 'Other', as the exact values for that category are not known.
In [26]:
# assign the result back instead of using inplace=True on a selected column
df["mediaType"] = df["mediaType"].fillna("Other")

# checking the number of unique values and the number of times they occur
df.mediaType.value_counts()
Out[26]:
TV             3993
Movie          1928
OVA            1770
Music Video    1290
Web            1170
DVD Special     803
Other           643
TV Special      504
Name: mediaType, dtype: int64
We will remove the square brackets from the values in the studios column, as the column has a list of values.
In [27]:
df["studios"] = df["studios"].str.lstrip("[").str.rstrip("]")
df["studios"] = df["studios"].replace("", np.nan)  # mark as NaN if the value is a blank string
df.head()
Out[27]:
title | mediaType | eps | duration | ongoing | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | years_running | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Fullmetal Alchemist: Brotherhood | TV | 64.0 | NaN | False | Spring | The foundation of alchemy is based on the law … | ‘Bones’ | 1 | 103707.0 | 14351 | 25810 | 2656 | 4.702 | 86547.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
1 | your name. | Movie | 1.0 | 107.0 | False | is_missing | Mitsuha and Taki are two total strangers livin… | ‘CoMix Wave Films’ | 0 | 58831.0 | 1453 | 21733 | 124 | 4.663 | 43960.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
2 | A Silent Voice | Movie | 1.0 | 130.0 | False | is_missing | After transferring into a new school, a deaf g… | ‘Kyoto Animation’ | 1 | 45892.0 | 946 | 17148 | 132 | 4.661 | 33752.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
3 | Haikyuu!! Karasuno High School vs Shiratorizaw… | TV | 10.0 | NaN | False | Fall | Picking up where the second season ended, the … | ‘Production I.G’ | 0 | 25134.0 | 2183 | 8082 | 167 | 4.660 | 17422.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.0 |
4 | Attack on Titan 3rd Season: Part II | TV | 10.0 | NaN | False | Spring | The battle to retake Wall Maria begins now! Wi… | ‘Wit Studio’ | 1 | 21308.0 | 3217 | 7864 | 174 | 4.650 | 15789.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
In [28]:
# checking missing values in rest of the data
df.isnull().sum()
Out[28]:
title    0
mediaType    0
eps    0
duration    4636
ongoing    0
sznOfRelease    0
description    4468
studios    3208
contentWarn    0
watched    115
watching    0
wantWatch    0
dropped    0
rating    0
votes    0
tag_'Comedy'    0
tag_'Based on a Manga'    0
tag_'Action'    0
tag_'Fantasy'    0
tag_'Sci Fi'    0
tag_'Shounen'    0
tag_'Family Friendly'    0
tag_'Original Work'    0
tag_'Non-Human Protagonists'    0
tag_'Adventure'    0
tag_'Short Episodes'    0
tag_'Drama'    0
tag_'Shorts'    0
tag_'Romance'    0
tag_'School Life'    0
tag_'Slice of Life'    0
tag_'Animal Protagonists'    0
tag_'Seinen'    0
tag_'Supernatural'    0
tag_'Magic'    0
tag_'CG Animation'    0
tag_'Mecha'    0
tag_'Ecchi'    0
tag_'Based on a Light Novel'    0
tag_'Anthropomorphic'    0
tag_'Superpowers'    0
tag_'Promotional'    0
tag_'Sports'    0
tag_'Historical'    0
tag_'Vocaloid'    0
tag_Others    0
years_running    0
dtype: int64
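To see why the studios column only shows missing values after the brackets are stripped, here is a small sketch on toy data. It assumes that an anime without a known studio is stored as an empty list string ("[]"); that raw format is an assumption and is not shown in the rows above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw studios column: list-like strings, including an empty list
studios = pd.Series(["['Bones']", "['Seven Arcs Pictures', 'Orange']", "[]"])

# lstrip/rstrip remove every leading/trailing character that is in the given set
cleaned = studios.str.lstrip("[").str.rstrip("]")

# an empty list leaves an empty string behind, which is then marked as missing
cleaned = cleaned.replace("", np.nan)
print(cleaned.tolist())
print(cleaned.isnull().sum())
```

The quotes around each studio name are kept at this stage, which is why they still appear in the table above.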
Let's look at a random sample of rows to see the kind of values in the studios column.
In [29]:
# setting the random_state will ensure we get the same results every time
df.sample(10, random_state=2)
Out[29]:
title | mediaType | eps | duration | ongoing | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | years_running | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7002 | Tales of the Rays: Mirrage Prison | Web | 1.0 | 1.0 | False | is_missing | NaN | NaN | 0 | 81.0 | 3 | 80 | 1 | 2.767 | 38.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.0 |
11871 | Onikiri Shoujo | Web | 1.0 | 1.0 | False | is_missing | NaN | NaN | 0 | 44.0 | 0 | 25 | 1 | 1.359 | 25.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
7492 | Triage X | TV | 10.0 | NaN | False | Spring | Mochizuki General Hospital boasts some of the … | ‘XEBEC’ | 1 | 4129.0 | 871 | 2867 | 788 | 2.665 | 3485.0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
3852 | Rainbow Days OVA | OVA | 1.0 | NaN | False | is_missing | NaN | ‘Ashi Productions’ | 0 | 580.0 | 24 | 987 | 8 | 3.432 | 291.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
4506 | Endro~! | TV | 12.0 | NaN | False | Winter | In the land of Naral Island, a land of magic a… | ‘Studio Gokumi’ | 0 | 1033.0 | 372 | 1205 | 254 | 3.290 | 976.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
9863 | Heybot! | TV | 50.0 | NaN | False | Summer | The story takes place on Screw Island, a screw… | ‘BN Pictures’ | 0 | 33.0 | 14 | 62 | 36 | 2.107 | 54.0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
3513 | Unico | Movie | 1.0 | 90.0 | False | is_missing | Unico is a special unicorn with the ability to… | ‘MADHOUSE’ | 0 | 748.0 | 7 | 371 | 17 | 3.508 | 459.0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
10605 | Ali Baba to 40-hiki no Touzoku | Movie | 1.0 | 56.0 | False | is_missing | Generations ago, the wily Ali Baba stole a cav… | ‘Toei Animation’ | 0 | 305.0 | 7 | 99 | 12 | 1.897 | 147.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
10270 | rerulili: Girls Talk | Music Video | 1.0 | 4.0 | False | is_missing | NaN | NaN | 0 | 18.0 | 0 | 5 | 0 | 1.995 | 13.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.0 |
2942 | Majestic Prince Movie: Kakusei no Idenshiko | Movie | 1.0 | NaN | False | is_missing | NaN | ‘Seven Arcs Pictures’, ‘Orange’ | 0 | 261.0 | 11 | 423 | 8 | 3.634 | 168.0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
In [30]:
studio_df = pd.DataFrame(
    df.studios.str.split(", ", expand=True).values.flatten(), columns=["Studios"]
)
val_c = studio_df.Studios.value_counts()
val_c
Out[30]:
'Toei Animation'       636
'Sunrise'              433
'J.C. Staff'           341
'MADHOUSE'             339
'TMS Entertainment'    319
...
'Studio Giants'          1
'BigFireBird'            1
'Pandanium'              1
'MMT Technology'         1
'Office Nobu'            1
Name: Studios, Length: 488, dtype: int64
In [31]:
# we take 100 as threshold
threshold = 100
val_c[val_c.values >= threshold]
Out[31]:
'Toei Animation'          636
'Sunrise'                 433
'J.C. Staff'              341
'MADHOUSE'                339
'TMS Entertainment'       319
'Production I.G'          279
'Studio Deen'             266
'Studio Pierrot'          223
'OLM'                     216
'A-1 Pictures'            194
'AIC'                     167
'Shin-Ei Animation'       165
'Tatsunoko Production'    146
'Nippon Animation'        145
'XEBEC'                   143
'DLE'                     134
'GONZO'                   132
'Bones'                   122
'Shaft'                   119
'Kyoto Animation'         108
Name: Studios, dtype: int64
In [32]:
# list of studios
studios_list = val_c[val_c.values >= threshold].index.tolist()
print("Studio names taken into consideration:", len(studios_list), studios_list)
Studio names taken into consideration: 20 ["'Toei Animation'", "'Sunrise'", "'J.C. Staff'", "'MADHOUSE'", "'TMS Entertainment'", "'Production I.G'", "'Studio Deen'", "'Studio Pierrot'", "'OLM'", "'A-1 Pictures'", "'AIC'", "'Shin-Ei Animation'", "'Tatsunoko Production'", "'Nippon Animation'", "'XEBEC'", "'DLE'", "'GONZO'", "'Bones'", "'Shaft'", "'Kyoto Animation'"]
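The counting step can also be done with `explode`, which turns each list of studios into one row per studio. This is a sketch on toy data with a threshold of 2 so that the effect is visible; the values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the cleaned studios column (quotes kept, as in the data above)
studios = pd.Series(
    ["'Bones'", "'Seven Arcs Pictures', 'Orange'", "'Bones', 'Orange'", None, "'Bones'"]
)

# split each entry into a list, then explode to get one row per studio;
# value_counts ignores the missing entry by default
counts = studios.str.split(", ").explode().value_counts()

# keep only the studios that appear at least `threshold` times
threshold = 2
frequent = counts[counts >= threshold].index.tolist()
print(counts.to_dict())
print(frequent)
```

Compared with flattening the expanded frame, this avoids creating a wide intermediate table when a few rows list many studios.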
In [33]:
# let us create a copy of our dataframe
df1 = df.copy()
In [34]:
# first we will fill missing values in the column with 'Others'
df1["studios"] = df1["studios"].fillna("'Others'")
df1.studios.isnull().sum()
Out[34]:
0
In [35]:
studio_val = []

for i in range(df1.shape[0]):  # iterate over all rows in data
    txt = df1.studios.values[i]  # getting the values in studios column
    flag = 0  # flag variable
    for item in studios_list:  # iterate over the list of studios considered
        if item in txt and flag == 0:  # checking if studio name is in the row
            studio_val.append(item)
            flag = 1
    if flag == 0:  # if the row value is different from the list of studios considered
        studio_val.append("'Others'")

# we will strip the leading and trailing ', and assign the values to a column
df1["studio_primary"] = [item.strip("'") for item in studio_val]
df1.tail()
Out[35]:
title | mediaType | eps | duration | ongoing | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | years_running | studio_primary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12096 | Sore Ike! Anpanman: Kirameke! Ice no Kuni no V… | Movie | 1.0 | NaN | False | is_missing | Princess Vanilla is a princess in a land of ic… | ‘TMS Entertainment’ | 0 | 22.0 | 1 | 29 | 1 | 2.807 | 10.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | TMS Entertainment |
12097 | Hulaing Babies Petit | TV | 12.0 | 5.0 | False | Winter | NaN | ‘Fukushima Gaina’ | 0 | 13.0 | 10 | 77 | 2 | 2.090 | 10.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Others |
12098 | Marco & The Galaxy Dragon | OVA | 1.0 | NaN | False | is_missing | NaN | ‘Others’ | 0 | 17.0 | 0 | 65 | 0 | 2.543 | 10.0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Others |
12099 | Xing Chen Bian 2nd Season | Web | 3.0 | 24.0 | True | is_missing | Second season of Xing Chen Bian. | ‘Others’ | 0 | NaN | 31 | 22 | 0 | 3.941 | 10.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Others |
12100 | Ultra B: Black Hole kara no Dokusaisha BB!! | Movie | 1.0 | 20.0 | False | is_missing | NaN | ‘Shin-Ei Animation’ | 0 | 15.0 | 1 | 19 | 1 | 2.925 | 10.0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.0 | Shin-Ei Animation |
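The loop above can be condensed into a small helper applied to each row. The sketch below uses a hypothetical function name (`primary_studio`), a hypothetical two-studio shortlist, and toy rows; it is meant to show the matching rule, not to replace the cell above.

```python
import pandas as pd

# Hypothetical shortlist and toy rows; the list built above has 20 studios
studios_list = ["'Toei Animation'", "'Bones'"]
studios = pd.Series(
    ["'Bones'", "'Wit Studio'", "'Orange', 'Toei Animation'", "'Others'"]
)


def primary_studio(txt, shortlist):
    """Return the first shortlisted studio found in txt, else 'Others'."""
    # next() with a default mirrors the flag-based loop: the first match wins
    match = next((name for name in shortlist if name in txt), "'Others'")
    return match.strip("'")


studio_primary = studios.apply(primary_studio, shortlist=studios_list)
print(studio_primary.tolist())
```

Note that "first" refers to the order of the shortlist, not the order inside the row, so a collaboration is assigned to whichever of its studios ranks highest in the frequency list.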
In [36]:
# we will create a list defining whether there is a collaboration between studios
# we will check if the second split has missing values, which will mean no collaboration between studios
studio_val2 = [
    0 if pd.isna(item) else 1
    for item in df1.studios.str.split(", ", expand=True).iloc[:, 1]
]
df1["studios_colab"] = studio_val2
df1.tail()
Out[36]:
title | mediaType | eps | duration | ongoing | sznOfRelease | description | studios | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | years_running | studio_primary | studios_colab | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12096 | Sore Ike! Anpanman: Kirameke! Ice no Kuni no V… | Movie | 1.0 | NaN | False | is_missing | Princess Vanilla is a princess in a land of ic… | ‘TMS Entertainment’ | 0 | 22.0 | 1 | 29 | 1 | 2.807 | 10.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | TMS Entertainment | 0 |
12097 | Hulaing Babies Petit | TV | 12.0 | 5.0 | False | Winter | NaN | ‘Fukushima Gaina’ | 0 | 13.0 | 10 | 77 | 2 | 2.090 | 10.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Others | 0 |
12098 | Marco & The Galaxy Dragon | OVA | 1.0 | NaN | False | is_missing | NaN | ‘Others’ | 0 | 17.0 | 0 | 65 | 0 | 2.543 | 10.0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Others | 0 |
12099 | Xing Chen Bian 2nd Season | Web | 3.0 | 24.0 | True | is_missing | Second season of Xing Chen Bian. | ‘Others’ | 0 | NaN | 31 | 22 | 0 | 3.941 | 10.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Others | 0 |
12100 | Ultra B: Black Hole kara no Dokusaisha BB!! | Movie | 1.0 | 20.0 | False | is_missing | NaN | ‘Shin-Ei Animation’ | 0 | 15.0 | 1 | 19 | 1 | 2.925 | 10.0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.0 | Shin-Ei Animation | 0 |
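A shorter way to build the same kind of flag is to check for the separator directly. This sketch runs on toy rows and assumes that no single studio name contains a comma followed by a space.

```python
import pandas as pd

# Toy rows in the same format as the cleaned studios column
studios = pd.Series(["'Bones'", "'Seven Arcs Pictures', 'Orange'", "'Others'"])

# ", " separates studio names, so its presence signals a collaboration
studios_colab = studios.str.contains(", ", regex=False).astype(int)
print(studios_colab.tolist())
```

Because it never expands the column into a wide frame, this version does not depend on how missing cells are represented after the split.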
We will now drop the studios column.
In [37]:
df1.drop("studios", axis=1, inplace=True)

# let's check the data once
df1.head()
Out[37]:
title | mediaType | eps | duration | ongoing | sznOfRelease | description | contentWarn | watched | watching | wantWatch | dropped | rating | votes | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | years_running | studio_primary | studios_colab | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Fullmetal Alchemist: Brotherhood | TV | 64.0 | NaN | False | Spring | The foundation of alchemy is based on the law … | 1 | 103707.0 | 14351 | 25810 | 2656 | 4.702 | 86547.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 | Bones | 0 |
1 | your name. | Movie | 1.0 | 107.0 | False | is_missing | Mitsuha and Taki are two total strangers livin… | 0 | 58831.0 | 1453 | 21733 | 124 | 4.663 | 43960.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Others | 0 |
2 | A Silent Voice | Movie | 1.0 | 130.0 | False | is_missing | After transferring into a new school, a deaf g… | 1 | 45892.0 | 946 | 17148 | 132 | 4.661 | 33752.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Kyoto Animation | 0 |
3 | Haikyuu!! Karasuno High School vs Shiratorizaw… | TV | 10.0 | NaN | False | Fall | Picking up where the second season ended, the … | 0 | 25134.0 | 2183 | 8082 | 167 | 4.660 | 17422.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.0 | Production I.G | 0 |
4 | Attack on Titan 3rd Season: Part II | TV | 10.0 | NaN | False | Spring | The battle to retake Wall Maria begins now! Wi… | 1 | 21308.0 | 3217 | 7864 | 174 | 4.650 | 15789.0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | Others | 0 |
Rows with a missing studio now fall under the 'Others' category in the studio_primary column, together with the studios that fall below the threshold.
In [38]:
# checking missing values in rest of the data
df1.isnull().sum()
Out[38]:
title    0
mediaType    0
eps    0
duration    4636
ongoing    0
sznOfRelease    0
description    4468
contentWarn    0
watched    115
watching    0
wantWatch    0
dropped    0
rating    0
votes    0
tag_'Comedy'    0
tag_'Based on a Manga'    0
tag_'Action'    0
tag_'Fantasy'    0
tag_'Sci Fi'    0
tag_'Shounen'    0
tag_'Family Friendly'    0
tag_'Original Work'    0
tag_'Non-Human Protagonists'    0
tag_'Adventure'    0
tag_'Short Episodes'    0
tag_'Drama'    0
tag_'Shorts'    0
tag_'Romance'    0
tag_'School Life'    0
tag_'Slice of Life'    0
tag_'Animal Protagonists'    0
tag_'Seinen'    0
tag_'Supernatural'    0
tag_'Magic'    0
tag_'CG Animation'    0
tag_'Mecha'    0
tag_'Ecchi'    0
tag_'Based on a Light Novel'    0
tag_'Anthropomorphic'    0
tag_'Superpowers'    0
tag_'Promotional'    0
tag_'Sports'    0
tag_'Historical'    0
tag_'Vocaloid'    0
tag_Others    0
years_running    0
studio_primary    0
studios_colab    0
dtype: int64
We will fill the missing values in the duration and watched columns by the median values grouped by studio_primary and mediaType.
In [39]:
df2 = df1.copy()

df2[["duration", "watched"]] = df2.groupby(["studio_primary", "mediaType"])[
    ["duration", "watched"]
].transform(lambda x: x.fillna(x.median()))

# checking for missing values
df2.isnull().sum()
Out[39]:
title    0
mediaType    0
eps    0
duration    155
ongoing    0
sznOfRelease    0
description    4468
contentWarn    0
watched    0
watching    0
wantWatch    0
dropped    0
rating    0
votes    0
tag_'Comedy'    0
tag_'Based on a Manga'    0
tag_'Action'    0
tag_'Fantasy'    0
tag_'Sci Fi'    0
tag_'Shounen'    0
tag_'Family Friendly'    0
tag_'Original Work'    0
tag_'Non-Human Protagonists'    0
tag_'Adventure'    0
tag_'Short Episodes'    0
tag_'Drama'    0
tag_'Shorts'    0
tag_'Romance'    0
tag_'School Life'    0
tag_'Slice of Life'    0
tag_'Animal Protagonists'    0
tag_'Seinen'    0
tag_'Supernatural'    0
tag_'Magic'    0
tag_'CG Animation'    0
tag_'Mecha'    0
tag_'Ecchi'    0
tag_'Based on a Light Novel'    0
tag_'Anthropomorphic'    0
tag_'Superpowers'    0
tag_'Promotional'    0
tag_'Sports'    0
tag_'Historical'    0
tag_'Vocaloid'    0
tag_Others    0
years_running    0
studio_primary    0
studios_colab    0
dtype: int64
We will fill the remaining missing values in the duration column by the column median.
In [40]:
# assign the result back instead of using inplace=True on a selected column
df2["duration"] = df2["duration"].fillna(df2["duration"].median())
df2.isnull().sum()
Out[40]:
title    0
mediaType    0
eps    0
duration    0
ongoing    0
sznOfRelease    0
description    4468
contentWarn    0
watched    0
watching    0
wantWatch    0
dropped    0
rating    0
votes    0
tag_'Comedy'    0
tag_'Based on a Manga'    0
tag_'Action'    0
tag_'Fantasy'    0
tag_'Sci Fi'    0
tag_'Shounen'    0
tag_'Family Friendly'    0
tag_'Original Work'    0
tag_'Non-Human Protagonists'    0
tag_'Adventure'    0
tag_'Short Episodes'    0
tag_'Drama'    0
tag_'Shorts'    0
tag_'Romance'    0
tag_'School Life'    0
tag_'Slice of Life'    0
tag_'Animal Protagonists'    0
tag_'Seinen'    0
tag_'Supernatural'    0
tag_'Magic'    0
tag_'CG Animation'    0
tag_'Mecha'    0
tag_'Ecchi'    0
tag_'Based on a Light Novel'    0
tag_'Anthropomorphic'    0
tag_'Superpowers'    0
tag_'Promotional'    0
tag_'Sports'    0
tag_'Historical'    0
tag_'Vocaloid'    0
tag_Others    0
years_running    0
studio_primary    0
studios_colab    0
dtype: int64
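The two-step fill can be reproduced on a toy frame to see why the grouped median leaves some values missing: a group in which every value is missing has no median to fill with. The frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "studio_primary": ["Bones", "Bones", "Bones", "Others", "Others"],
        "mediaType": ["TV", "TV", "TV", "Movie", "Movie"],
        "duration": [20.0, np.nan, 30.0, np.nan, np.nan],
    }
)

# Step 1: fill within each (studio_primary, mediaType) group using that group's median
toy["duration"] = toy.groupby(["studio_primary", "mediaType"])["duration"].transform(
    lambda x: x.fillna(x.median())
)
remaining = toy["duration"].isnull().sum()  # groups with no observed value stay NaN

# Step 2: fall back to the overall column median for whatever is left
toy["duration"] = toy["duration"].fillna(toy["duration"].median())
print(remaining)
print(toy["duration"].tolist())
```

Here the (Others, Movie) group has no observed duration, so its two rows are only filled in the second step.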
We will drop the description and title columns.
In [41]:
df2.drop(["description", "title"], axis=1, inplace=True)

# let's check the summary of our data
df2.describe(include="all").T
Out[41]:
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|---|---|---|
mediaType | 12101 | 8 | TV | 3993 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
eps | 12101.0 | NaN | NaN | NaN | 13.393356 | 57.925097 | 1.0 | 1.0 | 2.0 | 12.0 | 2527.0 |
duration | 12101.0 | NaN | NaN | NaN | 20.025287 | 27.130296 | 1.0 | 5.0 | 7.0 | 25.0 | 163.0 |
ongoing | 12101 | 2 | False | 11986 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
sznOfRelease | 12101 | 5 | is_missing | 8554 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
contentWarn | 12101.0 | NaN | NaN | NaN | 0.115362 | 0.319472 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
watched | 12101.0 | NaN | NaN | NaN | 2861.241302 | 7724.622443 | 0.0 | 55.0 | 342.0 | 2026.0 | 161567.0 |
watching | 12101.0 | NaN | NaN | NaN | 256.334435 | 1380.840902 | 0.0 | 2.0 | 14.0 | 100.0 | 74537.0 |
wantWatch | 12101.0 | NaN | NaN | NaN | 1203.681431 | 2294.32738 | 0.0 | 49.0 | 296.0 | 1275.0 | 28541.0 |
dropped | 12101.0 | NaN | NaN | NaN | 151.568383 | 493.93171 | 0.0 | 3.0 | 12.0 | 65.0 | 19481.0 |
rating | 12101.0 | NaN | NaN | NaN | 2.949037 | 0.827385 | 0.844 | 2.304 | 2.965 | 3.616 | 4.702 |
votes | 12101.0 | NaN | NaN | NaN | 2088.1247 | 5950.332228 | 10.0 | 34.0 | 219.0 | 1414.0 | 131067.0 |
tag_’Comedy’ | 12101.0 | NaN | NaN | NaN | 0.27287 | 0.445453 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Based on a Manga’ | 12101.0 | NaN | NaN | NaN | 0.290802 | 0.454151 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
tag_’Action’ | 12101.0 | NaN | NaN | NaN | 0.231221 | 0.421631 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Fantasy’ | 12101.0 | NaN | NaN | NaN | 0.181555 | 0.385493 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Sci Fi’ | 12101.0 | NaN | NaN | NaN | 0.166267 | 0.372336 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Shounen’ | 12101.0 | NaN | NaN | NaN | 0.144864 | 0.351978 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Family Friendly’ | 12101.0 | NaN | NaN | NaN | 0.097017 | 0.295993 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Original Work’ | 12101.0 | NaN | NaN | NaN | 0.135195 | 0.341946 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Non-Human Protagonists’ | 12101.0 | NaN | NaN | NaN | 0.11247 | 0.315957 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Adventure’ | 12101.0 | NaN | NaN | NaN | 0.103793 | 0.305005 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Short Episodes’ | 12101.0 | NaN | NaN | NaN | 0.096934 | 0.29588 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Drama’ | 12101.0 | NaN | NaN | NaN | 0.106107 | 0.307987 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Shorts’ | 12101.0 | NaN | NaN | NaN | 0.089662 | 0.285709 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Romance’ | 12101.0 | NaN | NaN | NaN | 0.092141 | 0.289237 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’School Life’ | 12101.0 | NaN | NaN | NaN | 0.092306 | 0.28947 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Slice of Life’ | 12101.0 | NaN | NaN | NaN | 0.08082 | 0.272569 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Animal Protagonists’ | 12101.0 | NaN | NaN | NaN | 0.060326 | 0.238099 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Seinen’ | 12101.0 | NaN | NaN | NaN | 0.077101 | 0.266763 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Supernatural’ | 12101.0 | NaN | NaN | NaN | 0.070903 | 0.256674 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Magic’ | 12101.0 | NaN | NaN | NaN | 0.064292 | 0.245283 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’CG Animation’ | 12101.0 | NaN | NaN | NaN | 0.050079 | 0.218116 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Mecha’ | 12101.0 | NaN | NaN | NaN | 0.054541 | 0.227091 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Ecchi’ | 12101.0 | NaN | NaN | NaN | 0.057433 | 0.232678 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Based on a Light Novel’ | 12101.0 | NaN | NaN | NaN | 0.053384 | 0.224807 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Anthropomorphic’ | 12101.0 | NaN | NaN | NaN | 0.037848 | 0.190837 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Superpowers’ | 12101.0 | NaN | NaN | NaN | 0.044624 | 0.206486 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Promotional’ | 12101.0 | NaN | NaN | NaN | 0.036361 | 0.187194 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Sports’ | 12101.0 | NaN | NaN | NaN | 0.038013 | 0.191236 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Historical’ | 12101.0 | NaN | NaN | NaN | 0.033303 | 0.179434 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_’Vocaloid’ | 12101.0 | NaN | NaN | NaN | 0.039336 | 0.1944 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
tag_Others | 12101.0 | NaN | NaN | NaN | 0.074457 | 0.262523 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
years_running | 12101.0 | NaN | NaN | NaN | 0.2832 | 1.152234 | 0.0 | 0.0 | 0.0 | 0.0 | 51.0 |
studio_primary | 12101 | 21 | Others | 7548 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
studios_colab | 12101.0 | NaN | NaN | NaN | 0.051649 | 0.221326 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
In [42]:
# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid = 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a marker will indicate the mean value of the column
    # For histogram: pass bins only when a value is given
    if bins:
        sns.histplot(
            data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
        )
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
rating
In [43]:
histogram_boxplot(df2, "rating")
eps
In [44]:
histogram_boxplot(df2, "eps", bins=100)
duration
In [45]:
histogram_boxplot(df2, "duration")
watched
In [46]:
histogram_boxplot(df2, "watched", bins=50)
watching
In [47]:
histogram_boxplot(df2, "watching", bins=50)
wantWatch
In [48]:
histogram_boxplot(df2, "wantWatch", bins=50)
dropped
In [49]:
histogram_boxplot(df2, "dropped", bins=50)
votes
In [50]:
histogram_boxplot(df2, "votes", bins=50)
years_running
In [51]:
histogram_boxplot(df2, "years_running")
In [52]:
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # horizontal centre of the bar
        y = p.get_height()  # height of the bar

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the count or percentage

    plt.show()  # show the plot
mediaType
In [53]:
labeled_barplot(df2, "mediaType", perc=True)
ongoing
In [54]:
labeled_barplot(df2, "ongoing", perc=True)
sznOfRelease
In [55]:
labeled_barplot(df2, "sznOfRelease", perc=True)
studio_primary
In [56]:
labeled_barplot(df2, "studio_primary", perc=True)
studios_colab
In [57]:
labeled_barplot(df2, "studios_colab", perc=True)
contentWarn
In [58]:
labeled_barplot(df2, "contentWarn", perc=True)
In [59]:
# creating a list of tag columns
tag_cols = [item for item in df2.columns if "tag" in item]
In [60]:
# checking the values in tag columns
for column in tag_cols:
    print(df2[column].value_counts())
    print("-" * 50)
Column | 0 | 1 |
---|---|---|
tag_'Comedy' | 8799 | 3302 |
tag_'Based on a Manga' | 8582 | 3519 |
tag_'Action' | 9303 | 2798 |
tag_'Fantasy' | 9904 | 2197 |
tag_'Sci Fi' | 10089 | 2012 |
tag_'Shounen' | 10348 | 1753 |
tag_'Family Friendly' | 10927 | 1174 |
tag_'Original Work' | 10465 | 1636 |
tag_'Non-Human Protagonists' | 10740 | 1361 |
tag_'Adventure' | 10845 | 1256 |
tag_'Short Episodes' | 10928 | 1173 |
tag_'Drama' | 10817 | 1284 |
tag_'Shorts' | 11016 | 1085 |
tag_'Romance' | 10986 | 1115 |
tag_'School Life' | 10984 | 1117 |
tag_'Slice of Life' | 11123 | 978 |
tag_'Animal Protagonists' | 11371 | 730 |
tag_'Seinen' | 11168 | 933 |
tag_'Supernatural' | 11243 | 858 |
tag_'Magic' | 11323 | 778 |
tag_'CG Animation' | 11495 | 606 |
tag_'Mecha' | 11441 | 660 |
tag_'Ecchi' | 11406 | 695 |
tag_'Based on a Light Novel' | 11455 | 646 |
tag_'Anthropomorphic' | 11643 | 458 |
tag_'Superpowers' | 11561 | 540 |
tag_'Promotional' | 11661 | 440 |
tag_'Sports' | 11641 | 460 |
tag_'Historical' | 11698 | 403 |
tag_'Vocaloid' | 11625 | 476 |
tag_Others | 11200 | 901 |
We will not consider the tag columns for correlation check as they have only 0 or 1 values.
In [61]:
# creating a list of non-tag columns
corr_cols = [item for item in df2.columns if "tag" not in item]
print(corr_cols)
['mediaType', 'eps', 'duration', 'ongoing', 'sznOfRelease', 'contentWarn', 'watched', 'watching', 'wantWatch', 'dropped', 'rating', 'votes', 'years_running', 'studio_primary', 'studios_colab']
In [62]:
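plt.figure(figsize=(12, 7))
# keep only numeric and boolean columns so that .corr() does not fail on object columns in newer pandas versions
sns.heatmap(
    df2[corr_cols].select_dtypes(include=["number", "bool"]).corr(),
    annot=True,
    vmin=-1,
    vmax=1,
    fmt=".2f",
    cmap="Spectral",
)
plt.show()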
- watched and wantWatch columns are highly correlated.
- watched and votes columns are very highly correlated.
- wantWatch and votes columns are highly correlated.

Let’s check the variation in rating with some of the categorical columns in our data.
mediaType vs rating
In [63]:
plt.figure(figsize=(10, 5))
sns.boxplot(x="mediaType", y="rating", data=df2)
plt.show()
sznOfRelease vs rating
In [64]:
plt.figure(figsize=(10, 5))
sns.boxplot(x="sznOfRelease", y="rating", data=df2)
plt.show()
studio_primary vs rating
In [65]:
plt.figure(figsize=(15, 5))
sns.boxplot(x="studio_primary", y="rating", data=df2)
plt.xticks(rotation=90)
plt.show()
Data Description:
- The dependent variable (rating) is of float type.
- title, description, mediaType, studio, etc. are of object type.
- The ongoing column is of bool type.

Data Cleaning:
- The title and description columns are dropped for modeling as they are highly textual in nature.
- The duration column was converted from string to numeric by applying the time_to_minutes function.
- The studios column was processed to convert the list of values into a suitable format for analysis and modeling.
- The missing values in rating were dropped.
- The missing values in startYr were dropped.
- The missing values in finishYr were imputed with 2020.
- The missing values in sznOfRelease were imputed with a new category ‘is_missing’.
- The missing values in mediaType were imputed with a new category ‘Other’.
- The missing values in the duration and watched columns were imputed by the median values grouped by studio_primary and mediaType. The remaining missing values in these columns, if any, were imputed by column medians over the entire data.
- The startYr and finishYr columns were combined to create a new feature years_running. The original columns were then dropped.

Observations from EDA:
- rating: The anime ratings are close to normally distributed, with a mean rating of ~2.95. The rating increases with an increase in the number of people who have watched or want to watch the anime.
- eps: The distribution is heavily right-skewed as there are many anime movies in the data (at least 50%), and they are considered to be of only one episode as per the data description. The number of episodes increases as the anime runs for more years.
- duration: The distribution is right-skewed with a median anime runtime of less than 10 minutes.
- years_running: The distribution is heavily right-skewed, and at least 75% of the anime have run for less than 1 year.
- watched: The distribution is heavily right-skewed, and most of the anime have fewer than 500 viewers. This attribute is highly correlated with the wantWatch and votes attributes.
- watching: The distribution is heavily right-skewed and highly correlated with the dropped attribute.
- wantWatch: The distribution is heavily right-skewed with a median value of 296 potential watchers.
- dropped: The distribution is heavily right-skewed with a drop of 152 viewers on average.
- votes: The distribution is heavily right-skewed, and few shows have more than 10000 votes.
- mediaType: 33% of the anime are published for TV, 11% as music videos, and 10% as web series. Anime available as web series or music videos have a lower rating in general.
- ongoing: 1% of the anime in the data are ongoing.
- sznOfRelease: The season of release is missing for more than 70% of the anime in the data, and more anime are released in spring and fall compared to summer and winter. Anime ratings have a similar distribution across all the seasons of release.
- studio_primary: More than 60% of the anime in the data are produced by studios not listed in the data. Toei Animation is the most common studio among the available studio names. In general, the ratings are low for anime produced by DLE studios and studios other than the ones listed in the data.
- studios_colab: Around 95% of the anime in the data do not involve collaboration between studios.
- contentWarn: Nearly 90% of the anime in the data do not have an associated content warning.
- tag_<tag/genre>: There are 3519 anime that are based on manga, 3302 of the Comedy genre, 2798 of the Action genre, 1115 anime of the Romance genre, and more.

In [66]:
# creating a list of non-tag columns
dist_cols = [
    item for item in df2.select_dtypes(include=np.number).columns if "tag" not in item
]

# let's plot a histogram of all non-tag columns
plt.figure(figsize=(15, 45))

for i in range(len(dist_cols)):
    plt.subplot(12, 3, i + 1)
    plt.hist(df2[dist_cols[i]], bins=50)
    # sns.histplot(data=df2, x=dist_cols[i], kde=True)  # you can comment the previous line and run this one to get distribution curves
    plt.tight_layout()
    plt.title(dist_cols[i], fontsize=25)

plt.show()
We will apply the log transformation to all the numeric non-tag columns other than the contentWarn, studios_colab, and rating columns to deal with skewness in the data.
In [67]:
# creating a copy of the dataframe
df3 = df2.copy()

# removing contentWarn and studios_colab columns as they have only 0 and 1 values
dist_cols.remove("contentWarn")
dist_cols.remove("studios_colab")

# also dropping the rating column as it is almost normally distributed
dist_cols.remove("rating")
In [68]:
# using log transforms on some columns
for col in dist_cols:
    df3[col + "_log"] = np.log(df3[col] + 1)

# dropping the original columns
df3.drop(dist_cols, axis=1, inplace=True)
df3.head()
Out[68]:
| | mediaType | ongoing | sznOfRelease | contentWarn | rating | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | studio_primary | studios_colab | eps_log | duration_log | watched_log | watching_log | wantWatch_log | dropped_log | votes_log | years_running_log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | TV | False | Spring | 1 | 4.702 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Bones | 0 | 4.174387 | 1.386294 | 11.549335 | 9.571645 | 10.158556 | 7.884953 | 11.368454 | 0.693147 |
1 | Movie | False | is_missing | 0 | 4.663 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Others | 0 | 0.693147 | 4.682131 | 10.982441 | 7.282074 | 9.986633 | 4.828314 | 10.691058 | 0.000000 |
2 | Movie | False | is_missing | 1 | 4.661 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Kyoto Animation | 0 | 0.693147 | 4.875197 | 10.734068 | 6.853299 | 9.749695 | 4.890349 | 10.426825 | 0.000000 |
3 | TV | False | Fall | 0 | 4.660 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Production I.G | 0 | 2.397895 | 2.564949 | 10.132017 | 7.688913 | 8.997518 | 5.123964 | 9.765546 | 0.000000 |
4 | TV | False | Spring | 1 | 4.650 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Others | 0 | 2.397895 | 1.791759 | 9.966885 | 8.076515 | 8.970178 | 5.164786 | 9.667132 | 0.000000 |
Let’s check for skewness after applying the log transformation.
In [69]:
# creating a list of non-tag columns
dist_cols = [
    item for item in df3.select_dtypes(include=np.number).columns if "tag" not in item
]

# let's plot a histogram of all non-tag columns
plt.figure(figsize=(15, 45))

for i in range(len(dist_cols)):
    plt.subplot(12, 3, i + 1)
    plt.hist(df3[dist_cols[i]], bins=50)
    # sns.histplot(data=df3, x=dist_cols[i], kde=True)  # you can comment the previous line and run this one to get distribution curves
    plt.tight_layout()
    plt.title(dist_cols[i], fontsize=25)

plt.show()
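The effect of the log transformation can also be checked numerically instead of only by eye. The snippet below is a minimal sketch on synthetic data: the toy dataframe and its log-normal sample are assumptions made for illustration and are not taken from the anime dataset. It applies the same log(x + 1) transform (written here as np.log1p) and compares the sample skewness before and after.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed sample standing in for a count column such as `watched`
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {"watched": rng.lognormal(mean=5, sigma=1.5, size=5000).round()}
)

# Same transform as in the notebook, log(x + 1), expressed with np.log1p
toy["watched_log"] = np.log1p(toy["watched"])

# Series.skew() returns the sample skewness (0 for a symmetric distribution)
skew_before = toy["watched"].skew()
skew_after = toy["watched_log"].skew()

print(f"skewness before log transform: {skew_before:.2f}")
print(f"skewness after log transform:  {skew_after:.2f}")
```

On the real data, the same two calls to .skew() could be run on df2[col] and df3[col + "_log"] for each transformed column.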
Let’s check for correlations between the columns (other than the tag columns)
In [70]:
plt.figure(figsize=(12, 7))
sns.heatmap(
    df3[dist_cols].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()
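Reading strongly correlated pairs off a heatmap gets tedious as the number of columns grows. As a rough sketch, the helper below (high_corr_pairs is a hypothetical name, and the 0.8 threshold is an arbitrary choice) lists every pair of columns whose absolute Pearson correlation exceeds a threshold. It is shown on synthetic data, not on the anime dataset.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(frame, threshold=0.8):
    """Return column pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = frame.corr()
    # keep only the strict upper triangle so that each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    pairs = pairs[pairs.abs() > threshold]
    # order the surviving pairs from strongest to weakest correlation
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order)


# Synthetic example: `b` is almost a multiple of `a`, `c` is independent noise
rng = np.random.default_rng(1)
a = rng.normal(size=1000)
toy = pd.DataFrame(
    {
        "a": a,
        "b": 2 * a + 0.1 * rng.normal(size=1000),
        "c": rng.normal(size=1000),
    }
)

pairs = high_corr_pairs(toy, threshold=0.8)
print(pairs)
```

Applied to the notebook's data, the call would be high_corr_pairs(df3[dist_cols]).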
In [71]:
X = df3.drop(["rating"], axis=1)
y = df3["rating"]
In [72]:
X = pd.get_dummies(
    X,
    columns=X.select_dtypes(include=["object", "category"]).columns.tolist(),
    drop_first=True,
)
X.head()
Out[72]:
| | ongoing | contentWarn | tag_’Comedy’ | tag_’Based on a Manga’ | tag_’Action’ | tag_’Fantasy’ | tag_’Sci Fi’ | tag_’Shounen’ | tag_’Family Friendly’ | tag_’Original Work’ | tag_’Non-Human Protagonists’ | tag_’Adventure’ | tag_’Short Episodes’ | tag_’Drama’ | tag_’Shorts’ | tag_’Romance’ | tag_’School Life’ | tag_’Slice of Life’ | tag_’Animal Protagonists’ | tag_’Seinen’ | tag_’Supernatural’ | tag_’Magic’ | tag_’CG Animation’ | tag_’Mecha’ | tag_’Ecchi’ | tag_’Based on a Light Novel’ | tag_’Anthropomorphic’ | tag_’Superpowers’ | tag_’Promotional’ | tag_’Sports’ | tag_’Historical’ | tag_’Vocaloid’ | tag_Others | studios_colab | eps_log | duration_log | watched_log | watching_log | wantWatch_log | dropped_log | votes_log | years_running_log | mediaType_Movie | mediaType_Music Video | mediaType_OVA | mediaType_Other | mediaType_TV | mediaType_TV Special | mediaType_Web | sznOfRelease_Spring | sznOfRelease_Summer | sznOfRelease_Winter | sznOfRelease_is_missing | studio_primary_AIC | studio_primary_Bones | studio_primary_DLE | studio_primary_GONZO | studio_primary_J.C. Staff | studio_primary_Kyoto Animation | studio_primary_MADHOUSE | studio_primary_Nippon Animation | studio_primary_OLM | studio_primary_Others | studio_primary_Production I.G | studio_primary_Shaft | studio_primary_Shin-Ei Animation | studio_primary_Studio Deen | studio_primary_Studio Pierrot | studio_primary_Sunrise | studio_primary_TMS Entertainment | studio_primary_Tatsunoko Production | studio_primary_Toei Animation | studio_primary_XEBEC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | False | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.174387 | 1.386294 | 11.549335 | 9.571645 | 10.158556 | 7.884953 | 11.368454 | 0.693147 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | False | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.693147 | 4.682131 | 10.982441 | 7.282074 | 9.986633 | 4.828314 | 10.691058 | 0.000000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | False | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.693147 | 4.875197 | 10.734068 | 6.853299 | 9.749695 | 4.890349 | 10.426825 | 0.000000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2.397895 | 2.564949 | 10.132017 | 7.688913 | 8.997518 | 5.123964 | 9.765546 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | False | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.397895 | 1.791759 | 9.966885 | 8.076515 | 8.970178 | 5.164786 | 9.667132 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
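Notice that the encoded data has no sznOfRelease_Fall column: with drop_first=True, the first level of each categorical column is dropped and acts as the reference category. A minimal sketch on a toy dataframe (the five rows below are made up for illustration) shows the difference. Without dropping a level, the dummy columns of one feature always sum to 1, which makes them perfectly collinear with the intercept of a linear regression.

```python
import pandas as pd

# Toy column containing each season level once (illustrative data only)
toy = pd.DataFrame(
    {"sznOfRelease": ["Fall", "Spring", "Summer", "Winter", "is_missing"]}
)

# One dummy column per level
full = pd.get_dummies(toy, columns=["sznOfRelease"])

# First level (alphabetically "Fall") is dropped and becomes the baseline
reduced = pd.get_dummies(toy, columns=["sznOfRelease"], drop_first=True)

print(list(full.columns))
print(list(reduced.columns))

# Every row of the full encoding sums to exactly 1 (the source of collinearity)
print(full.sum(axis=1).tolist())
```

The coefficients of the remaining dummy columns are then interpreted relative to the dropped level.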
In [73]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
In [74]:
print("Number of rows in train data =", x_train.shape[0])
print("Number of rows in test data =", x_test.shape[0])

Number of rows in train data = 8470
Number of rows in test data = 3631
In [75]:
lin_reg_model = LinearRegression()
lin_reg_model.fit(x_train, y_train)
Out[75]:
LinearRegression()
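Since one of the goals is to identify the factors that matter for the rating, it helps to look at the fitted coefficients next to the column names. The snippet below is only a sketch on synthetic data: the toy_X columns, the true coefficients, and the noise level are all assumptions chosen for illustration. It uses the coef_ and intercept_ attributes of a fitted LinearRegression.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic predictors and target with known coefficients (illustrative only)
rng = np.random.default_rng(42)
toy_X = pd.DataFrame(
    {
        "votes_log": rng.normal(size=500),
        "eps_log": rng.normal(size=500),
        "noise": rng.normal(size=500),
    }
)
toy_y = (
    3.0
    + 0.5 * toy_X["votes_log"]
    - 0.2 * toy_X["eps_log"]
    + rng.normal(scale=0.1, size=500)
)

toy_model = LinearRegression().fit(toy_X, toy_y)

# Pair each coefficient with its column name and sort by absolute size
coefs = pd.Series(toy_model.coef_, index=toy_X.columns)
coef_table = coefs.reindex(coefs.abs().sort_values(ascending=False).index)

print(coef_table)
print("intercept:", toy_model.intercept_)
```

For the model above, the equivalent would be pd.Series(lin_reg_model.coef_, index=x_train.columns). Keep in mind that coefficient magnitudes are only comparable across predictors measured on similar scales.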
In [76]:
# function to compute adjusted R-squared
def adj_r2_score(predictors, targets, predictions):
    r2 = r2_score(targets, predictions)
    n = predictors.shape[0]
    k = predictors.shape[1]
    return 1 - ((1 - r2) * (n - 1) / (n - k - 1))


# function to compute MAPE
def mape_score(targets, predictions):
    return np.mean(np.abs(targets - predictions) / targets) * 100


# function to compute different metrics to check performance of a regression model
def model_performance_regression(model, predictors, target):
    """
    Function to compute different metrics to check regression model performance

    model: regressor
    predictors: independent variables
    target: dependent variable
    """
    # predicting using the independent variables
    pred = model.predict(predictors)

    r2 = r2_score(target, pred)  # to compute R-squared
    adjr2 = adj_r2_score(predictors, target, pred)  # to compute adjusted R-squared
    rmse = np.sqrt(mean_squared_error(target, pred))  # to compute RMSE
    mae = mean_absolute_error(target, pred)  # to compute MAE
    mape = mape_score(target, pred)  # to compute MAPE

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {
            "RMSE": rmse,
            "MAE": mae,
            "R-squared": r2,
            "Adj. R-squared": adjr2,
            "MAPE": mape,
        },
        index=[0],
    )

    return df_perf
In [77]:
# Checking model performance on train set
print("Training Performance\n")
lin_reg_model_train_perf = model_performance_regression(lin_reg_model, x_train, y_train)
lin_reg_model_train_perf
Training Performance
Out[77]:
| | RMSE | MAE | R-squared | Adj. R-squared | MAPE |
---|---|---|---|---|---|
0 | 0.456805 | 0.357049 | 0.694278 | 0.691619 | 14.200784 |
In [78]:
# Checking model performance on test set
print("Test Performance\n")
lin_reg_model_test_perf = model_performance_regression(lin_reg_model, x_test, y_test)
lin_reg_model_test_perf
Test Performance
Out[78]:
| | RMSE | MAE | R-squared | Adj. R-squared | MAPE |
---|---|---|---|---|---|
0 | 0.476857 | 0.371374 | 0.669603 | 0.662823 | 14.780452 |
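The adjusted R-squared values in the two tables above can be reproduced by hand from the formula used in adj_r2_score, 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below plugs in the reported R-squared values and row counts. The number of predictors, k = 73, is an assumption obtained by counting the columns of the encoded X shown in Out[72].

```python
def adjusted_r2(r2, n_rows, n_predictors):
    """Adjusted R-squared: penalises R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_predictors - 1)


# Assumption: 73 predictor columns, counted from the header of Out[72]
K = 73

# R-squared values and row counts as reported in the notebook output
train_adj = adjusted_r2(0.694278, n_rows=8470, n_predictors=K)
test_adj = adjusted_r2(0.669603, n_rows=3631, n_predictors=K)

print(f"train adjusted R-squared: {train_adj:.4f}")
print(f"test adjusted R-squared:  {test_adj:.4f}")
```

The penalty is larger on the test set because the same 73 predictors are spread over fewer rows, so the gap between R-squared and adjusted R-squared widens there.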
Observations