# Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Load built-in iris dataset
iris = sns.load_dataset("iris")
iris.head()
describe() is a very useful pandas method: it generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.
iris.describe()
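To see the NaN handling concretely, here is a minimal sketch on a made-up frame (the `toy` DataFrame and its column names are hypothetical, not part of the iris data). It also shows that, by default, describe() only summarizes numeric columns when a frame mixes numeric and text data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value in "height", plus a text column.
toy = pd.DataFrame({
    "height": [1.0, 2.0, np.nan, 3.0],
    "label": ["a", "b", "c", "d"],
})

summary = toy.describe()                # numeric columns only by default
print(summary.loc["count", "height"])   # number of non-missing values
print(summary.loc["mean", "height"])    # mean computed without the NaN

full = toy.describe(include="all")      # also summarizes the text column
print(list(full.columns))
```

Passing include="all" is one way to bring non-numeric columns into the summary; which option is appropriate depends on the dataset at hand.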
sns.set()
%matplotlib inline
sns.swarmplot(x="species", y="petal_length", data=iris)
df = pd.read_csv("fatal-police-shootings-data.csv", encoding="windows-1252")
df.head(10)
df.describe()
This plot is known as a strip plot, and it works well when one of the variables is categorical:
sns.stripplot(x="armed", y="age", data=df)
tips = sns.load_dataset("tips")
tips.head(10)
tips.describe()
sns.barplot(x="day", y="total_bill", data=tips)
Seaborn splits Matplotlib parameters into two independent groups: the first sets the aesthetic style of the plot, and the second scales various elements of the figure so that it can be easily incorporated into different contexts. Seaborn doesn’t take anything away from Matplotlib; rather, it adds some nice default aesthetics and built-in plots that complement, and sometimes replace, the complicated Matplotlib code that practitioners would otherwise need to write. Facet plots and regression plots are examples of that.
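One way to look at these two groups directly is to inspect the dictionaries that seaborn returns for a style and for a context. The sketch below is only an illustration; the choice of "whitegrid" and "talk" is arbitrary, and the variable names are mine.

```python
import seaborn as sns

# Group 1: aesthetic style parameters (colors, grid, spines, ticks).
style = sns.axes_style("whitegrid")

# Group 2: scaling parameters (font sizes, line widths, tick sizes).
context = sns.plotting_context("talk")

print(sorted(style)[:5])
print(sorted(context)[:5])

# Parameter names that appear in both groups.
shared = set(style) & set(context)
print(shared)
```

Because the two dictionaries cover different parameter names, changing the style should not disturb the scaling, and vice versa.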
sns.set_style("whitegrid")
sns.boxplot(x="day", y="total_bill", data=tips)
sns.set_style("ticks")
sns.boxplot(x="day", y="total_bill", data=tips)
sns.set_style("white")
sns.boxplot(x="day", y="total_bill", data=tips)
sns.set_style("dark")
sns.boxplot(x="day", y="total_bill", data=tips)
sns.set_style("ticks")
sns.boxplot(x="day", y="total_bill", data=tips)
sns.despine()
sns.set_style("ticks")
sns.boxplot(x="day", y="total_bill", data=tips)
sns.despine(left=True)
# This function will help us plot some offset sine waves
def sinplot(flip=1):
    x = np.linspace(0, 14, 100)
    for i in range(1, 7):
        plt.plot(x, np.sin(x + i * 0.5) * (7 - i) * flip)
with sns.axes_style("darkgrid"):
    plt.subplot(211)
    sinplot()
plt.subplot(212)
sinplot(-1)
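Used in a with statement, axes_style() applies a style only temporarily. A small sketch, assuming a global "white" style as the starting point, checks this by reading Matplotlib's rcParams before, inside and after the block:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("white")                  # global style: grid switched off
before = plt.rcParams["axes.grid"]

with sns.axes_style("darkgrid"):        # temporary style: grid switched on
    inside = plt.rcParams["axes.grid"]

after = plt.rcParams["axes.grid"]       # global style is back in effect
print(before, inside, after)
```

This is why only axes created inside the with block pick up the darkgrid look, while axes created afterwards fall back to the global style.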
sns.set()
sns.set_context("paper")
sns.set_style("whitegrid")
sns.boxplot(x="day", y="total_bill", data=tips)
You may be thinking that this figure is not scaled at all, since it looks similar to our previous plot outputs. To clarify: Jupyter Notebook scales down large images in the notebook cell output, so past a certain size figures are rescaled automatically. For exploratory analysis, we prefer iterating quickly over a number of different analyses, and it’s more useful to have facets of a similar size than overall figures of the same size in a particular context. When we’re in a situation where we need something that’s exactly a certain size overall, ideally we:
That said, if we plot the same figure in an editor such as Anaconda Spyder or JetBrains’ PyCharm or IntelliJ, we can view it at its original size. The take-away from this scaling segment is that one additional line of code can give us an image of the size we require, and we can experiment accordingly. In practice, we can also pass a dictionary of parameters using rc to get finer control over the aesthetics. Let me show you an example with the same sinplot function we defined earlier:
sns.set(style="whitegrid", rc={"grid.linewidth": 1.5})
sns.set_context("poster", font_scale=2.5, rc={"lines.linewidth": 5.0})
sinplot()
Though our notebook didn’t display an enlarged (scaled) plot, the figure has been created in memory as per our instructions. The lines in the plot are thick because I set linewidth to 5, and the text on the axes is larger because of font_scale. We generally don't use anything more than that during data analysis, although exceptional scenarios may demand a few more parameters, which we will cover in the next article of this series.
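To check what the context settings actually do without relying on how the notebook renders the image, we can inspect the values directly. The sketch below is a hedged illustration: it assumes a headless Matplotlib backend and uses figsize=(12, 6) as one possible way of fixing the overall figure size, which is my choice rather than something prescribed above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Compare the plain "poster" context with the customized one.
base = sns.plotting_context("poster")
scaled = sns.plotting_context("poster", font_scale=2.5,
                              rc={"lines.linewidth": 5.0})
print(base["font.size"], scaled["font.size"])
print(base["lines.linewidth"], scaled["lines.linewidth"])

x = np.linspace(0, 14, 100)
with scaled:  # apply the customized context only inside this block
    fig, ax = plt.subplots(figsize=(12, 6))  # width and height in inches
    (line,) = ax.plot(x, np.sin(x))

print(line.get_linewidth())     # line width picked up from the context
print(fig.get_size_inches())    # size of the figure as held in memory
plt.close(fig)
```

The font sizes grow by the font_scale factor relative to the plain "poster" context, while the rc dictionary overrides the line width regardless of that factor.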