The response to the question “How do I make plots in Python?” used to be simple: Matplotlib was the only way. However, Python is now the language of data science, and it offers a lot more options.
In this article, we illustrate how to use each of the four most popular Python plotting libraries (Matplotlib, Seaborn, Plotly, and Bokeh) as well as a couple of promising newcomers: Altair, with its expressive API, and Pygal, with its beautiful SVG output. We’ll also take a look at pandas’ extremely useful plotting API.
Plotting in Python
Matplotlib
Matplotlib is the oldest and most widely used Python plotting library. First released in 2003, it is part of the SciPy stack, an open-source scientific computing ecosystem that offers functionality similar to MATLAB.
Installing Matplotlib
Pip is the simplest way to install matplotlib. In the terminal, run the following command:
pip install matplotlib
The other option is to do a manual download and install it.
Graphing x and y coordinates
The plot() function is used to draw points (markers) in a diagram.
By default, the plot() function draws a line from point to point.
The function accepts parameters that define the points in the diagram.
The first parameter is an array containing the x-axis points.
The second parameter is an array containing the y-axis points.
If we want to plot a line from (1, 3) to (8, 10), we must pass two arrays to the plot function: [1, 8] and [3, 10].
Example 1: Plotting a Linear Line
import matplotlib.pyplot as plt
import numpy as np

x_coords = np.array([5, 12])
y_coords = np.array([7, 14])

plt.plot(x_coords, y_coords)
plt.show()
Plotting a line
# importing the required module
import matplotlib.pyplot as plt

# values in the x axis
x_coords = [3,4,5]
# corresponding values in the y axis
y_coords = [4,6,3]

# finally, getting the points plotted
plt.plot(x_coords, y_coords)

# name the x axis as 'the x axis'
plt.xlabel('the x axis')
# name the y axis as 'the y axis'
plt.ylabel('the y axis')

# giving a title to the graph
plt.title('The initial Graph by matplotlib.pyplot !')

# display the plot
plt.show()
The code is largely self-explanatory. The steps were as follows:
- Define the x-axis values and the corresponding y-axis values as lists.
- Use the .plot() method to plot them on a canvas.
- Using the .xlabel() and .ylabel() functions, give the x- and y-axes a name.
- Using the .title() method, give your plot a title.
- Finally, use the .show() function to display your plot.
Using the same plot to plot two or more lines
import matplotlib.pyplot as plt

# first line points
first_x = [1,2,3]
first_y = [2,4,1]
# plotting the first line points
plt.plot(first_x, first_y, label = "first line")

# second line points
second_x = [1,2,3]
second_y = [4,1,3]
# plotting the second line points
plt.plot(second_x, second_y, label = "second line")

# x axis name
plt.xlabel('the x axis')
# y axis name
plt.ylabel('the y axis')
# give a title to the graph
plt.title('Plotting Two lines on the same graph!')

# show the legend (key)
plt.legend()

# display the plot
plt.show()
Here, we plot two lines on the same graph. We distinguish them by assigning each a name (label) passed as an argument to the .plot() function.
The legend is a small rectangular box that contains details about the type of line and its color. Using the .legend() function, we can add a legend to our plot.
How to customize the Plot
We’ll go through some basic customization that can be applied to almost any plot.
import matplotlib.pyplot as plt

# values in the x axis
x_coords = [3,4,5,6,7,8]
# corresponding values in the y axis
y_coords = [4,6,3,7,4,8]

# finally plotting the points
plt.plot(x_coords, y_coords, color='green', linestyle='dashed', linewidth = 3,
         marker='o', markerfacecolor='r', markersize=12)

# setting x and y axis range
plt.ylim(3,10)
plt.xlim(3,10)

# x axis naming
plt.xlabel('the x axis')
# y axis naming
plt.ylabel('the y axis')

# specify graph title
plt.title('Graph Customizations !')

# display the plot
plt.show()
As you can see, we’ve made several changes, including
- line-width, line-style, and line-color.
- setting the marker, the color of the marker’s face, and the size of the marker
- overriding the axis ranges on the x and y axes. If overriding is not achieved, the auto-scale function of the pyplot module is used to set the axis range and scale.
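The difference between the autoscaled range and an overridden range can be inspected directly. The following is only a minimal sketch: it assumes the non-interactive Agg backend so that it runs without opening a window, and the names auto_xlim and manual_xlim are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

x_coords = [3, 4, 5, 6, 7, 8]
y_coords = [4, 6, 3, 7, 4, 8]

plt.figure()
plt.plot(x_coords, y_coords)

# Called without arguments, plt.xlim() returns the current (autoscaled) range
auto_xlim = plt.xlim()

# Called with arguments, it overrides the autoscaled range
plt.xlim(3, 10)
plt.ylim(3, 10)
manual_xlim = plt.xlim()

print("autoscaled x range:", auto_xlim)
print("overridden x range:", manual_xlim)
```

The autoscaled range covers the data with a small margin on each side, while the overridden range is exactly what was requested.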
Bar Graph
import matplotlib.pyplot as plt

# x-coordinates of the bars
left = [3, 4, 5, 6, 7]
# bars' height
height = [12, 26, 38, 42, 7]
# bars' labels
bar_labels = ['first', 'second', 'third', 'fourth', 'fifth']

# bar chart plotting; tick_label names each bar on the x-axis
plt.bar(left, height, tick_label = bar_labels,
        width = 0.8, color = ['red', 'blue'])

# the x-axis naming
plt.xlabel('the x axis')
# the y-axis naming
plt.ylabel('the y axis')
# title of the chart
plt.title('The Bar Chart!')

# display the plot
plt.show()
To build a bar chart, we use the plt.bar() function.
We pass the x-coordinates of the bars and the heights of the bars. In current Matplotlib versions, each x-coordinate marks the center of its bar by default.
By defining tick labels, you can also give each x-axis coordinate a name.
Histogram
import matplotlib.pyplot as plt

# data to be binned
ages = [7,10,75,45,35,50,55,50,48,45,49,
        65,12,18,62,23,95,82,37,26,25,45]

# set the range and the number of intervals
# (named value_range to avoid shadowing the built-in range)
value_range = (0, 100)
bins = 10

# histogram plotting
plt.hist(ages, bins, value_range, color = 'blue',
         histtype = 'bar', rwidth = 0.65)

# label for the x-axis
plt.xlabel('age')
# label for the frequency
plt.ylabel('The count of people')
# title plotting
plt.title('The Histogram')

# display the plot
plt.show()
To plot a histogram, we use the plt.hist() function.
The ages list is passed as the data to be counted.
A tuple containing min and max values may be used to define a range.
The next step is to “bin” the range of values, which involves dividing the entire range of values into a series of intervals and counting the values that fall into each interval. We’ve set bins = 10 in this case. As a result, there are 10 intervals in total, each 100/10 = 10 units wide.
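To make the binning step concrete, the counts can be reproduced by hand in plain Python. This is only a sketch of the idea, not how Matplotlib implements it; the variable names are illustrative, and the last interval is assumed to include its upper edge.

```python
ages = [7, 10, 75, 45, 35, 50, 55, 50, 48, 45, 49,
        65, 12, 18, 62, 23, 95, 82, 37, 26, 25, 45]

low, high = 0, 100
bins = 10
width = (high - low) / bins  # each interval is 10 units wide

counts = [0] * bins
for age in ages:
    # Find the interval the value falls into; a value equal to the
    # upper limit is placed in the last interval
    index = min(int((age - low) // width), bins - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    start = low + i * width
    print(f"{start:.0f}-{start + width:.0f}: {count}")
```

Each printed count corresponds to the height of one bar in the histogram above.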
Scatter graph
import matplotlib.pyplot as plt

# values of the x-axis
x = [3,4,6,8,7,8,9,10,11,12]
# y-axis values
y = [4,6,7,9,8,10,11,13,14,14]

# scatter plot
plt.scatter(x, y, label= "stars", color= "blue",
            marker= "*", s=30)

# the x axis label
plt.xlabel('the x axis')
# the y axis label
plt.ylabel('the y axis')
# title for the plot
plt.title('The scatter plot')
# display the legend
plt.legend()

# display the plot
plt.show()
To draw a scatter plot, we use the plt.scatter() function.
We define the x- and y-axis values in the same way as we define them for a line.
The character to use as a marker is defined by the marker argument. The s parameter can be used to specify its size.
Pie-Chart
import matplotlib.pyplot as plt

# labels definition
my_hobbies = ['walk', 'read', 'dance', 'work']
# each label's portion
slices = [5, 9, 10, 8]
# each label's color
colors = ['r', 'y', 'g', 'b']

# pie chart plotting
plt.pie(slices, labels = my_hobbies, colors=colors,
        startangle=90, shadow = True, explode = (0, 0, 0.1, 0),
        radius = 1.2, autopct = '%1.1f%%')

# plot the legend
plt.legend()

# display the plot
plt.show()
Using the plt.pie() function, we create a pie chart.
To begin, we use a list called my_hobbies to define the labels.
Then, using a separate list called slices, we describe each label’s portion.
A list of named colors is used to set the color for each label.
If shadow = True, a shadow will appear underneath each wedge in the pie chart.
The startangle argument rotates the pie chart’s beginning by the given number of degrees counterclockwise from the x-axis.
The fraction of the radius by which we offset each wedge is set using explode.
The value shown on each wedge is formatted using autopct. We’ve set it to display the percentage value to one decimal place.
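The autopct format string can be tried out on its own. The sketch below applies the same '%1.1f%%' format to the slices list in plain Python; it only approximates what the pie chart displays, and the name wedge_texts is illustrative.

```python
slices = [5, 9, 10, 8]
total = sum(slices)

# '%1.1f' formats a number with one decimal place; '%%' is a literal percent sign
wedge_texts = ['%1.1f%%' % (100 * value / total) for value in slices]

for hobby, text in zip(['walk', 'read', 'dance', 'work'], wedge_texts):
    print(hobby, text)
```

For instance, the 'work' slice is 8 out of 32, so it is shown as 25.0%.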
How to plot an equation’s curves?
# the needed modules are imported
import matplotlib.pyplot as plt
import numpy as np

# set the x coordinates here
x_coords = np.arange(0, 2*(np.pi), 0.1)
# setting the corresponding y coordinates
y_coords = np.sin(x_coords)

# plot the given points
plt.plot(x_coords, y_coords)
plt.title("How to plot an equation's curves")

# display the plot
plt.show()
NumPy
NumPy is a general-purpose array-processing package in Python.
We use the np.arange() function to set the x-axis values, with the first two arguments being the start and end of the range and the third being the step-wise increment. The result is a NumPy array.
We simply use NumPy’s predefined np.sin() function to get the corresponding y-axis values.
Finally, we use the plt.plot() function to plot the points using the x and y arrays. So far in this section, we have explored several types of plots we can make with matplotlib; there are more plot types that we haven’t covered.
Subplots
Subplots are a plot within a plot.
When we want to display two or more plots in the same figure, we need to use subplots.
Approach 1:
# importing required modules
import matplotlib.pyplot as plt
import numpy as np

# generation of coordinates
def create_plot(ptype):
    # x-axis values set
    x_coords = np.arange(-10, 10, 0.01)

    # y-axis values set
    if ptype == 'linear':
        y_coords = x_coords
    elif ptype == 'quadratic':
        y_coords = x_coords**2
    elif ptype == 'cubic':
        y_coords = x_coords**3
    elif ptype == 'quartic':
        y_coords = x_coords**4

    return (x_coords, y_coords)

# set your preferred style
plt.style.use('fivethirtyeight')

# figure creation
_fig = plt.figure()

# in the figure define subplots and their positions
plot_1 = _fig.add_subplot(221)
plot_2 = _fig.add_subplot(222)
plot_3 = _fig.add_subplot(223)
plot_4 = _fig.add_subplot(224)

# plotting points on every single subplot
x_coords, y_coords = create_plot('linear')
plot_1.plot(x_coords, y_coords, color ='r')
plot_1.set_title('$y_1 = x$')

x_coords, y_coords = create_plot('quadratic')
plot_2.plot(x_coords, y_coords, color ='b')
plot_2.set_title('$y_2 = x^2$')

x_coords, y_coords = create_plot('cubic')
plot_3.plot(x_coords, y_coords, color ='g')
plot_3.set_title('$y_3 = x^3$')

x_coords, y_coords = create_plot('quartic')
plot_4.plot(x_coords, y_coords, color ='k')
plot_4.set_title('$y_4 = x^4$')

# adjust the space between subplots
_fig.subplots_adjust(hspace=.5,wspace=0.5)

# display the plot
plt.show()
Let’s go through this program step by step:
plt.style.use('fivethirtyeight')
Plots can be styled in various ways, including using one of the available styles or creating your own.
_fig = plt.figure()
All plot elements are contained in a top-level container called a figure. Here, we create a figure named _fig, which will contain all of our subplots.
plot_1 = _fig.add_subplot(221)
plot_2 = _fig.add_subplot(222)
plot_3 = _fig.add_subplot(223)
plot_4 = _fig.add_subplot(224)
To define subplots and their locations, we use the _fig.add_subplot method. Its signature looks like this:
add_subplot(nrows, ncols, plot_number)
When you add a subplot to a figure, the figure is split into nrows * ncols sub-axes. The plot_number parameter specifies which subplot the function call creates; it can be anything from 1 to nrows * ncols.
If all three parameters have values less than 10, the function can be called with a single int parameter, with the hundreds digit representing nrows, the tens digit representing ncols, and the units digit representing plot_number. It means that we can write add_subplot(234) instead of add_subplot(2, 3, 4).
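The three-digit shorthand can be illustrated with a few lines of plain Python. The helper split_subplot_code below is hypothetical and not part of Matplotlib; it only shows how the digits map to the three arguments.

```python
def split_subplot_code(code):
    """Split a three-digit subplot code into (nrows, ncols, plot_number).

    Hypothetical helper for illustration only; Matplotlib does this
    decoding internally.
    """
    nrows, rest = divmod(code, 100)        # hundreds digit
    ncols, plot_number = divmod(rest, 10)  # tens and units digits
    return nrows, ncols, plot_number

print(split_subplot_code(234))  # (2, 3, 4)
print(split_subplot_code(221))  # (2, 2, 1)
```

So the code 221 used in the program above means a grid of 2 rows and 2 columns, with the subplot placed in the first cell.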
This diagram will demonstrate how positions are defined:
x_coords, y_coords = create_plot('linear')
plot_1.plot(x_coords, y_coords, color ='r')
plot_1.set_title('$y_1 = x$')
Then, on each subplot, we plot our points. First, we use the create_plot function to generate the x- and y-axis coordinates by specifying the type of curve we want.
Then, using the .plot method, we plot those points on our subplot. The set_title method is used to set the title of a subplot. When you use $ at the beginning and end of the title text, ‘_’ (underscore) is read as a subscript and ‘^’ is read as a superscript.
_fig.subplots_adjust(hspace=.5,wspace=0.5)
This method adjusts the spacing between subplots.
plt.show()
Finally, we use the plt.show() method to display the current figure.
Approach 2:
# importing required modules
import matplotlib.pyplot as plt
import numpy as np

# generation of coordinates
def create_plot(ptype):
    # set values for the x-axis
    x_coords = np.arange(0, 5, 0.01)

    # set the values for y-axis
    if ptype == 'sin':
        # a sine wave
        y_coords = np.sin(2*np.pi*x_coords)
    elif ptype == 'exp':
        # a negative exponential function
        y_coords = np.exp(-x_coords)
    elif ptype == 'hybrid':
        # a damped sine wave
        y_coords = (np.sin(2*np.pi*x_coords))*(np.exp(-x_coords))

    return (x_coords, y_coords)

# set the style to use
plt.style.use('ggplot')

# defining subplots and their positions
plot_1 = plt.subplot2grid((11,1), (0,0), rowspan = 3, colspan = 1)
plot_2 = plt.subplot2grid((11,1), (4,0), rowspan = 3, colspan = 1)
plot_3 = plt.subplot2grid((11,1), (8,0), rowspan = 3, colspan = 1)

# plotting points on each subplot
x_coords, y_coords = create_plot('sin')
plot_1.plot(x_coords, y_coords, label = 'sine wave', color ='b')

x_coords, y_coords = create_plot('exp')
plot_2.plot(x_coords, y_coords, label = 'negative exponential', color = 'r')

x_coords, y_coords = create_plot('hybrid')
plot_3.plot(x_coords, y_coords, label = 'damped sine wave', color = 'g')

# show legends of each subplot
plot_1.legend()
plot_2.legend()
plot_3.legend()

# function to show plot
plt.show()
Let’s go through some of the most critical aspects of this program:
plot_1 = plt.subplot2grid((11,1), (0,0), rowspan = 3, colspan = 1)
plot_2 = plt.subplot2grid((11,1), (4,0), rowspan = 3, colspan = 1)
plot_3 = plt.subplot2grid((11,1), (8,0), rowspan = 3, colspan = 1)
subplot2grid is similar to “pyplot.subplot,” but it employs 0-based indexing and allows the subplot to occupy several cells.
Let’s take a look at the subplot2grid method’s arguments:
- argument 1: is the grid’s geometry
- argument 2: grid position of the subplot
- argument 3: (rowspan) The number of rows that the subplot covers.
- argument 4: (colspan) The number of columns that the subplot covers.
This diagram will help to clarify the concept:
Each subplot in our example spans three rows and one column, leaving two empty rows between them (the fourth and eighth rows of the grid, which have the 0-based indices 3 and 7).
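The layout can be checked with a small bookkeeping sketch in plain Python. It does not call Matplotlib; it only replays the start rows and row spans passed to subplot2grid above and lists the rows that stay empty, using 0-based indices (so the fourth and eighth rows appear as 3 and 7).

```python
grid_rows = 11

# (start_row, rowspan) for each subplot, as passed to subplot2grid
subplot_rows = [(0, 3), (4, 3), (8, 3)]

occupied = set()
for start, span in subplot_rows:
    occupied.update(range(start, start + span))

empty_rows = [row for row in range(grid_rows) if row not in occupied]
print(empty_rows)  # [3, 7]
```

The two empty rows act as vertical spacing between the three subplots.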
x_coords, y_coords = create_plot('sin')
plot_1.plot(x_coords, y_coords, label ='sine wave', color ='b')
There’s nothing remarkable about this section because the syntax for plotting points on a subplot remains the same.
plot_1.legend()
It will show the subplot’s label on the figure.
plt.show()
Finally, the plt.show() function is used to display the current figure.
Note: Based on the above two examples, we may conclude that the subplot() method should be used when the plots are uniform in size, while the subplot2grid() method should be used when we want more flexibility in the location and sizes of our subplots.
Plotting in three dimensions
Matplotlib makes it easy to build 3-D graphs. Let’s examine some of the most important and widely used 3-D plots.
How to Plot Points
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

# set a custom style
style.use('ggplot')

# construct a new plotting figure
_fig = plt.figure()

# make a new subplot on our figure and set the projection to 3d
ax_1 = _fig.add_subplot(111, projection='3d')

# define x, y, z coordinates
x_coords = np.random.randint(0, 10, size = 20)
y_coords = np.random.randint(0, 10, size = 20)
z_coords = np.random.randint(0, 10, size = 20)

# plotting the points on subplot
ax_1.scatter(x_coords, y_coords, z_coords, c = 'm', marker = 'o')

# setting labels for the axes
ax_1.set_xlabel('the x axis')
ax_1.set_ylabel('the y axis')
ax_1.set_zlabel('the z axis')

# display the plot
plt.show()
The above program’s output will give you a window where you can rotate or enlarge the plot. In the plot, darker points are closer to the viewer than lighter points.
This section breaks down the code’s most relevant features:
from mpl_toolkits.mplot3d import axes3d
It is the module required to plot in three dimensions.
ax_1 = _fig.add_subplot(111, projection='3d')
On our figure, we construct a subplot and set the projection argument to 3d.
ax_1.scatter(x_coords, y_coords, z_coords, c = 'm', marker = 'o')
To plot the points in 3-D space, we now use the .scatter() function.
Plotting lines
# required modules imported
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

# set a custom style
style.use('ggplot')

# construct a new plotting figure
_fig = plt.figure()

# create a new subplot on our figure
ax_1 = _fig.add_subplot(111, projection='3d')

# defining x, y, z coordinates
x_coords = np.random.randint(0, 10, size = 5)
y_coords = np.random.randint(0, 10, size = 5)
z_coords = np.random.randint(0, 10, size = 5)

# plotting a line through the points on the subplot
ax_1.plot(x_coords, y_coords, z_coords)

# setting the labels
ax_1.set_xlabel('the x axis')
ax_1.set_ylabel('the y axis')
ax_1.set_zlabel('the z axis')

plt.show()
A screenshot of the above program’s 3-D plot will look like this:
The following is the key difference between this program and the previous one:
ax_1.plot(x_coords, y_coords, z_coords)
To draw a line through a sequence of 3-D points, we used the .plot() method. The .plot_wireframe() method is meant for meshes: in recent Matplotlib versions it expects 2-D coordinate arrays and rejects the 1-D arrays used here.
Plotting bars
# importing required modules
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

# setting a custom style to use
style.use('ggplot')

# construct a new plotting figure
_fig = plt.figure()

# add a new subplot to our figure
ax_1 = _fig.add_subplot(111, projection='3d')

# defining x, y, z coordinates for bar position
x_coords = [3,4,5,6,7,8,9,10,11,12]
y_coords = [6,5,3,8,7,5,9,7,5,9]
z_coords = np.zeros(10)

# size of bars
_dx = np.ones(10)                  # length measured along the x-axis
_dy = np.ones(10)                  # length measured along the y-axis
_dz = [5,7,8,6,10,12,9,9,14,13]    # height of bar

# establishing a color scheme
color = []
for val in _dz:
    if val > 7:
        color.append('r')
    else:
        color.append('b')

# bars plotting
ax_1.bar3d(x_coords, y_coords, z_coords, _dx, _dy, _dz, color = color)

# setting axes labels
ax_1.set_xlabel('the x axis')
ax_1.set_ylabel('the y axis')
ax_1.set_zlabel('the z axis')

plt.show()
Here is a screenshot of the 3-D plot that was created:
Let’s go through some of the most critical aspects of this program:
x_coords = [3,4,5,6,7,8,9,10,11,12]
y_coords = [6,5,3,8,7,5,9,7,5,9]
z_coords = np.zeros(10)
The base positions of the bars are described here. Setting z = 0 means that all bars begin on the XY plane.
_dx = np.ones(10) # length measured along the x-axis
_dy = np.ones(10) # length measured along the y-axis
_dz = [5,7,8,6,10,12,9,9,14,13] # height of bar
The size of each bar is given by _dx, _dy, and _dz. Consider each bar to be a cuboid whose extents along the x, y, and z axes are _dx, _dy, and _dz, respectively.
for val in _dz:
    if val > 7:
        color.append('r')
    else:
        color.append('b')
We set the color for each bar as a list. Bars with a height greater than 7 are red, and all other bars are blue.
ax_1.bar3d(x_coords, y_coords, z_coords, _dx, _dy, _dz, color = color)
Finally, we use the .bar3d() function to plot the bars.
Curve plotting
# importing required modules
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

# set a custom style
style.use('ggplot')

# construct a new plotting figure
_fig = plt.figure()

# create a new subplot on our figure
ax_1 = _fig.add_subplot(111, projection='3d')

# get points for a mesh grid
u, v = np.mgrid[0:2*np.pi:200j, 0:np.pi:100j]

# setting x, y, z coordinates
x_coords = np.cos(u)*np.sin(v)
y_coords = np.sin(u)*np.sin(v)
z_coords = np.cos(v)

# plot the curve
ax_1.plot_wireframe(x_coords, y_coords, z_coords,
                    rstride = 8, cstride = 8, linewidth = 1)

plt.show()
This program’s output would look like this:
In this example, we plotted a sphere as a wireframe mesh.
The vital points worth considering include:
u, v = np.mgrid[0:2*np.pi:200j, 0:np.pi:100j]
We use np.mgrid to obtain points to build a mesh. A complex step such as 200j asks for that many evenly spaced points, including the end point, rather than a step size.
x_coords = np.cos(u)*np.sin(v)
y_coords = np.sin(u)*np.sin(v)
z_coords = np.cos(v)
It is nothing more than a sphere’s parametric equation.
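A quick numerical check, sketched below with NumPy only, confirms that these coordinates lie on a unit sphere and shows the shape of the mesh that np.mgrid returns. The shorter names x, y, z, and radius_squared are illustrative.

```python
import numpy as np

# 200 points for u in [0, 2*pi] and 100 points for v in [0, pi]
u, v = np.mgrid[0:2*np.pi:200j, 0:np.pi:100j]

x = np.cos(u) * np.sin(v)
y = np.sin(u) * np.sin(v)
z = np.cos(v)

# Every point on a unit sphere satisfies x^2 + y^2 + z^2 = 1
radius_squared = x**2 + y**2 + z**2

print(u.shape)
print(np.allclose(radius_squared, 1.0))
```

Both u and v are 2-D arrays of the same shape, which is the form that plot_wireframe() expects for its coordinate arguments.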
ax_1.plot_wireframe(x_coords, y_coords, z_coords, rstride = 8, cstride = 8, linewidth = 1)
Again, we use the .plot_wireframe() method. The rstride and cstride arguments set the row and column stride, that is, how densely the mesh lines are drawn.
Plotting Without a Line
To plot only the markers, you can use the shortcut string notation parameter 'o', which stands for circle markers.
import matplotlib.pyplot as plt
import numpy as np

x_coords = np.array([3, 10])
y_coords = np.array([5, 12])

plt.plot(x_coords, y_coords, 'o')
plt.show()
Multiple Points
There is no limit to the number of points you can plot, as long as both axes have the same number of points.
As an illustration, draw a line in a diagram from position (3, 5) to position (4, 10), then to position (8, 3), and finally to position (10, 12):
import matplotlib.pyplot as plt
import numpy as np

x_coords = np.array([3, 4, 8, 10])
y_coords = np.array([5, 10, 3, 12])

plt.plot(x_coords, y_coords)
plt.show()
Default X-Points
If the x-axis points are not defined, they will be assigned the default values 0, 1, 2, 3, and so on, up to the number of y-points minus one.
So, if we leave out the x-points, the y-values are plotted against these default values.
Plotting without x-points as an example:
import matplotlib.pyplot as plt
import numpy as np

y_coords = np.array([5, 11, 3, 12, 7, 9])

plt.plot(y_coords)
plt.show()
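The default x-values can be read back from the line object that plot() returns. This is a minimal sketch that assumes the non-interactive Agg backend so that it runs without a display; default_x is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

y_coords = np.array([5, 11, 3, 12, 7, 9])

# plot() returns a list of line objects; here there is exactly one
(line,) = plt.plot(y_coords)

# The x-values Matplotlib filled in: one index per y-value, starting at 0
default_x = list(line.get_xdata())
print(default_x)
```

With six y-points, the x-values run from 0 to 5.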
Matplotlib allows you to fine-tune your plots. For example, you can specify the x-position of each bar in a bar plot.
The following example plots position against time in Matplotlib:
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')
plt.show()
To view numeric data in plots, graphs, and charts in Python, Pythonistas usually use the Matplotlib plotting library. Matplotlib’s two APIs (Application Programming Interfaces) offer a wide range of functionality:
- The OO (Object-Oriented) API provides a collection of objects that can be assembled with greater flexibility than pyplot. It also gives you direct access to the backend layer of matplotlib.
- The pyplot API provides a state-based collection of functions that make matplotlib work like MATLAB.
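The difference between the two styles can be sketched by drawing the same data twice. The example assumes the non-interactive Agg backend so that it runs without a display; the data values are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

# pyplot style: functions act on an implicit "current" figure and axes
plt.figure()
plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')

# object-oriented style: the figure and axes are explicit objects
fig, ax = plt.subplots()
ax.plot(time, position)
ax.set_xlabel('Time (hr)')
ax.set_ylabel('Position (km)')

print(ax.get_xlabel(), "/", ax.get_ylabel())
```

Both halves produce the same chart. The object-oriented form tends to be easier to manage once a figure holds several axes, because every call names the axes it applies to.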
How to Use the Plot() Function to Make a Simple Plot
The matplotlib.pyplot.plot() function offers a single interface for making various plot types.
In the simplest example, the plot() function is used to plot values as x,y coordinates in a data plot. In this case, plot() takes two parameters to define the plot coordinates:
- An array of X-axis coordinates.
- An array of Y-axis coordinates.
By passing the two arrays [4, 10] and [6, 11], a line spanning from x=4, y=6 to x=10, y=11 can be plotted:
import matplotlib.pyplot as plt
import numpy as np

# coordinates on the X axis
x_coords = np.array([4, 10])
# coordinates on the Y axis
y_coords = np.array([6, 11])

plt.plot(x_coords, y_coords)
plt.show()
Markers and Linestyles –Modify the Look of a Plot
The matplotlib keywords marker and linestyle can be used to customize the appearance of data in a plot without changing the data values.
Each data value in a plot is marked with a marker using the marker argument.
The linestyle argument can change the appearance of lines between data values or remove them entirely. In this example, a circle marker "o" marks every data value and the lines are drawn with the dash-dot linestyle "-.":
import matplotlib.pyplot as plt
import numpy as np

x_coords = np.array([4, 14, 5, 11])

# Mark each data value and customize the linestyle:
plt.plot(x_coords, marker = "o", linestyle = "-.")
plt.show()
A partial list of string characters that can be used as markers and line styles is as follows:
- "-" solid line style
- "--" dashed line style
- "-." dash-dot line style
- " " no line
- "o" circle marker
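These characters can be passed as keyword arguments or combined into a single shortcut string. The sketch below assumes the non-interactive Agg backend so that it runs without a display, and it reads the settings back from the returned line objects; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

values = [4, 14, 5, 11]

# Keyword form: marker and linestyle are given separately
(keyword_line,) = plt.plot(values, marker="o", linestyle="--")

# Shortcut form: the same settings combined into one format string
(shortcut_line,) = plt.plot(values, "o--")

print(shortcut_line.get_marker(), shortcut_line.get_linestyle())
```

Both calls produce a dashed line with circle markers, so the shortcut string is mainly a matter of convenience.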
Other plot types, such as scatter plots, are also supported by Matplotlib. The scatter() function displays data values as a set of x,y coordinates represented by single dots.
In this example, two arrays of equal length are plotted, one holding the X-axis values and the other the Y-axis values. A dot is used to indicate each value:
Example of a Matplotlib Scatter Plot
import matplotlib.pyplot as plt

# values in the X axis
x_coords = [4,5,9,31,10,7,15,13,24,35]
# values in the Y axis
y_coords = [6,9,57,45,4,6,13,24,35,46]

# plotting a scatter
plt.scatter(x_coords, y_coords)
plt.show()
Multiple Data Sets in One Plot with Matplotlib
Matplotlib is a powerful plotting library that can handle several datasets in a single plot. We’ll plot two different data sets, x_data1 and x_data2, in this example:
import matplotlib.pyplot as plt
import numpy as np

# seed the random number generator
# (np.random.seed() only accepts values below 2**32)
np.random.seed(5484)

# Creation of random data
xdata = np.random.random([6, 12])

# Creation of two datasets from the random floats
x_data1 = xdata[0, :]
x_data2 = xdata[1, :]

# Sort the data in both datasets:
x_data1.sort()
x_data2.sort()

# Creation of y data points
y_data1 = x_data1 ** 2
y_data2 = 1 - x_data2 ** 4

# data plotting
plt.plot(x_data1, y_data1)
plt.plot(x_data2, y_data2)

# Set lower and upper limits for x,y
plt.xlim([0, 1])
plt.ylim([0, 1])

plt.title("Multiple Datasets in One Plot")
plt.show()
Subplots with Matplotlib
Matplotlib can also be used to generate complex figures with multiple plots. Multiple axes are enclosed in one figure and shown in subplots in this example:
import matplotlib.pyplot as plt
import numpy as np

# Make a figure with two rows and two columns of subplots
fig, ax = plt.subplots(2, 2)

x = np.linspace(25, 30, 125)

# Index the 2 x 2 array of axes to draw on each of the four subplots:
ax[0, 0].plot(x, np.sin(x), 'g')   # row=0, column=0
ax[1, 0].plot(range(100), 'b')     # row=1, column=0
ax[0, 1].plot(x, np.cos(x), 'r')   # row=0, column=1
ax[1, 1].plot(x, np.tan(x), 'k')   # row=1, column=1

plt.show()
Plotting the Phase Spectrum in Matplotlib
The frequency characteristics of a signal can be visualized using a phase spectrum plot.
We’ll plot the phase spectrum of two signals represented as functions with different frequencies in this advanced example:
import matplotlib.pyplot as plt
import numpy as np

# seed the pseudo-random number generator
np.random.seed(0)

# sampling interval
dt = 0.01
# sampling frequency
Fs = 1 / dt

# noise generation
t = np.arange(0, 10, dt)
res = np.random.randn(len(t))
r = np.exp(-t / 0.05)

# Convolution of 2 signals or functions
conv_res = np.convolve(res, r)*dt
conv_res = conv_res[:len(t)]

s = 0.5 * np.sin(1.5 * np.pi * t) + conv_res

# plot creation
fig, ax = plt.subplots()
ax.plot(t, s)

# plot the phase spectrum
ax.phase_spectrum(s, Fs = Fs)

plt.title("Plotting of Phase Spectrum")
plt.show()
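The idea behind a phase spectrum can be sketched with NumPy alone: take the Fourier transform of a signal and look at the angle of each complex coefficient. This is only an illustration of the concept, not a reproduction of what phase_spectrum() computes internally; the sampling rate, signal frequency, and phase shift below are assumed values chosen to make the result easy to check.

```python
import numpy as np

fs = 100                   # sampling frequency in Hz (assumed for this sketch)
t = np.arange(fs) / fs     # one second of samples
freq = 5                   # signal frequency in Hz (assumed)
phase_shift = np.pi / 4    # phase we expect to recover

signal = np.cos(2 * np.pi * freq * t + phase_shift)

# Fourier transform of the real signal and the matching frequency axis
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The phase spectrum is the angle of each complex coefficient
phase = np.angle(spectrum)

# Look at the frequency bin with the largest magnitude
peak = int(np.argmax(np.abs(spectrum)))
print(freqs[peak], phase[peak])
```

At the 5 Hz bin, the recovered phase matches the shift of pi/4 that was built into the signal.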
3D Plot with Matplotlib
By allowing the use of a Z-axis, Matplotlib can also handle 3D plots. We’ve already made a 2D scatter plot, but we’ll make a 3D scatter plot in this example:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

fig = plt.figure()

# Creation of a single 3D subplot
# '111' is a MATLAB convention, also used by Matplotlib, for a grid with
# one row and one column; the new Axes is placed in the first cell.
ax = fig.add_subplot(111, projection='3d')

# Create x,y,z coordinates:
x_coords = [3,4,5,6,7,8,9,10,11,12]
y_coords = [13,6,4,7,15,6,12,4,6,10]
z_coords = [4,5,6,7,7,9,11,13,21,11]

# Create a 3D scatter plot with x,y,z orthogonal axes and blue "o" markers:
ax.scatter(x_coords, y_coords, z_coords, c='blue', marker="o")

# Create x,y,z axis labels:
ax.set_xlabel('the x Axis')
ax.set_ylabel('the y Axis')
ax.set_zlabel('the z Axis')

plt.show()
What Is a Matplotlib Backend and How Can I Use It?
Matplotlib can output to almost any format you can imagine. Plots are typically displayed in a data scientist’s Jupyter notebook, but they can also be displayed inside an application.
In this example, Matplotlib’s object-oriented API is combined with the TkAgg backend: the FigureCanvasTkAgg class produces high-quality Agg (Anti-Grain Geometry) rendering inside a Tkinter widget, and the Tk mainloop() function shows the plot:
import tkinter as tk

import matplotlib
matplotlib.use("TkAgg")
from matplotlib.figure import Figure
# Tkinter canvas for the object-oriented API (TkAgg backend)
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg

root = tk.Tk()

_fig = Figure(figsize=(5, 4), dpi=100)
plot = _fig.add_subplot(1, 1, 1)

x_coords = [0.1, 0.2, 0.3, 0.4]
y_coords = [-0.1, -0.2, -0.3, -0.4]
plot.plot(x_coords, y_coords, color="red", marker="o", linestyle="--")

canvas = FigureCanvasTkAgg(_fig, root)
canvas.get_tk_widget().grid(row=0, column=0)

root.mainloop()
Seaborn
Seaborn is an abstraction layer built on top of Matplotlib that provides a user-friendly interface for quickly creating various useful plot types.
It does not, however, make any concessions in terms of control! You still have full control since Seaborn provides escape hatches to access the underlying Matplotlib properties. Seaborn creates some of the most attractive statistical graphs that are very informative.
import seaborn as sns
import matplotlib.pyplot as plt

# default theme is applied
sns.set_theme()

# Load an example dataset
tips_data = sns.load_dataset("tips")

# Plot tip against total bill, split by time of day and styled by sex
sns.relplot(
    data=tips_data,
    x="total_bill", y="tip", col="time",
    hue="sex", style="sex", size="size",
    facet_kws=dict(sharex=False),
)

plt.show()
Plotly
Plotly is a plotting ecosystem that includes a Python plotting library. It comes with three different interfaces:
- An object-oriented interface.
- An imperative interface for specifying your plot using JSON-like data structures.
- Plotly Express, a high-level interface similar to Seaborn.
Plotly plots are made to be embedded in web applications. Plotly is a JavaScript library at its heart! The plots are drawn with D3 and stack.gl.
By passing JSON to the JavaScript library, you can build Plotly libraries in other languages. That is exactly what the official Python and R libraries do. The Python Plotly API has also been ported to run in the web browser.
Example Scatter Plot
import plotly.graph_objects as go
import numpy as np

x_coords = np.linspace(0, 100, 1000)
y_coords = np.sin(x_coords)

fig = go.Figure(data=go.Scatter(x=x_coords, y=y_coords, mode='markers'))
fig.show()
Scatter and Line Plots
import plotly.graph_objects as go

# Create random data with numpy
import numpy as np
np.random.seed(1)

count_val = 1000
random_x = np.linspace(0, 100, count_val)
random_y0 = np.random.randn(count_val) + 5
random_y1 = np.random.randn(count_val)
random_y2 = np.random.randn(count_val) - 5

_fig = go.Figure()

# Add traces
_fig.add_trace(go.Scatter(x=random_x, y=random_y0,
                          mode='markers', name='markers only'))
_fig.add_trace(go.Scatter(x=random_x, y=random_y1,
                          mode='lines+markers', name='lines & markers'))
_fig.add_trace(go.Scatter(x=random_x, y=random_y2,
                          mode='lines', name='lines only'))

_fig.show()
Bubble Scatter Plots in Plotly
import plotly.graph_objects as go

x_coords = [3, 4, 5, 6]
y_coords = [15, 16, 17, 18]

fig = go.Figure(data=go.Scatter(
    x=x_coords,
    y=y_coords,
    mode='markers',
    marker=dict(size=[50, 70, 90, 110],
                color=[0, 1, 2, 3])
))

fig.show()
Style Scatter Plot in Plotly
import plotly.graph_objects as go
import numpy as np

var_counts = np.linspace(0, 10, 100)

_fg = go.Figure()

_fg.add_trace(go.Scatter(
    x=var_counts, y=np.sin(var_counts),
    name='Sine',
    mode='markers',
    marker_color='rgba(180, 0, 0, .7)'
))

_fg.add_trace(go.Scatter(
    x=var_counts, y=np.cos(var_counts),
    name='Cosine',
    marker_color='rgba(250, 187, 190, 1)'
))

# With update_traces, you can choose settings that apply to all traces.
_fg.update_traces(mode='markers', marker_line_width=2, marker_size=10)
_fg.update_layout(title='Style Scatter Plots',
                  yaxis_zeroline=True, xaxis_zeroline=True)

_fg.show()
Bokeh
Bokeh (pronounced “BOE-kay”) specializes in interactive plots, so simple static examples don’t do it justice. Like Plotly’s, Bokeh’s plots are designed to be embedded in web apps, and they are saved as HTML files.
Bokeh allows you to build interactive, JavaScript-powered visualizations that can be seen in a web browser.
Working with Bokeh is essentially a two-step process. First, you choose among Bokeh’s building blocks to create your visualization. Second, you customize these building blocks to meet your specific requirements.
Bokeh does this by combining two elements:
- A Python library for specifying your visualization’s content and interactive features.
- A JavaScript library called BokehJS, which works in the background to display your interactive visualizations in a web browser.
Bokeh produces all of the necessary JavaScript and HTML code based on your Python code.
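To get a feel for what Bokeh generates from Python code, the sketch below renders a small plot to an HTML string instead of opening it in a browser. It uses file_html() from bokeh.embed with the CDN resources; the data and the title are made up for illustration.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical data for illustration
p = figure(title="Generated HTML example")
p.line([1, 2, 3], [4, 6, 5], line_width=2)

# Render the plot to a standalone HTML string; BokehJS is loaded from a CDN
html = file_html(p, resources=CDN, title="Generated HTML example")

print(html[:80])
print(len(html), "characters of HTML and JavaScript")
```

The resulting string is a complete HTML page: it references BokehJS and embeds a JSON description of the plot, which BokehJS turns into the interactive visualization.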
How to install Bokeh on Debian-based distros (e.g., Ubuntu)
Run the following command in a terminal:
pip install bokeh
Drawing a Line Chart using Bokeh
from bokeh.plotting import figure, show

# data preparation
x_coords = [1, 2, 3, 4, 5]
y1_coords = [6, 7, 2, 4, 5]
y2_coords = [2, 3, 4, 5, 6]
y3_coords = [4, 5, 5, 7, 2]

# create a new plot with a title and axis labels
p = figure(title="Multiple line example", x_axis_label="x", y_axis_label="y")

# add several renderers
p.line(x_coords, y1_coords, legend_label="Temp.", line_color="blue", line_width=2)
p.line(x_coords, y2_coords, legend_label="Rate", line_color="red", line_width=2)
p.line(x_coords, y3_coords, legend_label="Objects", line_color="green", line_width=2)

# display the results
show(p)
Drawing a Bar Chart using Bokeh
from bokeh.plotting import figure, show

# data preparation
x_coords = [1, 2, 3, 4, 5]
y1_coords = [6, 7, 2, 4, 5]
y2_coords = [2, 3, 4, 5, 6]
y3_coords = [4, 5, 5, 7, 2]

# create a new plot with a title and axis labels
p = figure(title="Bar Chart Example in Bokeh", x_axis_label="x", y_axis_label="y")

# add multiple renderers: a line, vertical bars, and circles
p.line(x_coords, y1_coords, legend_label="Temp.", line_color="blue", line_width=2)
p.vbar(x=x_coords, top=y2_coords, legend_label="Rate", width=0.5, bottom=0, color="red")
p.circle(x_coords, y3_coords, legend_label="Objects", line_color="yellow", size=12)

# display the results
show(p)
Glyphs Customizations in Bokeh
from bokeh.plotting import figure, show

# prepare some data
x_coords = [1, 2, 3, 4, 5]
y_coords = [6, 7, 7, 9, 4]

# create a new plot with a title and axis labels
p = figure(title="Customizing Glyphs properties in Bokeh", x_axis_label="x", y_axis_label="y")

# add a circle renderer with additional arguments
circle = p.circle(
    x_coords,
    y_coords,
    legend_label="Objects",
    fill_color="red",
    fill_alpha=0.5,
    line_color="blue",
    size=80,
)

# modify the color of the glyph after it has been created
glyph = circle.glyph
glyph.fill_color = "green"

# display the results
show(p)
Combined Line and Glyphs in Bokeh
from bokeh.plotting import figure, show

# data preparation
x_coords = [3, 4, 5, 6, 7]
y1_coords = [5, 6, 6, 8, 3]
y2_coords = [3, 4, 5, 6, 7]

# new plot
p = figure(title="Line and Glyphs Combined")

# add line and circle renderers with legend_label arguments
line = p.line(x_coords, y1_coords, legend_label="Line.", line_color="green", line_width=1.5)
circle = p.circle(
    x_coords,
    y2_coords,
    legend_label="The Objects",
    fill_color="red",
    fill_alpha=0.5,
    line_color="green",
    size=80,
)

# position the legend in the top left corner
p.legend.location = "top_left"

# add a legend title
p.legend.title = "Legend title"

# change the appearance of the legend text
p.legend.label_text_font = "times"
p.legend.label_text_font_style = "italic"
p.legend.label_text_color = "navy"

# change the legend border and background
p.legend.border_line_width = 3
p.legend.border_line_color = "navy"
p.legend.border_line_alpha = 0.8
p.legend.background_fill_color = "navy"
p.legend.background_fill_alpha = 0.2

# display the results
show(p)
Glyph Properties Vectorizing
You’ll use data vectors to alter the features of your plot and its elements in this section.
Color vectorization
So far, you’ve used properties like fill_color to assign certain colors to a glyph.
To change colors based on the values in a variable, pass a variable containing color information to the fill_color property, as shown below.
import random

from bokeh.plotting import figure, show

# generate some data (0-25 for x, random values for y)
x = list(range(0, 26))
y = random.sample(range(0, 100), 26)

# make a list of RGB hex colors that depend on the y values
colors = ["#%02x%02x%02x" % (255, int(round(value * 255 / 100)), 255) for value in y]

# create a new plot
p = figure(
    title="Example of vectorized Bokeh colors",
    sizing_mode="stretch_width",
    max_width=500,
    plot_height=250,
)

# add line and circle renderers
line = p.line(x, y, line_color="green", line_width=1)
circle = p.circle(x, y, fill_color=colors, line_color="red", size=15)

# display the results
show(p)
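To see what the color list in the example contains, here is the same formula isolated in plain Python, with no Bokeh required. The helper name value_to_hex and its low and high parameters are hypothetical and exist only for this sketch.

```python
def value_to_hex(value, low=0, high=100):
    """Map a value in [low, high] to a hex color between magenta and white."""
    # Red and blue stay at 255; only the green channel follows the data value
    green = int(round((value - low) * 255 / (high - low)))
    return "#%02x%02x%02x" % (255, green, 255)


# Hypothetical y values for illustration
y = [0, 25, 50, 100]
colors = [value_to_hex(value) for value in y]
print(colors)
```

Each entry is an ordinary hex color string, so a list like this can be passed to fill_color with one color per data point.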
Vectorizing colors and sizes
Apply the same technique to your renderer’s radius argument to build a plot with colors and sizes in proportion to your data as shown.
import numpy as np

from bokeh.plotting import figure, show

# data generation
N = 1000
x_coords = np.random.random(size=N) * 100
y_coords = np.random.random(size=N) * 100

# generate radii and colors based on the data
radii = y_coords / 100 * 2
colors = ["#%02x%02x%02x" % (130, int(round(value * 255 / 100)), 216) for value in y_coords]

# create a plot with a given size
p = figure(
    title="Vectorized Radii & Colors in Bokeh",
    sizing_mode="stretch_width",
    max_width=500,
    plot_height=250,
)

# add a circle renderer
p.circle(
    x_coords,
    y_coords,
    radius=radii,
    fill_color=colors,
    fill_alpha=0.6,
    line_color="grey",
)

# display the results
show(p)
Palettes for color mapping
Bokeh comes with dozens of pre-defined color palettes, including Brewer, D3, and Matplotlib palettes, that you can use to map colors to your data.
from bokeh.io import show
from bokeh.palettes import Turbo256
from bokeh.plotting import figure
from bokeh.transform import linear_cmap

# data generation
x_coords = list(range(-32, 33))
y_coords = [i**2 for i in x_coords]

# create a linear color mapper
mapper = linear_cmap(field_name="y", palette=Turbo256, low=min(y_coords), high=max(y_coords))

# plot creation
_plot = figure(plot_width=500, plot_height=250)

# create a circle renderer that uses the color mapper
_plot.circle(x_coords, y_coords, color=mapper, size=10)

show(_plot)
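In the example above, linear_cmap() leaves the actual mapping to the browser. As a rough approximation of the idea, and not Bokeh's own implementation, the sketch below picks palette entries in plain Python by scaling each value linearly onto the palette's index range. The helper pick_colors is hypothetical, and the Viridis256 palette is used only as an example.

```python
from bokeh.palettes import Viridis256


def pick_colors(values, palette):
    """Linearly map each value onto an entry of the palette."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero when all values are equal
    last = len(palette) - 1
    return [palette[int((value - low) / span * last)] for value in values]


# Hypothetical data for illustration
y_coords = [i**2 for i in range(-4, 5)]
colors = pick_colors(y_coords, Viridis256)
print(colors)
```

A palette is simply a sequence of hex color strings, so the smallest value receives the first palette entry and the largest value receives the last one.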
Combining Plots
from bokeh.layouts import row
from bokeh.plotting import figure, show

# data preparation
x_coords = list(range(11))
y0_coords = x_coords
y1_coords = [10 - i for i in x_coords]
y2_coords = [abs(i - 5) for i in x_coords]

# create three plots, each with a single renderer
first_plot = figure(plot_width=250, plot_height=250, background_fill_color="#fafafa")
first_plot.circle(x_coords, y0_coords, size=12, color="#0000FF", alpha=0.8)

second_plot = figure(plot_width=250, plot_height=250, background_fill_color="#fafafa")
second_plot.triangle(x_coords, y1_coords, size=12, color="#00FF7F", alpha=0.8)

third_plot = figure(plot_width=250, plot_height=250, background_fill_color="#fafafa")
third_plot.square(x_coords, y2_coords, size=12, color="#FFFF00", alpha=0.8)

# place the plots in a row that scales with the width of the browser window
show(row(children=[first_plot, second_plot, third_plot], sizing_mode="scale_width"))
Data collection and filtering
To import and filter data, you’ll use a variety of sources and structures in this section.
Making use of ColumnDataSource
The ColumnDataSource is Bokeh’s basic data structure. So far, you’ve passed data to Bokeh using data sequences such as Python lists and NumPy arrays. Bokeh has automatically turned these sequences into ColumnDataSource objects for you.
To construct a ColumnDataSource directly, follow these steps:
- Import ColumnDataSource.
- Create a dict with your data. The keys of the dict are the column names (strings), and the values are lists or arrays of data.
- Pass your dict to ColumnDataSource as the data argument.
- Use your ColumnDataSource as the source for your renderer.
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource

# create a dict as the basis for the ColumnDataSource
data = {'x_values': [1, 2, 3, 4, 5],
        'y_values': [6, 7, 2, 3, 6]}

# create a ColumnDataSource based on the dict
source = ColumnDataSource(data=data)

# create a plot and a renderer that use the ColumnDataSource
p = figure()
p.circle(x='x_values', y='y_values', source=source)

show(p)
Converting data in Pandas
Pass your pandas data to a ColumnDataSource to use data from a pandas DataFrame as follows.
ColumnDataSource(df)
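As a minimal sketch of this conversion, the following snippet builds a small DataFrame (with made-up values) and passes it to ColumnDataSource. Each DataFrame column becomes a column of the data source, and the DataFrame index is added as an extra column.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical DataFrame for illustration
df = pd.DataFrame({"x_values": [1, 2, 3, 4, 5], "y_values": [6, 7, 2, 3, 6]})

# Convert the DataFrame; the unnamed index ends up in a column called "index"
source = ColumnDataSource(df)

print(source.column_names)
print(list(source.data["y_values"]))
```

The resulting source can be handed to a renderer through its source argument, using the DataFrame column names for x and y.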
Data filtering
Bokeh has several filtering options. If you want to construct a specific subset of the data in your ColumnDataSource, use these filters.
These filtered subsets are referred to as “views” in Bokeh. The CDSView class in Bokeh represents views. Pass a CDSView object to your renderer’s view argument to plot with a filtered subset of data.
A CDSView object has two properties:
- source: the ColumnDataSource to which the filters should be applied.
- filters: a list of Filter objects.
The IndexFilter is the most basic filter. An IndexFilter takes a set of index locations and provides a view that only shows the data points corresponding to those positions.
If your ColumnDataSource has a list of five values and you apply an IndexFilter with [0,2,4], the resulting view will only show the first, third, and fifth entries from your original list:
from bokeh.layouts import gridplot
from bokeh.models import CDSView, ColumnDataSource, IndexFilter
from bokeh.plotting import figure, show

# create a ColumnDataSource from a dict
vals = ColumnDataSource(data=dict(x=[3, 4, 5, 6, 7], y=[1, 2, 3, 4, 5]))

# create a view using an IndexFilter with the index positions [0, 2, 4]
view = CDSView(source=vals, filters=[IndexFilter([0, 2, 4])])

# define the tools to set up
setup_tools = ["box_select", "hover", "reset"]

# create the first plot with all data in the ColumnDataSource
p = figure(plot_height=300, plot_width=300, tools=setup_tools)
p.circle(x="x", y="y", size=10, hover_color="red", source=vals)

# create the second plot with a subset of the ColumnDataSource, based on the view
p_filtered = figure(plot_height=300, plot_width=300, tools=setup_tools)
p_filtered.circle(x="x", y="y", size=10, hover_color="red", source=vals, view=view)

# show both plots next to each other in a gridplot layout
show(gridplot([[p, p_filtered]]))
Using Widgets
You’ll add interactive widgets to your plots in this section.
Adding widgets
Widgets are extra visual elements that can be added to your display. For example, widgets can display additional information or control elements of your Bokeh page interactively, as illustrated below.
from bokeh.layouts import layout
from bokeh.models import Div, RangeSlider, Spinner
from bokeh.plotting import figure, show

# data preparation
x_coords = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y_coords = [4, 5, 5, 7, 2, 6, 4, 9, 1, 3]

# create a plot with circle glyphs
p = figure(x_range=(1, 9), plot_width=500, plot_height=250)
points = p.circle(x=x_coords, y=y_coords, size=30, fill_color="#21a7df")

# set up textarea (div)
div = Div(
    text="""
    <p>Use this control element to adjust the circle's size:</p>
    """,
    width=200,
    height=30,
)

# spinner setup: linked to the size of the circle glyphs
spinner = Spinner(
    title="Circle size",
    low=5,
    high=50,
    step=5,
    value=points.glyph.size,
    width=200,
)
spinner.js_link("value", points.glyph, "size")

# RangeSlider setup: linked to the start and end of the x-axis range
range_slider = RangeSlider(
    title="Adjust x-axis range",
    start=0,
    end=10,
    step=1,
    value=(p.x_range.start, p.x_range.end),
)
range_slider.js_link("value", p.x_range, "start", attr_selector=0)
range_slider.js_link("value", p.x_range, "end", attr_selector=1)

# layout creation (named page_layout so that the layout() function is not shadowed)
page_layout = layout(
    [
        [div, spinner],
        [range_slider],
        [p],
    ]
)

# result display
show(page_layout)
Displaying and exporting
You generated, altered, and merged visualizations in the previous steps. You’ll utilize a variety of approaches to show and export your visualizations in this section.
Making a stand-alone HTML document
Bokeh’s output_file() function sets up a standalone HTML file as the output for your visualization. This HTML file contains everything needed to display your plot.
output_file() takes a number of arguments. For example:
filename: the HTML file’s filename.
title: your document’s title (used in the HTML’s <title> tag).
When you use the show() function, Bokeh generates an HTML file and opens it in a web browser.
Use the save() function instead if you only want Bokeh to generate the file without opening it in a browser. Import save() first, then call it exactly as you would call show().
from bokeh.plotting import figure, output_file, save

# prepare some data
x_coords = [3, 4, 5, 6, 7]
y_coords = [4, 5, 5, 7, 2]

# set the output to a static HTML file
output_file(filename="custom_filename.html", title="HTML File Output in Bokeh")

# create a new plot with a specific size
p = figure(sizing_mode="stretch_width", max_width=500, plot_height=250)

# add a circle renderer
circle = p.circle(x_coords, y_coords, fill_color="red", size=15)

# save the results to a file
save(p)
Displaying in a Jupyter notebook
If you’re using Jupyter notebooks, replace Bokeh’s output_file() with output_notebook().
To see your visualization right inside your notebook, use the show() function.
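A minimal sketch of the notebook workflow could look like the following. The function name show_inline and the data are made up for illustration, and the function is meant to be called from a notebook cell.

```python
from bokeh.io import output_notebook, show
from bokeh.plotting import figure


def show_inline():
    """Render a small example plot inside the current Jupyter notebook."""
    # Route Bokeh output to the notebook instead of an HTML file
    output_notebook()

    # Hypothetical data for illustration
    p = figure(title="Notebook output example")
    p.line([3, 4, 5, 6, 7], [4, 5, 5, 7, 2], line_width=2)

    # In a notebook, show() displays the plot below the current cell
    show(p)


# Call show_inline() from a notebook cell to display the plot.
```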
Exporting PNG files
Additional dependencies may be required to export PNG or SVG files.
Bokeh uses Selenium to generate PNG and SVG files. Selenium lets Bokeh run a browser without a graphical user interface (GUI), and Bokeh uses this browser to render the PNG and SVG files. For this to work, Selenium must be able to access either a Firefox browser (through a program named geckodriver) or a Chromium browser (through the chromedriver package).
Make sure you have all the required packages installed by using one of the following commands, depending on whether you’re using conda or pip:
conda install selenium geckodriver firefox -c conda-forge
or
pip install selenium
With pip, only the Selenium Python package is installed. Firefox and geckodriver must be installed separately.
from bokeh.io import export_png
from bokeh.plotting import figure

# data preparation
x_coords = [3, 4, 5, 6, 7]
y_coords = [4, 5, 5, 7, 2]

# create a new plot with fixed dimensions
p = figure(plot_width=350, plot_height=250)

# add a circle renderer
circle = p.circle(x_coords, y_coords, fill_color="red", size=15)

# save the results to a file
export_png(p, filename="plot.png")
Summary of Bokeh’s plot tools
Even a straightforward Bokeh graph has interactive elements. To explore them, use the tools to the right of the plot:
- Pan tool: move the graph within your plot.
- Box zoom tool: zoom into a specific section of your plot.
- Wheel zoom tool: zoom in and out with a mouse wheel.
- Save tool: save the current view of your plot as a PNG file.
- Reset tool: return to the plot’s default settings.
- Help tool: learn more about the tools available in Bokeh.
Building visualizations in a nutshell
You’ve just finished all of the basic steps required by Bokeh’s bokeh.plotting interface for most simple visualizations. These included:
Step 1: Getting the data ready
Although you used a simple Python list, other types of serialized data will also work.
Step 2: Calling the figure() function
figure() produces a plot with the most commonly used default settings. You can change your plot’s title, tools, and axis labels, among other things.
Step 3: Adding renderers
To make a line, you used line(). Renderers offer many options for specifying visual properties, including colors, legends, and widths.
Step 4: Requesting that Bokeh display or save the results using show() or save()
These options allow you to save your plot as an HTML file or see it in a web browser.
Altair
Altair is built on Vega-Lite, a declarative plotting language or “visualization grammar” that is itself based on Vega. This means it has a well-thought-through API that scales well for complex plots, saving you from getting lost in nested-for-loop hell.
As with Bokeh, Altair outputs its plots as HTML files.
Bar chart using Altair
import altair as alt
from vega_datasets import data

data_source = data.wheat()

bar_chart = alt.Chart(data_source).mark_bar().encode(
    x='year:O',
    y="wheat:Q",
    # Highlight a bar based on a condition:
    # if the year is 1700, the test is True and the bar is colored red;
    # otherwise, the bar is colored steelblue
    color=alt.condition(
        alt.datum.year == 1700,
        alt.value('red'),
        alt.value('steelblue')
    )
).properties(width=750)

bar_chart.save('bar_chart.html')
Pygal
Pygal is primarily concerned with appearance. It creates SVG plots by default, so you can zoom in and out as much as you like without them being pixellated. Pygal plots also have some built-in interactivity features, making it another underappreciated choice for embedding plots in a web app.
import pygal
from pygal.style import Style

custom_style = Style(colors=('#E80080', '#404040', '#9BC850'))


def getFactorial(n):
    # recursive factorial; 0! and 1! are both 1
    if n == 1 or n == 0:
        return 1
    else:
        return n * getFactorial(n - 1)


listOfFactorials = [getFactorial(i) for i in range(5)]

# prepare the bar plot with the custom style and add the data
bar_chart = pygal.Bar(height=400, style=custom_style)
bar_chart.add('Factorial List', listOfFactorials)
bar_chart.render_in_browser()
Pandas
Pandas is an extremely popular Python data science library. It not only allows you to do scalable data manipulation, but it also has a plotting API. The pandas example is the shortest code snippet in this article, even shorter than the Seaborn code, since it operates directly on data frames.
Whether you’re learning about a dataset or getting ready to report your results, visualization is an essential tool. With .plot(), pandas, Python’s popular data analysis library, lets you visualize your data in various ways. Even if you’re just getting started with pandas, you’ll soon be able to create simple plots that improve your understanding of your data.
What are the various kinds of pandas plots, and when should they be used?
- Use a histogram to get a quick overview of your data.
- Use a scatter plot to look for correlation.
- Use other plot types to examine categories and their proportions.
Since the pandas API is a wrapper around Matplotlib, you can use the underlying Matplotlib API to control your plots more precisely.
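As a small sketch of this, DataFrame.plot() returns a Matplotlib Axes object that can be customized further with regular Matplotlib calls. The data below is made up for illustration, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened

import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"Rank": [1, 2, 3, 4], "Median": [110000, 75000, 73000, 70000]})

# DataFrame.plot() returns a Matplotlib Axes object
ax = df.plot(x="Rank", y="Median")

# From here on, the regular Matplotlib API applies
ax.set_title("Median earnings by rank")
ax.set_ylabel("Median earnings (USD)")
ax.grid(True)

print(type(ax))
```

With an interactive backend, calling matplotlib.pyplot.show() afterwards displays the customized figure.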
Here’s a pandas plot of a dataset on the earnings of recent college graduates by major. Again, the code is incredibly short and to the point!
import pandas as pd
import matplotlib.pyplot as plt

_url_download = ("https://raw.githubusercontent.com/fivethirtyeight/"
                 "data/master/college-majors/recent-grads.csv")

# load the dataset into a DataFrame and take a first look at it
df = pd.read_csv(_url_download)
print(df)

# plot three earnings columns against the rank
df.plot(x="Rank", y=["P25th", "Median", "P75th"])
plt.show()
Observations from the plot:
The median income decreases as the rank decreases. Since the median income determines the rank, this is to be expected.
In some majors, there are large gaps between the 25th and 75th percentiles. Graduates with these degrees may earn significantly less or significantly more than the median income.
In other majors, the gap between the 25th and 75th percentiles is very small. Graduates with these degrees earn salaries that are very close to the median income.
The first plot already suggests that there’s a lot more in the data to explore! For example, some majors have a wide range of earnings, while others have a much narrower range. You’ll use a variety of plot types to uncover these differences.
.plot() accepts a number of optional parameters. The kind parameter, in particular, accepts eleven different string values and determines the type of plot you’ll create:
- “area” is for area plots.
- “bar” is for vertical bar charts.
- “barh” is for horizontal bar charts.
- “box” is for box plots.
- “hexbin” is for hexbin plots.
- “hist” is for histograms.
- “kde” is for kernel density estimate charts.
- “density” is an alias for “kde”.
- “line” is for line graphs.
- “pie” is for pie charts.
- “scatter” is for scatter plots.
The default value is “line”.
Line graphs, such as the one above, are useful for getting a quick overview of your data. They can be used to spot broad patterns. They seldom offer deep insight, but they can point you in the right direction.
If you don’t pass any parameters, .plot() generates a line plot with the index on the x-axis and all the numeric columns on the y-axis. Although this works well for datasets with only a few columns, it looks like a mess for the college majors dataset, which has many numeric columns.
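The default behavior can be checked on a small example. The sketch below uses a made-up DataFrame with one text column and three numeric columns; the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened

import pandas as pd

# Hypothetical data for illustration: one text column, three numeric columns
df = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "P25th": [95000, 55000, 50000],
    "Median": [110000, 75000, 73000],
    "P75th": [125000, 90000, 105000],
})

# No arguments: index on the x-axis, one line per numeric column
ax = df.plot()

labels = [line.get_label() for line in ax.get_lines()]
print(labels)
```

Only the numeric columns are drawn; the text column is left out of the plot.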
Note: As an alternative to passing strings to the kind parameter of .plot(), you can use the methods of a DataFrame’s .plot accessor, for example df.plot.area(), to create the different types of plots mentioned above:
- .area()
- .bar()
- .barh()
- .box()
- .hexbin()
- .hist()
- .kde()
- .density()
- .line()
- .pie()
- .scatter()
In this example, you’ll use the .plot() interface and pass strings to the kind parameter. You should also give the methods listed above a try.
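The two styles can be compared side by side. In the sketch below, which uses made-up data and assumes the Agg backend so that it runs without a display, the same histogram is created once through the kind parameter and once through the hist() method of the .plot accessor.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened

import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"Median": [22000, 30000, 35000, 36000, 40000, 45000, 62000, 110000]})

# Style 1: pass a string to the kind parameter
ax_kind = df.plot(kind="hist", bins=5)

# Style 2: call the corresponding method on the .plot accessor
ax_method = df.plot.hist(bins=5)

heights_kind = [patch.get_height() for patch in ax_kind.patches]
heights_method = [patch.get_height() for patch in ax_method.patches]
print(heights_kind == heights_method)
```

Both calls draw the same bars, so the choice between them is mostly a matter of style.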
Matplotlib: A Look Behind the Scenes
Matplotlib generates the plot behind the scenes when you call .plot() on a DataFrame object. To verify this, try out two code snippets. To begin, use Matplotlib to generate a plot using two columns from your DataFrame:
import pandas as pd
import matplotlib.pyplot as plt

_url_download = ("https://raw.githubusercontent.com/fivethirtyeight/"
                 "data/master/college-majors/recent-grads.csv")
df = pd.read_csv(_url_download)

plt.plot(df["Rank"], df["P75th"])
plt.show()
Using the DataFrame object’s .plot() method, you can make the exact same graph:
df.plot(x="Rank", y="P75th")
.plot() wraps pyplot.plot(), and the result is a graph that looks just like the one you made with Matplotlib.
You can use either pyplot.plot() or df.plot() to build the same graph from columns in a DataFrame object. If you already have a DataFrame instance, df.plot() is the more convenient option.
Now that you know that the DataFrame object’s .plot() method is a wrapper for Matplotlib’s pyplot.plot(), let’s look at the various types of plots you can make and how to make them.
Examine for Correlation
Frequently, you’ll want to check if two columns in a dataset are related. Do you have a lower risk of unemployment if you choose a major with higher median earnings? Build a scatter plot with those two columns as a first step:
import pandas as pd
import matplotlib.pyplot as plt

_url_download = ("https://raw.githubusercontent.com/fivethirtyeight/"
                 "data/master/college-majors/recent-grads.csv")
df = pd.read_csv(_url_download)

df.plot(x="Median", y="Unemployment_rate", kind="scatter")
plt.show()
You should see a plot that appears to be completely random.
A brief look at this graph reveals that the earnings and unemployment rate have no meaningful relationship.
While a scatter plot is an excellent way to get a first impression of a potential correlation, it is far from conclusive evidence. You can use .corr() to get a quick overview of the correlations between different columns. If you suspect a correlation between two values, you can use various methods to confirm your suspicions and determine how strong the correlation is.
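As a minimal sketch of .corr(), the snippet below computes the correlation matrix for a small, made-up DataFrame and the correlation coefficient for a single pair of columns. The column names mirror the college majors dataset, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Median": [110000, 75000, 73000, 70000, 65000],
    "Unemployment_rate": [0.02, 0.12, 0.02, 0.08, 0.06],
})

# Pairwise (Pearson) correlation between all numeric columns
corr_matrix = df.corr()
print(corr_matrix)

# Correlation coefficient for one specific pair of columns
r = df["Median"].corr(df["Unemployment_rate"])
print(r)
```

Values close to 1 or -1 indicate a strong linear relationship, while values close to 0 indicate a weak one.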
However, keep in mind that just because two values have a connection does not mean that changing one would cause the other to change. To put it another way, association does not always mean causation.
Make a plan
Python provides a variety of ways to plot the same data with minimal code. While all of these methods will get you started quickly in creating charts, they require some local configuration.
Final Thoughts
You learned how to use Python and the various libraries available to visualize your dataset in this article. In addition, you’ve seen how a few simple plots will help you understand your data and guide your research.
In summary, you learned how to do the following in this tutorial:
- Use a scatter plot to check for correlation.
- Use bar plots to examine categories and pie plots to examine their ratios.
- Determine which plot is best suited to your current project.
- Use a histogram to see how a dataset is distributed.
- Visualize data using .plot() and a small DataFrame.
You’re now able to build on this experience and experiment with even more advanced visualizations.
If you have any questions or suggestions, please leave them in the contact section below.