NumPy stands for Numerical Python which is widely used for scientific calculation for its advance interface for array manipulation. From Broadcasting to other tools it became quite handy when one is dealing with large data set. So let's start exploring Numpy features in this notebook with first importing numpy.
import numpy as np from numpy.random import randn
You may occasionally have some data set(usually array) for which you need to check whether data set contains integers, floating point numbers, strings, or Python objects. For this np.issubdtype(array, datatype) is used. Let's code for this.
Define our test array
arr1= np.ones(10, dtype= np.uint16) arr2= np.ones(10, dtype= np.float32) arr3= [1,2,3,4,5]
Now to Checking wheather a data set have a specific datatype using np.issubdtype() fucntion
print np.issubdtype(arr1.dtype,np.integer) print np.issubdtype(arr2.dtype, np.integer) print np.issubdtype(arr2.dtype, np.floating)
Output:
True False True
Numpy provide ease with Array Manipulation. Here are some handy feature of Numpy Array Manipulation.Let's First talk about Reshaping Array.
while using .reshape() function, we should keep in mind that overall shape of array shouldn't be changed. the input of this function is the dimension in which we want to convert the array.
First define sample array
arr3= np.arange(18) arr3
Output:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
Now Let's reshape it into [9,2] size.
arr3.reshape((9,2))
Output:
array([[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17]])
Let's change this into [2,9] size.
arr3.reshape((2,9))
Output:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8], [ 9, 10, 11, 12, 13, 14, 15, 16, 17]])
Again Change the shape of array to a different size.
arr3.reshape((3,6))
Output:
array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17]])
One of the passed dimensions in .reshape() can be -1, in that case value used for that dimension will be taken from the data. See the example below
arr4 = np.arange(30) arr4
Output:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
Now Let's reshape using -1 as a dimension
arr4.reshape((5,-1))
here to complete the array size, in-place of -1 it should be 6 so reshape((5,-1)) will interrupt -1 as 6 and then proceed.
Output:
array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29]])
Check another example :
arr4.reshape((3,-1))
Here to complete the array size their should be 10 instead of -1.
Output:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
Next example:
arr4.reshape((-1,2))
Output:
array([[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19], [20, 21], [22, 23], [24, 25], [26, 27], [28, 29]])
newaxias() is used to create a new axis(dimension) in the array to fit the broadcast rule.Check example below
array= randn(2,3) array.shape
Output:
(2, 3)
Now let's create a new axis.
array2= array[:,np.newaxis, :] array2.shape
Output:
(2, 1, 3)
Numpy provide the opposite operation of reshape (i.e from higher dimension to one-dimensional). This is also known as flattening or raveling. Their are two methods.
flatten()
see the example below
arr5= np.ones((5,3), dtype= np.uint16) arr5
Output:
array([[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=uint16)
Now
arr5.ravel()
Output:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=uint16)
For flatten()
arr5.flatten()
Output:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=uint16)
ravel() does not produce a copy of data if it does not have to While, the flatten() method always return a copy of the data.
arr4
Output:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
With order F:
arr4.reshape((3,10), order='F')
Output:
array([[ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27], [ 1, 4, 7, 10, 13, 16, 19, 22, 25, 28], [ 2, 5, 8, 11, 14, 17, 20, 23, 26, 29]])
when reshaping with order F, data is append column by column.
arr4.reshape((3,10), order='C') # order 'C' is by default
Output:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
When reshaping with order C, data is append raw by raw.
The Key difference between C and Fortran order is the order in which the dimensions are traversed:
C order Traverse Higher dimensions first (i.e axis 1 before advancing on axis 0). [Raw major order]
Fortan Order Traverse Higher dimensions last(i.e axis 0 before advancing on axis 1) [ column major order]
Numpy provides 2 types of concatenation of arrays.First is along axis=0 and other is along axis=1.
see the example below
arr6= np.array([[1,2,3,4,5],[6,7,8,9,10]]) arr6
Output:
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10]])
Our second array:
arr7= np.array([[11,12,13,14,15],[16,17,18,19,20]]) arr7
Output:
array([[11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
Now Concatenation operation along axis=0
np.concatenate((arr6,arr7),axis=0)
Output:
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
Concatenation operation along axis=1
np.concatenate((arr6,arr7), axis=1)
Output:
array([[ 1, 2, 3, 4, 5, 11, 12, 13, 14, 15], [ 6, 7, 8, 9, 10, 16, 17, 18, 19, 20]])
Some other function :
np.vstack((arr6,arr7)) # same as np.concatenate() with axis=0
Output:
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]])
Now
np.hstack((arr6,arr7)) # same as np.cooncatenate() with axis=1
Output:
array([[ 1, 2, 3, 4, 5, 11, 12, 13, 14, 15], [ 6, 7, 8, 9, 10, 16, 17, 18, 19, 20]])
Now Splitting Operation:
arr8= randn(4,4) arr8
Output:
array([[ 7.08533211e-01, 4.59659578e-01, -1.57881323e-01, 1.07929958e+00], [ -3.86824405e-01, -1.64399896e-01, 1.53551098e+00, 1.15847583e+00], [ 1.94122163e-01, -7.44141600e-01, -4.42922893e-01, -2.25669699e-01], [ -1.01093857e+00, -5.21043404e-01, -7.55925342e-04, 1.55661573e-01]])
split() takes array as argument along with size of array to be splited or a array like object of the array size.
arr_1= np.split(arr8,1) #size of arr_1 is 1 which mean arr8 will be divided in one part. arr_1
Output:
[array([[ 0.50454954, 0.65916656, 1.69246983, 0.86705127], [-0.71989673, 0.37454774, -0.32577363, -0.08872866], [ 0.65309926, -0.49518186, -0.73326777, 1.18062482], [ 0.9273575 , -1.40040011, 1.65876684, -0.65355634]])]
Split array in 2 parts:
np.split(arr8,2) # the output array will be half-half of arr8
Output:
[array([[ 0.50454954, 0.65916656, 1.69246983, 0.86705127], [-0.71989673, 0.37454774, -0.32577363, -0.08872866]]), array([[ 0.65309926, -0.49518186, -0.73326777, 1.18062482], [ 0.9273575 , -1.40040011, 1.65876684, -0.65355634]])]
Split array in 4 parts
np.split(arr8,4) #here if you give the input
Output:
[array([[ 0.50454954, 0.65916656, 1.69246983, 0.86705127]]), array([[-0.71989673, 0.37454774, -0.32577363, -0.08872866]]), array([[ 0.65309926, -0.49518186, -0.73326777, 1.18062482]]), array([[ 0.9273575 , -1.40040011, 1.65876684, -0.65355634]])]
Here if you give the size of splitting array 3 instead of 4 then it would through an error ValueError: array split does not result in an equal division
.
To Produce Larger arrays, repeat() and tile() functions are used.
repeat replicates each element in an array some number of times, producing a larger array.
arr9= np.arange(4) arr9
Output:
array([0, 1, 2, 3])
Repeat every element 2 times
arr9.repeat(2)
Output:
array([0, 0, 1, 1, 2, 2, 3, 3])
Now
arr9.repeat([2,3,5,6]) # repeat first element 2 times, second element 3 times , third element 5 times and fourth element 6 time
Output:
array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3])
For multidimensional Array
arr10= randn(2,2) arr10
Output:
array([[-0.31851426, 2.01991533], [-0.13367267, 1.00630383]])
Repeat element which is on axis=0
arr10.repeat(2,axis=0)
Output:
array([[-0.31851426, 2.01991533], [-0.31851426, 2.01991533], [-0.13367267, 1.00630383], [-0.13367267, 1.00630383]])
Repeat element which is on axis=1
arr10.repeat(2,axis=1)
Output:
array([[-0.31851426, -0.31851426, 2.01991533, 2.01991533], [-0.13367267, -0.13367267, 1.00630383, 1.00630383]])
Similarly you can pass an array of integers when repeating a multidimensional array to repeat a given slice a different number of times:
arr10.repeat([2,4],axis=0) # which row to be repeated(2nd) and the no of times(4 time) the row to be repeated
Output:
array([[ 1.1676876 , 0.73187928], [ 1.1676876 , 0.73187928], [ 1.19650763, -0.85104547], [ 1.19650763, -0.85104547], [ 1.19650763, -0.85104547], [ 1.19650763, -0.85104547]])
For Columns
arr10.repeat([1,3],axis=1) # same for columns( 1st column repeated 3 times)
Output:
array([[ 1.1676876 , 0.73187928, 0.73187928, 0.73187928], [ 1.19650763, -0.85104547, -0.85104547, -0.85104547]])
Now
arr10.repeat([0,3], axis=1) #here it will repeat column=0 and discard column 1 in output
Output:
array([[ 0.73187928, 0.73187928, 0.73187928], [-0.85104547, -0.85104547, -0.85104547]])
tile on the other hand, is a shortcut for stacking copies of an array along an axis.
arr10
Output:
array([[ 1.1676876 , 0.73187928], [ 1.19650763, -0.85104547]])
Now
np.tile(arr10,2) # the second argument is the number of tiles ( a scalar or a tuple). When the second argument to tile is tuple it indicate the layout of the tiling.
Output:
array([[ 1.1676876 , 0.73187928, 1.1676876 , 0.73187928], [ 1.19650763, -0.85104547, 1.19650763, -0.85104547]])
The tiling is made row by row, rather than column by column.
np.tile(arr10,(2,2)) #row repeated 2 times, column repeated 2 times.
Output:
array([[ 1.1676876 , 0.73187928, 1.1676876 , 0.73187928], [ 1.19650763, -0.85104547, 1.19650763, -0.85104547], [ 1.1676876 , 0.73187928, 1.1676876 , 0.73187928], [ 1.19650763, -0.85104547, 1.19650763, -0.85104547]])
another example:
np.tile(arr10,(3,2)) # row repeated 3 times, column repeated 2 times.
Output:
array([[ 1.1676876 , 0.73187928, 1.1676876 , 0.73187928], [ 1.19650763, -0.85104547, 1.19650763, -0.85104547], [ 1.1676876 , 0.73187928, 1.1676876 , 0.73187928], [ 1.19650763, -0.85104547, 1.19650763, -0.85104547], [ 1.1676876 , 0.73187928, 1.1676876 , 0.73187928], [ 1.19650763, -0.85104547, 1.19650763, -0.85104547]])
Broadcasting describes how arithmetic works between arrays of different shapes. It is a very powerful feature.
arr3
Output:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
Simple broadcasting:
arr3*6
Output:
array([ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96, 102])
Here the scalar value 6 has been broadcast to all of other elements in the multiplication operation.
Two arrays are compatible for broadcasting if for each trailing dimension, (that is, starting from the end ) the axis lengths match or if either of the lengths is 1. Broadcasting is then performed over the missing and / or length 1 dimensions.
arr11= randn(2,3) arr11
Output:
array([[ 0.15897943, 0.21914698, 1.16865214], [-0.1779596 , -1.55740588, -1.40536907]])
another numpy array arr12= randn(6,2) arr12
Output:
array([[ 0.61632769, 1.43608023], [ 0.12357505, -0.70171185], [-0.32430931, -0.15066942], [-1.14900472, -0.24824113], [-1.25776788, -1.55733187], [-0.73411425, 0.76859758]])
Now try this
arr11 + arr12
this will give ValueError: operands could not be broadcast together with shapes (2,3) (6,2) so We have to reshape the arr11 before addition. As according to Broadcasting rule to make addition along axis-1 the smaller array should have a shape of (6,1).
print arr11.reshape(6,1)+ arr12
Output:
[[-0.52834331 -1.57587648] [-2.0897029 -1.69842965] [-0.56482399 -0.74702429] [-1.02085944 -0.45111874] [-0.86002779 0.90505302] [ 0.29465825 0.86799132]]
Check
arr12
Output:
array([[ 0.61632769, 1.43608023], [ 0.12357505, -0.70171185], [-0.32430931, -0.15066942], [-1.14900472, -0.24824113], [-1.25776788, -1.55733187], [-0.73411425, 0.76859758]])
Check the mean along axis=0
arr12.mean(0)
Output:
array([-0.45421557, -0.07554607])
Subtraction in Broadcasting:
arr12 - arr12.mean(0)
Output:
array([[ 1.07054326, 1.51162631], [ 0.57779062, -0.62616578], [ 0.12990626, -0.07512334], [-0.69478915, -0.17269506], [-0.80355231, -1.48178579], [-0.27989868, 0.84414366]])
Boolean operation in Broadcasting:
arr12[arr12 > 0]
Output:
array([ 0.61632769, 1.43608023, 0.12357505, 0.76859758])
Check the mean along axis=1
arr12.mean(1)
Output:
array([ 1.02620396, -0.2890684 , -0.23748936, -0.69862292, -1.40754987, 0.01724166])
Now Check this
arr12 - arr12.mean(1)
By doing this will through a ValueError: operands could not be broadcast together with shapes (6,2) (6,) because According to the rules, to subtract over axis 1 (that is, subtract the row mean from each row), the smaller array must have shape (6, 1).
so Let's reshape the mean
arr12 - arr12.mean(1).reshape(6,1)
Output:
array([[-0.40987627, 0.40987627], [ 0.41264345, -0.41264345], [-0.08681994, 0.08681994], [-0.45038179, 0.45038179], [ 0.14978199, -0.14978199], [-0.75135592, 0.75135592]])
Note : for Broadcasting purpose, numpy.newaxis() method can be used.
To know more about broadcasting EricBroadCastingDoc
Corrections and suggestions are welcome in the comments.
Published On : 2016-12-15 Tweet