Python Data Analysis(Second Edition)

The NumPy array object

NumPy provides a multidimensional array object called ndarray. NumPy arrays are typed arrays of a fixed size. Python lists are heterogeneous, so their elements may be objects of any type, whereas NumPy arrays are homogeneous and contain items of only one type. An ndarray consists of two parts, which are as follows:

  • The actual data that is stored in a contiguous block of memory
  • The metadata describing the actual data

Because the actual data is stored in a contiguous block of memory, loading a large dataset into an ndarray depends on a large enough contiguous block of memory being available. Many of the array methods and functions in NumPy leave the actual data untouched and modify only the metadata.
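
The split between data and metadata can be seen in a small experiment. The following is a minimal sketch, assuming a freshly created contiguous array; the variable names are illustrative only. Reshaping and transposing such an array produce views that describe the same memory with a different shape, which np.shares_memory() lets us check:

```python
import numpy as np

a = np.arange(6)       # one-dimensional array holding 0..5
b = a.reshape(2, 3)    # new shape metadata over the same buffer
c = b.T                # transposing only swaps shape and strides

# Both derived arrays are views on the memory owned by a.
shares_b = np.shares_memory(a, b)
shares_c = np.shares_memory(a, c)

# Writing through a view is therefore visible in the original array.
b[0, 0] = 100
first_item = a[0]

print(shares_b, shares_c, first_item)
print(b.shape, c.shape)
```

Not every operation behaves this way; some return a copy of the data instead of a view, so it is worth checking when memory use matters.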

In the preceding chapter, we saw how to create an array with the arange() function. That array was one-dimensional and held a sequence of numbers. An ndarray can have more than one dimension.

Advantages of NumPy arrays

The NumPy array is, in general, homogeneous (there is a special record array type that is heterogeneous); that is, the items in the array have to be of the same type. The advantage is that, if we know that all the items in an array have the same type, it is easy to determine the storage size needed for the array. NumPy arrays can execute vectorized operations that process a complete array at once, in contrast to Python lists, where you usually have to loop through the list and execute the operation on each element. NumPy arrays are indexed from 0, just like lists in Python. NumPy uses an optimized C API to make array operations particularly quick.
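
To make the contrast with lists concrete, here is a short sketch that squares five numbers both ways. The variable names are illustrative assumptions, not taken from the book's examples:

```python
import numpy as np

values = list(range(5))

# Plain Python list: visit each element explicitly.
squares_list = [v ** 2 for v in values]

# NumPy array: one expression operates on the whole array.
arr = np.arange(5)
squares_arr = arr ** 2

print(squares_list)
print(squares_arr)

# Because every item has the same type, the storage size is simply
# the size of one item multiplied by the number of items.
print(arr.itemsize, arr.size, arr.nbytes)
```

The last line illustrates the storage point made above: nbytes always equals itemsize times size, whatever integer width the platform picks.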

We will create an array with the arange() function again. In this chapter, you will see snippets from Jupyter Notebook sessions where NumPy has already been imported with the statement import numpy as np. Here's how to get the data type of an array:

In: a = np.arange(5) 
In: a.dtype 
Out: dtype('int64')

The data type of the array a is int64 (at least on my computer), but you may get int32 as the output if you are using 32-bit Python or, with NumPy versions before 2.0, Windows. In both cases, we are dealing with integers (64-bit or 32-bit). Besides the data type of an array, it is crucial to know its shape. The example in Chapter 1, Getting Started with Python Libraries, demonstrated how to create a vector (actually, a one-dimensional NumPy array). Vectors are commonly used in mathematics, but most of the time we need higher-dimensional objects. Let's find out the shape of the vector we created earlier:

In: a 
Out: array([0, 1, 2, 3, 4]) 
In: a.shape 
Out: (5,) 

As you can see, the vector has five components with values ranging from 0 to 4. The shape attribute of the array is a tuple that holds the length of each dimension; in this instance, it is a tuple with a single element.
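
For comparison, the same attributes can be inspected on a two-dimensional array. This is a minimal sketch under two assumptions that are not part of the session above: the array is built with reshape(), and the dtype is requested explicitly so that the result does not depend on the platform:

```python
import numpy as np

# Ask for 32-bit integers explicitly instead of the platform default.
m = np.arange(6, dtype=np.int32).reshape(2, 3)

print(m.dtype)    # int32
print(m.shape)    # (2, 3): two rows, three columns
print(m.ndim)     # 2, the number of entries in the shape tuple
```

Here the shape tuple has two elements, one per dimension, whereas the vector above had only one.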