The basics of memory management in Python for data scientists
As data scientists, we normally don’t pay attention to how Python and the underlying operating system handle memory for our code. After all, Python is the most popular language among data scientists, partly because it handles those details automatically. As long as we work on small datasets, ignoring how Python manages memory (i.e., memory allocation and deallocation) does not affect our code's performance. But as soon as we switch to large datasets (big data) or heavy processing projects, basic knowledge of memory management becomes crucial.
As an example, I was working on a data science project on indexing human DNA. I used a Python dictionary object to keep track of sequences (i.e., sequences of nucleotides) and store their locations in a reference human DNA. About 10% into the process, the dictionary object had taken all my RAM and the system started swapping between disk and RAM. This made the process extremely slow, as the disk is much slower than RAM at transferring data. Had I known the basics of Python memory management, I could have prevented this and written much more memory-efficient code.
In this article and an upcoming one, I explain some basic concepts of memory management in Python. By the end of this article, you will have a good basic knowledge of how Python handles memory allocation and deallocation. Let’s get started …
A Python program is a collection of methods, references, and objects.
Methods or operations are easy. When you add two numbers, you are basically applying the add (or sum) method to two values. References are a little trickier to explain. A reference is a name that we use to access a data value (i.e., an object). The most famous references in programming are variables. When you define x = 1, x is the variable or reference and 1 is its value (more accurately, an integer object). In addition to variables, attributes and items are two other popular references in programming.
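The three kinds of references mentioned above can be sketched as follows. The class name Sequence, the attribute length, and the dictionary positions are hypothetical names chosen only for illustration.

```python
# A variable: the name x refers to an integer object.
x = 1

# An attribute: the name "length" on an instance refers to an object.
class Sequence:  # hypothetical class, for illustration only
    def __init__(self, length):
        self.length = length

seq = Sequence(1)

# An item: a key in a dictionary (or a position in a list) refers to an object.
positions = {"ACGT": 1}

# Binding a second name does not copy the value; it adds another reference.
y = x
print(y is x)  # True: both names refer to the same object
```

Assignment in Python binds a name to an object; it never copies the object.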
Now, let's get deeper and introduce objects. As a Python programmer, you must have heard that…
Determining the Size of Objects
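The original listing is not shown here, so the following is only a minimal sketch of the kind of example the next paragraph describes. The variable name and its value are assumptions, and the exact number depends on the Python version and platform.

```python
import sys

a = 1  # an integer object (the value is an assumption)

# Size of the object itself, in bytes.
size_in_bytes = sys.getsizeof(a)
print(size_in_bytes)  # typically 28 on 64-bit CPython
```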
In the above example, we imported the sys module and created an integer object 'a'. We then printed the size of 'a' using the sys.getsizeof() function, which returned 28. This means the integer object 'a' takes up 28 bytes in memory.
In Python, sys.getsizeof() reports how much memory a particular object occupies. The function returns the size of the object in bytes; for a container, this covers the container itself but not the objects it refers to.
How do I overcome a memory error in Python?
To fix a memory error, you can reduce the size of your dataset, process the data in chunks, use Dask, or use a machine with more memory. These approaches help data analysis projects run smoothly and efficiently, even with very large datasets.
Why does my Python program use so much memory?
Consider storing a million integers. Each number easily fits in a 64-bit integer, so one would hope Python would store them in no more than ~8MB: a million 8-byte objects. In fact, Python uses more like 35MB of RAM to store these numbers. Why? Because Python integers are objects, and objects have a lot of memory overhead.
What is the maximum size of an object in Python?
The width of the platform's pointer dictates the maximum size of lists and strings in Python. The value of sys.maxsize depends on the platform architecture: on 32-bit builds it is 2^31 – 1, i.e. 2147483647; on 64-bit builds it is 2^63 – 1, i.e. 9223372036854775807.
How much memory does an object take?
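The following figures describe Java's primitive types, not Python: object references consume 4 bytes; boolean and byte values consume 1 byte; short and char values consume 2 bytes; int and float values consume 4 bytes. In Python, every value is a full object, so even a small integer takes about 28 bytes on 64-bit CPython.
How do I see how much memory Python is using?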
The function psutil.virtual_memory() returns a named tuple describing system-wide memory usage. The third field in the tuple, percent, is the percentage of memory (RAM) in use. It is calculated as (total – available) / total * 100.
How to check memory usage in Python?
Try Python profiler mprof for memory usage
mprof can show you memory usage over the lifetime of your application. This can be useful if you want to see if your memory is getting cleaned up and released periodically.
- Method 1: Using tracemalloc. tracemalloc is a standard-library module that traces the memory blocks allocated by Python. ...
- Method 2: Using psutil. ...
- Output: func: consumed memory: 307,261,440.
- Method 3: Using the classic memory profiler. ...
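As a minimal sketch of the first method, the snippet below uses the standard-library tracemalloc module to measure how much memory a block of code allocates. The list being built is an arbitrary workload chosen for illustration, not something from the methods listed above.

```python
import tracemalloc

tracemalloc.start()

# Arbitrary workload: a list of 100,000 integers.
data = [i * 2 for i in range(100_000)]

# Both values are in bytes: memory currently traced, and the peak since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current:,} bytes, peak: {peak:,} bytes")
```

tracemalloc only sees allocations made through Python's memory allocator after start() is called, so it reports what the code allocated rather than the total memory of the process.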
One of the most common techniques for optimizing memory usage in Python is to use generators and iterators. Generators and iterators allow developers to create sequences of data on the fly, without storing the entire sequence in memory.
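The difference can be sketched by comparing a list with an equivalent generator. The sequence of squares and the variable names are assumptions made for illustration.

```python
import sys

n = 1_000_000

squares_list = [i * i for i in range(n)]  # all values are stored at once
squares_gen = (i * i for i in range(n))   # values are produced on demand

# getsizeof reports the list object itself (its array of references),
# not the integers it refers to, so the true cost of the list is even higher.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)     # small, and independent of n

print(list_size, gen_size)

# The generator can still be consumed like any other iterable, but only once.
total = sum(squares_gen)
```

The trade-off is that a generator can be iterated only once and does not support indexing, so it fits pipelines that pass over the data a single time.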
Python doesn't limit your program's memory usage. It will allocate as much memory as your program needs until your computer is out of memory. The most you can do is lower the limit to a fixed upper cap, which can be done with the resource module on Unix-like systems.
Can you manually manage memory in Python?
In languages like C or Rust, memory management is the responsibility of the programmer. The programmer has to manually allocate memory before it can be used by the program and release it when the program no longer needs it. In Python, memory management is automatic!
Does Python automatically clear memory?
Garbage collection releases memory when an object is no longer in use. The system destroys the unused object and reuses its memory slot for new objects. You can think of it as a recycling system inside the computer. Python has automated garbage collection.
Does RAM affect Python performance?
If data has to be stored on disk rather than in RAM or fast caches, it takes longer to load and process, which hurts overall performance. Optimizing memory usage can therefore have the side effect of speeding up the application.
Are Python objects stored in memory?
Objects in Python
Everything in Python is an object. Classes, functions, and even simple data types, such as integers, floats, and strings, are objects in Python. When we define an integer in Python, CPython internally creates an object of type integer. These objects are stored in heap memory.
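A short sketch of this claim: values of very different kinds all have a type and are instances of object. The function greet and the class Sample are hypothetical names used only for illustration.

```python
def greet():  # hypothetical function
    return "hello"

class Sample:  # hypothetical class
    pass

# An integer, a float, a string, a function, a class, and an instance.
things = [42, 3.14, "text", greet, Sample, Sample()]

for thing in things:
    # Every value has a type and is an instance of object.
    print(type(thing).__name__, isinstance(thing, object))

all_objects = all(isinstance(thing, object) for thing in things)
```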
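An empty list takes 56 bytes, but each additional int adds just 8 bytes, even though the size of an int is 28 bytes. A list that contains a long string takes just 64 bytes. This is because a list stores references to its elements, not the elements themselves, and each reference takes 8 bytes on a 64-bit system.
What is the memory limit of a Python list?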
- Therefore the maximum size of a Python list on a 32-bit system is 536,870,912 elements. ...
- A 32-bit process has a theoretical limit of 4 GB of memory, though if your OS is also 32-bit it will obviously be less since the OS will take up some of that memory.
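The per-element cost of a list and the limits quoted above can be checked on the current interpreter with a rough sketch like the one below. It assumes CPython, where a list built with [None] * n allocates exactly n reference slots. The bound computed at the end is simply sys.maxsize divided by the pointer size, so it is an approximation of the theoretical limit, not a figure you could reach in practice.

```python
import struct
import sys

# Bytes per reference: 4 on 32-bit builds, 8 on 64-bit builds.
pointer_size = struct.calcsize("P")

n = 1000
empty_size = sys.getsizeof([])
filled_size = sys.getsizeof([None] * n)

# Extra bytes per element: the cost of one reference (CPython assumption).
per_element = (filled_size - empty_size) // n

# Rough upper bound on the number of list elements implied by sys.maxsize.
max_list_elements = sys.maxsize // pointer_size

print(pointer_size, per_element, sys.maxsize, max_list_elements)
```

Available RAM runs out long before this bound is reached, which is why the practical limit is the memory of the machine, not the size of the index.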