This article was published as a part of the Data Science Blogathon.
1. Introduction:
Python is a high-level, interpreted, general-purpose programming language. You might be thinking, “Do I really need to manage memory in a high-level language like Python?” The short answer is ‘No’: you do not have to manage memory manually in Python. You should, however, be aware of how variables and objects are managed internally. A good understanding of how chunks of memory are allocated, reused, and deallocated for Python objects helps you write more efficient code and solve many issues caused by the extra memory your program consumes.
Python's memory management plays a major role in making the language popular and adaptable. How so? The Python memory manager is designed to support many features that make our lives easier. Dynamic typing is the best example: Python lets you create variables without any type information, and later assign a different object to the same name regardless of the new object's size or type.
Have you ever wondered how this is possible and how Python handles it internally? Let us dive in.
2. Python does not have variables; instead, it has ‘names’:
If you are coming from C/C++ or Java, you know that you must declare a variable with its type before you can use it. Based on the declared type, a fixed amount of memory is reserved for the variable, and the value is stored there on assignment. When a new value is assigned, a C/C++ variable (or a Java primitive variable) overwrites its contents in that same reserved block, and assigning a value that is too large for the type can cause an overflow.
Python works differently. Technically, it does not have ‘variables’ in this sense; instead, it uses ‘names’. Please note that –
· A Python ‘name’ is just a label that refers to an object.
· We never specify ‘type’ information while creating an object.
· A Python name can be rebound to an object of a different type.
· A single Python object can have many names.
· Two names point to the same object if the built-in id() function returns the same value for both.
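The points above can be checked with a few lines of plain Python. This is a minimal sketch using only built-ins; the names a, b, and c are arbitrary illustrations.

```python
# Two names bound to the same list object.
a = [1, 2, 3]
b = a                    # no copy: 'b' is a second label for the same object
print(id(a) == id(b))    # True: same object
print(a is b)            # 'is' compares identity, like comparing id() values

b.append(4)              # mutate the object through one name...
print(a)                 # ...and the change is visible through the other: [1, 2, 3, 4]

c = [1, 2, 3, 4]         # an equal but separate list
print(a == c, a is c)    # True False: equal value, different objects
```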
3. Values are not updated; instead, the name points to a new object:
Suppose both names ‘x’ and ‘y’ point to the same object, for example after y = x. What happens to ‘y’ if we change the value of ‘x’? Will ‘y’ return the updated value? Absolutely not! Assignment never modifies the object itself: it rebinds ‘x’ to a different object and leaves ‘y’ referring to the original one. That different object is usually newly created, but for some immutable values CPython reuses an existing object instead: integers from -5 to 256 are preallocated, and some strings are interned. The interpreter does not, however, search the heap for an equal object in general.
Reusing an existing object instead of creating a duplicate saves memory for frequently used values such as small integers. This reuse is a CPython implementation detail, however, so compare values with == rather than relying on the is operator.
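Here is a small sketch of rebinding and of CPython's small-integer cache. The integers are computed with arithmetic so they are created at run time rather than shared as compile-time constants. The identity results for cached and uncached integers are CPython implementation details (the -5 to 256 range is CPython-specific) and may differ on other interpreters.

```python
x = 200
y = x                      # 'x' and 'y' refer to the same int object
print(x is y)              # True

x = 300                    # rebinds 'x'; the object 'y' refers to is untouched
print(y)                   # 200
print(x is y)              # False

# Integers computed at run time. CPython keeps a single shared object for
# each value from -5 to 256, but builds a new object for larger results.
a, b = 250, 6
small_same = (a + b) is (b + a)    # both results are the cached 256
c, d = 994, 6
large_same = (c + d) is (d + c)    # two separate objects holding 1000
print(small_same, large_same)      # True False on CPython
print((c + d) == (d + c))          # True: compare values with ==, not 'is'
```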
4. So, where is the ‘type’?
Unlike in C/C++ or Java, a Python name is tied neither to a specific memory location nor to a fixed type. We have already seen that a Python name starts referring to another object when a new value is assigned to it. In the same way, it takes its ‘type’ from the object it currently refers to. We can get that type by calling the built-in function type(x).
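A quick sketch of a single name taking on different types as it is rebound; the values are arbitrary examples.

```python
x = 200
print(type(x))             # <class 'int'>

x = "two hundred"          # the same name now refers to a str object
print(type(x))             # <class 'str'>

x = [2, 0, 0]              # ...and now to a list
print(type(x).__name__)    # list
```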
5. How is memory allocated to new Objects?
Python uses dynamic memory allocation (DMA): objects are created at run time and stored in a private heap. Here ‘heap’ means a region of memory, not the heap data structure. This private heap is managed entirely by the Python memory manager, and you have no direct control over it. Let us look at dynamic memory allocation in more detail and compare it with static memory allocation (SMA) –
Static memory allocation | Dynamic memory allocation |
Memory is allocated at compile time. | Memory is allocated at run time, while the program is executing. |
It is a faster way to allocate memory. | It is a slower way to allocate memory. |
Once static memory is allocated, its size cannot be changed and it cannot be reused. Hence, it is less efficient. | Allocated memory can be resized and reused. Hence, it is more efficient. |
The memory stays reserved for the variables for as long as they exist, and its layout cannot change while the program runs. | Memory is allocated only when the program unit becomes active, and is released when the variables go out of scope. |
Uses fixed memory regions (the static data segment or the stack). | Uses the heap for memory management. |
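As a small illustration of memory being resized at run time, the sketch below watches sys.getsizeof() for a list as it grows; for lists, the reported size includes the currently allocated capacity. The exact byte counts depend on the Python version and platform, so only the trend is meaningful.

```python
import sys

items = []
sizes = [sys.getsizeof(items)]     # size of the empty list object
for i in range(100):
    items.append(i)
    size = sys.getsizeof(items)
    if size != sizes[-1]:          # record a size only when the list was reallocated
        sizes.append(size)

# The list's storage was grown several times while the program ran.
print(sizes)
```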
6. A Python Object:
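Python is an object-oriented language, and everything in Python is an object. In CPython, every object starts with a common C header called ‘PyObject’, which holds the object's reference count and a pointer to its type; the object's value is stored in the fields that follow this header (for example, the digits of an int). Conceptually, each object therefore carries the 3 fields below –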
· type
· ref-count
· value
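The reference implementation of Python, CPython, is written in C. Conceptually, when we create a Python object with X = 200, the following happens internally (in practice CPython preallocates small integers such as 200, so X is bound to that existing object instead of a new one) –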
· CPython creates a PyObject in memory.
· The type of the PyObject is set to int.
· The value of the PyObject is set to 200.
· A name ‘X’ is created and set to point to the PyObject.
· PyObject ref-count is set to 1.
Python objects can be divided into 3 categories –
· Simple objects (Integer, Float, Boolean, String, etc.)
· Container objects (Tuple, List, Set, Dictionary, etc.)
· User-defined custom classes (Employee class etc.)
7. How objects are removed from memory: Garbage Collection
We have seen that a Python ‘name’ can start pointing to another (new or existing) object, of the same or a different type, when we update its value or assign a new instance to it. But what happens to the older object that this name used to refer to? Is it still available in memory? The answer is: “It may or may not be”!
Python dynamically allocates memory to objects as they are created while the program runs. Similarly, it deallocates the memory occupied by unused objects using ‘garbage collection’: when no references to an object remain, the object can be safely removed from memory. Python uses the following 2 mechanisms to perform garbage collection –
· Reference Counting
· Tracing
7.1. Reference Counting:
Python keeps track of how many references currently point to each object. References include names, but also slots in containers, function arguments, and so on. This number is called the object's ‘reference count’, and CPython stores it in the ref-count field of the PyObject.
Please note –
· The reference count of an object can increase or decrease dynamically.
· We can call sys.getrefcount(X) to get the current ref-count value of an object ‘X’.
· Passing X to getrefcount() temporarily adds one extra reference (the function's argument), so the value returned is typically one higher than you might expect.
Every time a new reference to a Python object is created, its reference count is increased. Similarly, every time a reference to a Python object is removed, its reference count is decreased. When the reference count reaches 0, the object can be safely removed from memory, and CPython frees it immediately.
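A minimal sketch of watching a reference count with sys.getrefcount(). Absolute values vary between Python versions, so the sketch only compares the counts relative to each other; the names data and alias are illustrative.

```python
import sys

data = ["some", "payload"]          # a fresh list bound to the name 'data'
before = sys.getrefcount(data)      # already includes the temporary argument reference

alias = data                        # a second name for the same list
with_alias = sys.getrefcount(data)  # one higher than 'before'

del alias                           # removing the name drops that reference
after_del = sys.getrefcount(data)   # back to 'before'

print(before, with_alias, after_del)
```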
7.1.1 What makes a reference?
Below are some cases in which a reference to a Python object (new or existing PyObject) is made and its ref-count is increased by 1 –
· Binding a new object to a name:
x = 200
· Re-using an already available object and giving a new reference:
y = 200
· Adding an object to a container:
z = [200, 200]
· Passing it to a function:
my_fun(200)
7.1.2 What removes a reference?
Look at the cases below, in which a reference to an object (existing PyObject) is removed and its ref-count is decreased by 1 –
· Assigning a new object to a name:
x = True
· Removing the reference or its container:
del y
· When a variable goes out of scope:
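A local variable is created when its function is called, and the object it refers to is allocated dynamically. Once the function finishes executing, the variable goes ‘out of scope’ and its reference is removed. Note that in Python only functions (and classes, modules, and comprehensions) introduce a new scope; ordinary blocks such as if or for do not.
Please note that the global namespace never goes out of scope; it stays alive until the program completes its execution.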
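The three removal cases can be observed with the standard weakref module: a weak reference does not raise the ref-count, and calling it returns None once the object has been freed. This is a sketch for CPython, where reference counting frees objects immediately; Payload and make_local are hypothetical names introduced for the example.

```python
import weakref

class Payload:
    """User-defined class; built-in ints cannot be weakly referenced."""

# 1. Assigning a new object to a name
x = Payload()
probe = weakref.ref(x)        # weak references do not raise the ref-count
x = True                      # the Payload's last strong reference is gone
freed_by_rebinding = probe() is None

# 2. Removing the name with del
y = Payload()
probe = weakref.ref(y)
del y
freed_by_del = probe() is None

# 3. A local name going out of scope when its function returns
def make_local():
    local = Payload()
    return weakref.ref(local)  # hand back only a weak reference

freed_by_scope = make_local()() is None

print(freed_by_rebinding, freed_by_del, freed_by_scope)  # True True True on CPython
```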
7.1.3 Cascading effect:
If a removed object O1 was pointing to another object O2, the reference count of O2 is also decreased by 1, and if O2's ref-count reaches 0, O2 is removed from memory as well. This means one reference count reaching 0 can cause many objects to be cleared from memory. This is called the cascading effect in Python garbage collection.
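A small sketch of the cascading effect on CPython: O2 survives while O1 still refers to it, and is freed as soon as O1 itself is freed. The Node class is a hypothetical example; weakref is used only to observe whether O2 is still alive.

```python
import weakref

class Node:
    def __init__(self, name, child=None):
        self.name = name
        self.child = child            # O1 keeps a reference to O2

o2 = Node("O2")
o1 = Node("O1", child=o2)
watch_o2 = weakref.ref(o2)

del o2                                # O2 is still referenced by O1
alive_after_first_del = watch_o2() is not None

del o1                                # O1 hits 0; freeing it drops O2 to 0 as well
freed_with_parent = watch_o2() is None

print(alive_after_first_del, freed_with_parent)   # True True on CPython
```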
7.1.4 Reference counting has good, bad, and ugly sides:
Good: The good parts of the reference counting algorithm are that –
· It is easy to implement.
· It immediately clears the memory once the ref-count reaches 0.
Bad: Reference counting also has costs –
· It has some space overhead, as every object needs extra space to store its count.
· It has execution overhead, as reference counts are updated at every assignment, so even a single assignment can trigger several internal count updates.
Ugly: Reference counting also has some serious issues –
· Plain reference-count updates are not thread-safe: if multiple threads update the same count at the same time, the count can be corrupted. The default CPython build avoids this with the Global Interpreter Lock (GIL).
· Reference counting cannot detect reference cycles.
In the given diagram (Figure 1.10), 3 references ‘x’, ‘y’, and ‘z’ form a reference cycle, and another reference ‘A’ points to ‘y’. Once ‘A’ starts referring to a new object, ‘x’, ‘y’, and ‘z’ are no longer needed in memory. But because each object in the cycle is still referenced by another member of the cycle, none of their reference counts drops to zero, so the reference-counting mechanism will not remove them from memory. Hence reference counting alone cannot handle this case.
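The cycle from the diagram can be reproduced in a few lines. This sketch disables automatic cyclic collection so that only reference counting is at work, shows the cycle surviving, and then runs the standard gc.collect() explicitly. The Node class is hypothetical; the names x, y, z, and A follow the text.

```python
import gc
import weakref

class Node:
    def __init__(self, label):
        self.label = label
        self.next = None

was_enabled = gc.isenabled()
gc.disable()                          # keep automatic cycle collection out of the demo
try:
    x, y, z = Node("x"), Node("y"), Node("z")
    x.next, y.next, z.next = y, z, x  # x -> y -> z -> x: a reference cycle
    A = y                             # an outside reference into the cycle
    watch = weakref.ref(y)

    del x, y, z                       # only 'A' and the cycle itself remain
    A = None                          # 'A' now refers to a different object
    kept_by_cycle = watch() is not None   # the counts never reached 0

    gc.collect()                      # run the cyclic garbage collector explicitly
    freed_by_gc = watch() is None
finally:
    if was_enabled:
        gc.enable()

print(kept_by_cycle, freed_by_gc)     # True True on CPython
```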
7.2. Tracing:
Since Python 2.0, Python has used ‘Tracing’ (the cyclic garbage collector exposed through the gc module) along with reference counting to handle this type of situation. Tracing works on the ‘mark and sweep’ principle, which is performed in two phases –
7.2.1 Mark Phase:
When the number of allocations minus deallocations exceeds a threshold, the collector starts marking all reachable objects by following the reference graph; e.g., nodes r, 1, 4, 6, 7, and 8 will be marked reachable during the mark phase.
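The threshold that triggers a collection can be inspected (and changed with gc.set_threshold()) through the standard gc module. A minimal sketch; the default values, and exactly how they are interpreted, differ between CPython versions, so no particular numbers are assumed here.

```python
import gc

print(gc.get_threshold())   # collection thresholds, one per generation
print(gc.get_count())       # current counts that are compared with those thresholds
```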
7.2.2 Sweep Phase:
Once all reachable objects are identified, all remaining objects are removed from memory in the sweep phase; e.g., nodes 2, 3, and 5 will be removed during the sweep phase.
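The two phases can be sketched as a toy over a dictionary-based graph. The figure's reference graph is not reproduced here, so the edges below are an assumption chosen to give the same result as the text (r, 1, 4, 6, 7, 8 reachable; 2, 3, 5 garbage). This models the classic mark-and-sweep idea only; CPython's cyclic collector actually finds unreachable cycles by subtracting references that come from inside the tracked objects, rather than by tracing from a root set.

```python
from collections import deque

# Hypothetical reference graph (edges assumed; the original figure is not shown).
graph = {
    "r": ["1"],
    "1": ["4"],
    "4": ["6", "7"],
    "6": [],
    "7": ["8"],
    "8": ["6"],
    "2": ["3"],
    "3": ["5"],
    "5": ["2"],      # 2 -> 3 -> 5 -> 2 is a cycle with no path from the root
}

def mark(graph, roots):
    """Mark phase: breadth-first walk from the roots, collecting reachable nodes."""
    marked = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in graph[node]:
            if child not in marked:
                marked.add(child)
                queue.append(child)
    return marked

def sweep(graph, marked):
    """Sweep phase: delete every node that was not marked."""
    garbage = [node for node in graph if node not in marked]
    for node in garbage:
        del graph[node]
    return garbage

reachable = mark(graph, ["r"])
removed = sweep(graph, reachable)
print(sorted(reachable))   # ['1', '4', '6', '7', '8', 'r']
print(sorted(removed))     # ['2', '3', '5']
```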
8. Conclusion:
In this article, we saw how Python manages its objects internally to minimize memory use, and how it runs its garbage collector. Most objects are reclaimed by reference counting, which programmers cannot switch off; the cyclic collector, by contrast, can be tuned or disabled through the gc module. Although this memory-management overhead is one reason Python is slower than languages such as C, Python remains one of the most widely used languages because of the many other useful features it offers programmers.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
FAQs
How is memory managed in Python?
Memory management in Python involves the management of a private heap. A private heap is a portion of memory that is exclusive to the Python process. All Python objects and data structures are stored in the private heap. The operating system cannot allocate this piece of memory to another process.
What is memory management in Python?
Memory management in Python involves a private heap containing all Python objects and data structures. The management of this private heap is ensured internally by the Python memory manager.
How is memory managed in Python (interview question)?
Python memory is primarily managed through the Python private heap space. All Python objects and data structures are located in a private heap. The programmer does not have access to this private heap; the interpreter takes care of it.
How do I find out how much memory a Python program is using?
One option is the memory_profiler package: decorate any function or method with @profile and run python -m memory_profiler myscript.py. You'll see line-by-line memory usage once your script exits.
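If installing a third-party package is not an option, the standard-library tracemalloc module offers a simpler, whole-program measurement. This is a swapped-in technique, not memory_profiler's API: a minimal sketch that reports how much memory Python allocations are currently using and the peak since tracing started. The ~1 MB workload is an arbitrary example.

```python
import tracemalloc

tracemalloc.start()                              # begin tracing Python allocations

data = [bytes(1_000) for _ in range(1_000)]      # roughly 1 MB of bytes objects

current, peak = tracemalloc.get_traced_memory()  # bytes in use now / highest so far
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```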
Avoid unnecessary duplication of data. Reusing variables and objects wherever possible can save memory. Additionally, consider using in-place operations or modifying data structures directly instead of creating new ones.
How do you prevent memory problems in Python?
To avoid holding a large list in memory at once, you can use a generator or iterator instead. A generator is a special type of function that produces values one at a time, rather than generating a whole list of values at once. To create a generator in Python, use the yield keyword instead of return.
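A short sketch comparing a list with a generator built using yield. Note that sys.getsizeof() reports only the container itself (for the list, its array of pointers), not the int objects it refers to; the function name squares is illustrative.

```python
import sys

def squares(limit):
    """Generator function: 'yield' hands back one value at a time."""
    for n in range(limit):
        yield n * n

squares_list = [n * n for n in range(100_000)]   # every value stored at once
squares_gen = squares(100_000)                   # values produced on demand

list_bytes = sys.getsizeof(squares_list)  # pointer array alone is ~800 KB on 64-bit builds
gen_bytes = sys.getsizeof(squares_gen)    # a small generator object, independent of 'limit'
print(list_bytes, gen_bytes)

# Both give the same total, but the generator never holds all values at once.
print(sum(squares_list) == sum(squares_gen))
```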
How do you explain memory management?
Memory management is the process of controlling and coordinating a computer's main memory. It ensures that blocks of memory space are properly managed and allocated so that the operating system (OS), applications, and other running processes have the memory they need to carry out their operations.
What is the main function of memory management?
Memory management is the functionality of an operating system that handles or manages primary memory and moves processes back and forth between main memory and disk during execution. It keeps track of every memory location, regardless of whether it is allocated to some process or free.
How are strings stored in memory in Python?
In CPython, a string's characters are stored in a contiguous block of memory, using 1, 2, or 4 bytes per character depending on the widest character in the string. Characters can be accessed from both directions, forward and backward, by indexing. Strings are immutable in Python, which means that once a string is created, it cannot be changed.
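A minimal sketch of forward and backward indexing and of string immutability; the sample string is arbitrary.

```python
s = "python"
print(s[0], s[-1])        # p n  (forward and backward indexing)

try:
    s[0] = "P"            # strings are immutable...
except TypeError as exc:
    print("TypeError:", exc)

t = "P" + s[1:]           # ...so "changing" a string builds a new object
print(t, t is s)          # Python False
```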
What is Python?
Python, one of the most widely used and popular programming languages, was developed by Guido van Rossum and first released on February 20, 1991. Python is a free and open-source language with a very simple and clean syntax, which makes it easy for developers to learn.
Why is Python using so much memory?
Everything in Python is an object. For these objects to be useful, they need to be stored in memory so they can be accessed, and before they can be stored, a chunk of memory must first be allocated for each of them. Each object also carries its own header (reference count and type pointer), which adds overhead on top of the data itself.
How is memory managed in Python, and what is the garbage collector used for?
Python's garbage collection frees up space in memory. Garbage collection is implemented in Python in two ways: reference counting and generational collection. When the reference count of an object reaches 0, the reference counting algorithm cleans up the object immediately.
How do you control memory in Python?
Memory allocation can be defined as allocating a block of space in the computer's memory to a program. In Python, memory allocation and deallocation are automatic: Python has a built-in garbage collector, so the user does not have to perform garbage collection manually.
How much memory can Python handle?
Python can use all of the memory allocated to it. The OS allocates the memory and usually has per-process limits, but there are commands to control those limits (ulimit on Unix, for example).
What is the maximum memory size for a Python process?
Python itself does not limit your program's memory usage. It will allocate as much memory as your program needs until your computer runs out of memory. The most you can do is impose a fixed upper cap yourself, which can be done with the resource module on Unix-like systems.
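A minimal sketch of such a cap using the standard resource module, assuming a Linux system (the module is Unix-only, and RLIMIT_AS is not reliably enforced on every platform). cap_address_space is a hypothetical helper, not part of any library; it is defined but not called here, because lowering the limit affects the whole running process.

```python
import resource  # Unix-only standard-library module

def cap_address_space(max_bytes: int) -> None:
    """Hypothetical helper: lower this process's soft RLIMIT_AS limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit may not exceed the hard limit (RLIM_INFINITY means "no limit").
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))

# Inspect the current limits without changing them.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("soft/hard address-space limits:", soft, hard)
```

Calling, say, cap_address_space(2 * 1024**3) at program start would typically make allocations beyond about 2 GiB fail, which Python reports as a MemoryError.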
How is memory managed?
Memory management operates at three levels: hardware, operating system, and program/application. The management capabilities at each level work together to optimize memory availability and efficiency.
How is memory managed in Python (Stack Overflow answer)?
A stack frame is created whenever a function or method is called, and it is destroyed automatically when the function returns. Python also has a garbage collector: once the objects those frames referred to are no longer in use, the garbage collector clears these dead objects.
What does namespace mean in Python?
The ‘space’ in namespace refers to the main memory where the named objects live, so a namespace can be defined as a collection of names, each associated with an object in main memory. There are three types of namespaces in Python: the built-in namespace, the global namespace, and the local namespace.
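A small sketch showing one name from each of the three namespaces; counter, show_scopes, and local_value are illustrative names.

```python
import builtins

counter = 0                      # lives in the module's global namespace

def show_scopes():
    local_value = 1              # lives in this call's local namespace
    return "local_value" in locals(), "counter" in globals()

print(hasattr(builtins, "len"))  # True: 'len' comes from the built-in namespace
print(show_scopes())             # (True, True)
```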
Why does a garbage collector exist in Python?
Garbage collection releases memory when an object is no longer in use. The system destroys the unused object and reuses its memory slot for new objects; you can think of it as a recycling system for memory. Python's garbage collection is automatic.