When working in the world of Big Data, we are always looking for better tools to explore, process, and modify data. Every tool comes with advantages and disadvantages during the programming process, so it is important to understand when and how to use the tools Python offers.
Lists are one of the main built-in data structures in Python and can contain values of various data types. One of the most common operations on lists is the “for loop”, which can often be replaced with a list comprehension. Many developers call list comprehensions the “Pythonic way”.
Using “for loop” For Filtering List
Let’s demonstrate a simple loop operation on a list of numbers. Given a list of integers, let’s exclude the odd numbers. For this task we create a new list and fill it with the even numbers while looping through the original list:
# creating a list of 50 million integers
fifty_mln_list = list(range(50_000_000))

def exclude_odd(fifty_mln_list):
    # new list that collects the even numbers
    new_list = []
    for number in fifty_mln_list:
        if number % 2 == 0:
            new_list.append(number)
    return new_list
Using the IPython magic command “%%timeit”, we can easily check how long it takes to execute the exclude_odd function on the list of fifty million generated integers:
%%timeit
exclude_odd(fifty_mln_list)
> 3.14 s ± 24.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
As we can see, it takes 3.14 seconds on average to complete one run. Let’s move forward and test a list comprehension on the same data.
Using “list comprehension” For Filtering List
def exclude_odd_comprehen(fifty_mln_list):
    return [number for number in fifty_mln_list if number % 2 == 0]

%%timeit
exclude_odd_comprehen(fifty_mln_list)
> 2.42 s ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The “for loop” is about 30% slower than the list comprehension (3.14/2.42≈1.30). And we just reduced five lines of code to one line! Cleaner and faster code? Great!
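Outside a notebook the “%%timeit” magic is not available, but the same comparison can be sketched with the standard-library timeit module. This is only a minimal sketch: the smaller list size and the number of repetitions are assumptions chosen so the script finishes quickly, and the timings it prints will differ from the ones measured above.

```python
import timeit


def exclude_odd(numbers):
    """Keep only the even numbers, using an explicit for loop."""
    new_list = []
    for number in numbers:
        if number % 2 == 0:
            new_list.append(number)
    return new_list


def exclude_odd_comprehen(numbers):
    """Keep only the even numbers, using a list comprehension."""
    return [number for number in numbers if number % 2 == 0]


# Assumption: a much smaller list than in the article, to keep the run short.
sample = list(range(100_000))

# Both versions must return exactly the same list.
assert exclude_odd(sample) == exclude_odd_comprehen(sample)

# timeit.timeit returns the total time in seconds for `number` calls.
loop_time = timeit.timeit(lambda: exclude_odd(sample), number=5)
compr_time = timeit.timeit(lambda: exclude_odd_comprehen(sample), number=5)
print(f"for loop:           {loop_time:.3f} s")
print(f"list comprehension: {compr_time:.3f} s")
```

Checking that both functions return the same list before timing them makes sure we are comparing two implementations of the same task.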
Python has a built-in function, map(), that lets you process and transform all the items in an iterable without an explicit for loop, a technique commonly known as mapping. map() is useful when you need to apply a transformation function to each item in an iterable and produce a new iterable.
When certain conditions need to be applied before the map function runs, we also need to filter the data, which is what the built-in filter() function does.
Testing “map” and “filter” functions
Let’s create and test a function that raises to the power of 2 each number that remains after excluding the odd ones from the list of 50 million numbers.
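To see what the two built-ins do on their own, here is a small sketch on a six-element list. It assumes Python 3, where map() and filter() return lazy iterators, so list() is used to materialize the results. The list and variable names are made up for the illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

# map() applies the function to every item; it returns a lazy iterator,
# so list() is needed to materialize the results.
squares = list(map(lambda x: x ** 2, numbers))
print(squares)  # [1, 4, 9, 16, 25, 36]

# filter() keeps only the items for which the function returns True.
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)  # [2, 4, 6]

# Combined: filter first, then map over what is left.
even_squares = list(map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, numbers)))
print(even_squares)  # [4, 16, 36]
```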
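def power_2_odd(fifty_mln_list):
    result = map(lambda x: x**2, filter(lambda x: x%2 == 0, \
                 fifty_mln_list))
    # list() materializes the lazy map object so the work is actually timed
    return list(result)

%%timeit
power_2_odd(fifty_mln_list)
> 11.7 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)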
As we can see, it takes 11.7 seconds to complete one run, which is a pretty long time if we plan to work with large amounts of data.
Let’s test the same data under the same conditions with a new function that uses a list comprehension.
def power_2_odd_compr(fifty_mln_list):
    return [x**2 for x in fifty_mln_list if x%2 == 0]

%%timeit
power_2_odd_compr(fifty_mln_list)
> 8.03 s ± 35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using “map” and “filter” is about 45% slower than the list comprehension (11.7/8.03≈1.46). This tells us that a list comprehension is not just more readable but also faster when we are dealing with a lot of data.
Avoid More Than Two Expressions in List Comprehensions
List comprehensions also support multiple “if” conditions. Multiple conditions at the same loop level are an implicit “and” expression. For example, say we want to filter a list of numbers down to the even values greater than 4. We can write this in two equivalent ways, with a second “if” or with “and”:
a = [1,2,3,4,5,6,7,8,9,10]
b = [x for x in a if x > 4 if x%2 == 0]
c = [x for x in a if x > 4 and x%2 == 0]
print(c == b)
> True
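One way to read the chained “if” clauses is as nested “if” statements in a regular loop. The sketch below spells that out. The variable name nested is introduced only for this illustration.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Two chained "if" clauses in a comprehension ...
b = [x for x in a if x > 4 if x % 2 == 0]

# ... behave like nested "if" statements in a regular loop.
nested = []
for x in a:
    if x > 4:
        if x % 2 == 0:
            nested.append(x)

print(b)            # [6, 8, 10]
print(nested == b)  # True
```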
We can specify a condition at each level of looping, after the for expression. Let’s filter a matrix so that the only cells remaining are those divisible by 3 in rows that sum to 10 or higher. The code with a list comprehension is short but very difficult to read.
matrix = [[1,2,3],[4,5,6],[7,8,9]]
filtered = [[x for x in row if x%3 ==0]
for row in matrix if sum(row) >= 10]
print(filtered)
> [[6], [9]]
We save a few lines, but the code is very difficult to understand. It is better to avoid using more than two expressions in a list comprehension. This could be two conditions, two loops, or one condition and one loop.
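For comparison, here is one possible way to write the same matrix filter so that no single comprehension holds more than two expressions. It is only a sketch: the helper name divisible_by_3 is hypothetical and not part of the original example.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def divisible_by_3(row):
    """Hypothetical helper: keep the cells of one row that are divisible by 3."""
    return [x for x in row if x % 3 == 0]


filtered = []
for row in matrix:
    # Skip rows whose sum is below 10 (same condition as sum(row) >= 10).
    if sum(row) < 10:
        continue
    filtered.append(divisible_by_3(row))

print(filtered)  # [[6], [9]]
```

The result is longer, but each step can be read and commented on its own.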
In the tests above, list comprehensions ran faster than a regular for loop and faster than the map and filter functions. They also make code simpler and more readable.
List comprehensions have become a very popular tool in Python, especially when we need to operate on a list and return another list. But there is no way to “break” out of a list comprehension, and a one-line comprehension leaves little room for comments.
List comprehensions with more than two expressions are very difficult to read and should be avoided.
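When we do need to stop early, a regular loop is still the straightforward choice. The sketch below uses a hypothetical helper, first_even_numbers, that stops as soon as it has collected enough values. The name and the limit parameter are assumptions made for this illustration.

```python
def first_even_numbers(numbers, limit):
    """Hypothetical helper: collect even numbers, stopping after `limit` matches."""
    result = []
    for number in numbers:
        # A regular loop has room for a comment on every step.
        if number % 2 != 0:
            continue  # skip odd numbers
        result.append(number)
        if len(result) == limit:
            break  # stop early; a list comprehension has no break statement
    return result


print(first_even_numbers(range(50_000_000), 3))  # [0, 2, 4]
```

Because the loop stops after the third match, it never walks through the rest of the fifty million numbers.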