
How do you split a list into evenly sized chunks?

jespern
1#
jespern Published in 2008-11-23 12:15:52Z
 I have a list of arbitrary length, and I need to split it up into equal size chunks and operate on it. There are some obvious ways to do this, like keeping a counter and two lists, and when the second list fills up, add it to the first list and empty the second list for the next round of data, but this is potentially extremely expensive. I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators. I was looking for something useful in itertools but I couldn't find anything obviously useful. Might've missed it, though. Related question: What is the most “pythonic” way to iterate over a list in chunks?
SwiftsNamesake
2#
Here's a generator that yields the chunks you want:

    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i:i + n]

    import pprint
    pprint.pprint(list(chunks(list(range(10, 75)), 10)))
    [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
     [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
     [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
     [70, 71, 72, 73, 74]]

If you're using Python 2, you should use xrange() instead of range():

    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in xrange(0, len(l), n):
            yield l[i:i + n]

You can also simply use a list comprehension instead of writing a function. Python 3:

    [l[i:i + n] for i in range(0, len(l), n)]

Python 2:

    [l[i:i + n] for i in xrange(0, len(l), n)]
ThiefMaster
3#
If you know the list size:

    def SplitList(lst, chunk_size):
        # note: don't name the parameter "list" -- it shadows the builtin
        return [lst[offs:offs + chunk_size]
                for offs in range(0, len(lst), chunk_size)]

If you don't (i.e. you only have an iterator):

    def IterChunks(sequence, chunk_size):
        res = []
        for item in sequence:
            res.append(item)
            if len(res) >= chunk_size:
                yield res
                res = []
        if res:
            yield res  # yield the last, incomplete, portion

In the latter case, this can be rephrased more elegantly if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk).
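That more elegant rephrasing might look like the following sketch (my illustration, not the answerer's code; it assumes the input really does divide evenly, since zip silently drops any trailing partial chunk):

```python
def iter_whole_chunks(sequence, chunk_size):
    # chunk_size references to one and the same iterator: zip pulls
    # from each in turn, so every output tuple consumes chunk_size
    # consecutive items. A trailing partial chunk is silently dropped.
    return zip(*[iter(sequence)] * chunk_size)

print(list(iter_whole_chunks(range(12), 4)))
# [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]
```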
ThiefMaster
4#
Here is a generator that works on arbitrary iterables:

    import itertools

    def split_seq(iterable, size):
        it = iter(iterable)
        item = list(itertools.islice(it, size))
        while item:
            yield item
            item = list(itertools.islice(it, size))

Example:

    >>> import pprint
    >>> pprint.pprint(list(split_seq(xrange(75), 10)))
    [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
     [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
     [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
     [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
     [70, 71, 72, 73, 74]]
slav0nic
5#
Heh, a one-line version:

    In [48]: chunk = lambda ulist, step: map(lambda i: ulist[i:i+step], xrange(0, len(ulist), step))

    In [49]: chunk(range(1, 100), 10)
    Out[49]:
    [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
     [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
     [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
     [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
     [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
     [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
     [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
     [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
     [91, 92, 93, 94, 95, 96, 97, 98, 99]]
tzot
6#
Directly from the (old) Python documentation (recipes for itertools):

    from itertools import izip, chain, repeat

    def grouper(n, iterable, padvalue=None):
        "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
        return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J.F.Sebastian:

    #from itertools import izip_longest as zip_longest  # for Python 2.x
    from itertools import zip_longest                    # for Python 3.x
    #from six.moves import zip_longest                   # for both (uses the six compat library)

    def grouper(n, iterable, padvalue=None):
        "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
        return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works—worked—will work—will have worked—was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip round-robin generating one tuple of n items.
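The shared-iterator mechanism described above can be seen in isolation with a minimal illustration (my addition, not part of the original recipe):

```python
it = iter('abcdef')
refs = [it] * 3  # three references to the SAME iterator object

# zip takes one item from each "column" per round, so each output
# tuple advances the single underlying iterator three steps
result = list(zip(*refs))
print(result)  # [('a', 'b', 'c'), ('d', 'e', 'f')]
```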
Corey Goldberg
7#
Corey Goldberg Reply to 2008-11-24 16:56:57Z
    def split_seq(seq, num_pieces):
        start = 0
        for i in xrange(num_pieces):
            stop = start + len(seq[i::num_pieces])
            yield seq[start:stop]
            start = stop

Usage:

    seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for piece in split_seq(seq, 3):
        print piece
dbr
8#
    def chunk(lst):
        # note: raises NameError for single-element lists, where the
        # search loop never assigns factor
        out = []
        for x in xrange(2, len(lst) + 1):
            if not len(lst) % x:
                factor = len(lst) / x
                break
        while lst:
            out.append([lst.pop(0) for x in xrange(factor)])
        return out
hcvst
9#
    >>> f = lambda x, n, acc=[]: f(x[n:], n, acc + [(x[:n])]) if x else acc
    >>> f("Hallo Welt", 3)
    ['Hal', 'lo ', 'Wel', 't']
    >>>

If you are into brackets - I picked up a book on Erlang :)
Shubham Chaudhary
10#
Shubham Chaudhary Reply to 2016-09-12 10:37:52Z
If you want something super simple:

    def chunks(l, n):
        n = max(1, n)
        return (l[i:i+n] for i in xrange(0, len(l), n))
parity3
11#
Without calling len(), which is good for large lists:

    def splitter(l, n):
        i = 0
        chunk = l[:n]
        while chunk:
            yield chunk
            i += n
            chunk = l[i:i+n]

And this is for iterables:

    from itertools import islice

    def isplitter(l, n):
        l = iter(l)
        chunk = list(islice(l, n))
        while chunk:
            yield chunk
            chunk = list(islice(l, n))

The functional flavour of the above:

    from itertools import islice, takewhile, repeat

    def isplitter2(l, n):
        return takewhile(bool,
                         (tuple(islice(start, n))
                          for start in repeat(iter(l))))

Or:

    from itertools import islice, imap, repeat  # imap is Python 2 only

    def chunks_gen_sentinel(n, seq):
        continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
        return iter(imap(tuple, continuous_slices).next, ())

Or:

    from itertools import islice, imap, repeat, takewhile  # imap is Python 2 only

    def chunks_gen_filter(n, seq):
        continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
        return takewhile(bool, imap(tuple, continuous_slices))
ThiefMaster
12#
    def chunk(input, size):
        # Python 2 only: map(None, ...) zips the iterators and pads the
        # last chunk with None; this form was removed in Python 3
        return map(None, *([iter(input)] * size))
Antwane
13#
Simple yet elegant:

    l = range(1, 1000)
    print [l[x:x+10] for x in xrange(0, len(l), 10)]

or if you prefer:

    chunks = lambda l, n: [l[x:x+n] for x in xrange(0, len(l), n)]
    chunks(l, 10)
ninjagecko
14#
If you had a chunk size of 3, for example, you could do:

    zip(*[iterable[i::3] for i in range(3)])

Source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/

I would use this when my chunk size is a fixed number I can type, e.g. '3', and would never change.
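One caveat worth noting (my observation, not from the recipe): because zip stops at the shortest slice, any leftover items that don't fill a whole chunk are silently dropped:

```python
iterable = list(range(7))

# the three strided slices are [0, 3, 6], [1, 4], [2, 5];
# zip stops after two rounds, so the trailing item 6 never
# appears in the output
chunked = list(zip(*[iterable[i::3] for i in range(3)]))
print(chunked)  # [(0, 1, 2), (3, 4, 5)]
```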
Brian Schwartz
15#
Brian Schwartz Reply to 2012-03-08 18:27:15Z
Consider using matplotlib.cbook pieces. For example:

    import numpy as np
    import matplotlib.cbook as cbook

    segments = cbook.pieces(np.arange(20), 3)
    for s in segments:
        print s
robert king
16#
robert king Reply to 2012-02-13 04:50:38Z
    def chunks(iterable, n):
        """assumes n is an integer > 0"""
        iterable = iter(iterable)
        while True:
            result = []
            for i in range(n):
                try:
                    a = next(iterable)
                except StopIteration:
                    break
                else:
                    result.append(a)
            if result:
                yield result
            else:
                break

    g1 = (i*i for i in range(10))
    g2 = chunks(g1, 3)
    print g2        # <generator object ...>
    print list(g2)  # [[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]
Shawn Zhang
17#
Shawn Zhang Reply to 2012-08-27 22:58:05Z
I realise this question is old (I stumbled over it on Google), but surely something like the following is far simpler and clearer than the huge, complex suggestions, and it only uses slicing:

    def chunker(iterable, chunksize):
        for i, c in enumerate(iterable[::chunksize]):
            yield iterable[i*chunksize:(i+1)*chunksize]

    >>> for chunk in chunker(range(0, 100), 10):
    ...     print list(chunk)
    ...
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
    ... etc ...
BomberMan
18#
See this reference:

    >>> orange = range(1, 1001)
    >>> otuples = list(zip(*[iter(orange)]*10))
    >>> print(otuples)
    [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ..., (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
    >>> olist = [list(i) for i in otuples]
    >>> print(olist)
    [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]

(Python 3)
jamylak
19#
A generator expression:

    def chunks(seq, n):
        return (seq[i:i+n] for i in xrange(0, len(seq), n))

e.g.:

    print list(chunks(range(1, 1000), 10))
SiggyF
20#
more-itertools has a chunked iterator. It also has a lot more things, including all the recipes in the itertools documentation.
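The helper there is more_itertools.chunked(iterable, n). If you'd rather not add a dependency, a stdlib-only sketch of the same behavior (my approximation, not the library's actual code) looks like:

```python
from itertools import islice

def chunked(iterable, n):
    """Yield successive lists of up to n items from any iterable."""
    it = iter(iterable)
    while True:
        piece = list(islice(it, n))
        if not piece:  # iterator exhausted
            return
        yield piece

print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```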
BomberMan
21#
Using a Python list comprehension:

    [range(t, t+10) for t in range(1, 1000, 10)]

    [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30], ....
    ....[981, 982, 983, 984, 985, 986, 987, 988, 989, 990], [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]

Visit this link to learn more about list comprehensions.
Moj
22#
I know this is kind of old, but I don't know why nobody mentioned numpy.array_split:

    import numpy as np

    lst = range(50)

    In [26]: np.array_split(lst, 5)
    Out[26]:
    [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
     array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
     array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
     array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
     array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
Moss
23#
Not exactly the same, but still nice:

    def chunks(l, chunks):
        return zip(*[iter(l)]*chunks)

    l = range(1, 1000)
    print chunks(l, 10)
    # -> [(1..10), (11..20), ..., (981..990)]
    # note: the last, incomplete chunk (991..999) is dropped by zip
balki
24#
Works with any iterable, the inner data is a generator object (not a list), and it's a one-liner:

    # Python 2 only: tuple parameter unpacking in lambdas was removed in Python 3
    In [259]: get_in_chunks = lambda itr, n: ((v for _, v in g) for _, g in itertools.groupby(enumerate(itr), lambda (ind, _): ind/n))

    In [260]: list(list(x) for x in get_in_chunks(range(30), 7))
    Out[260]:
    [[0, 1, 2, 3, 4, 5, 6],
     [7, 8, 9, 10, 11, 12, 13],
     [14, 15, 16, 17, 18, 19, 20],
     [21, 22, 23, 24, 25, 26, 27],
     [28, 29]]
rectangletangle
25#
    def chunked(iterable, size):
        chunk = ()
        for item in iterable:
            chunk += (item,)
            if len(chunk) % size == 0:
                yield chunk
                chunk = ()
        if chunk:
            yield chunk
nikipore
26#
I like the Python doc's version proposed by tzot and J.F.Sebastian a lot, but it has two shortcomings:

- it is not very explicit
- I usually don't want a fill value in the last chunk

I'm using this one a lot in my code:

    from itertools import islice

    def chunks(n, iterable):
        iterable = iter(iterable)
        while True:
            yield tuple(islice(iterable, n)) or iterable.next()

(Python 2; this relies on the StopIteration from iterable.next() ending the generator, which no longer works as of Python 3.7 / PEP 479.)

UPDATE: A lazy chunks version:

    from itertools import chain, islice

    def chunks(n, iterable):
        iterable = iter(iterable)
        while True:
            yield chain([next(iterable)], islice(iterable, n-1))
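One pitfall of the lazy variant is worth demonstrating (my sketch, adapted to handle StopIteration explicitly so it also runs under PEP 479 / Python 3.7+): each yielded chain shares the one underlying iterator, so the chunks must be consumed in order, before advancing to the next one.

```python
from itertools import chain, islice

def chunks(n, iterable):
    """Lazily yield chunks of n items; each chunk is itself an iterator."""
    iterable = iter(iterable)
    while True:
        try:
            first = next(iterable)
        except StopIteration:
            return
        yield chain([first], islice(iterable, n - 1))

# consumed in order, the chunks come out as expected
in_order = [list(c) for c in chunks(3, range(8))]
print(in_order)   # [[0, 1, 2], [3, 4, 5], [6, 7]]

# but materializing all chunks first scrambles them: advancing the
# outer generator consumes the shared iterator before any inner
# chain is read, leaving each chain with only its first item
scrambled = [list(c) for c in list(chunks(3, range(8)))]
print(scrambled)  # [[0], [1], [2], [3], [4], [5], [6], [7]]
```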
zach
27#
The toolz library has the partition function for this:

    from toolz.itertoolz.core import partition

    list(partition(2, [1, 2, 3, 4]))
    # [(1, 2), (3, 4)]
koffein
28#
Yes, it is an old question, but I had to post this one, because it is even a little shorter than the similar ones. Yes, the result looks scrambled, but if it is just about even length...

    >>> n = 3  # number of groups
    >>> biglist = range(30)
    >>>
    >>> [biglist[i::n] for i in xrange(n)]
    [[0, 3, 6, 9, 12, 15, 18, 21, 24, 27],
     [1, 4, 7, 10, 13, 16, 19, 22, 25, 28],
     [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]]
Aaron Hall
29#
Aaron Hall Reply to 2014-02-26 16:11:00Z

None of these answers produce evenly sized chunks; they all leave a runt chunk at the end, so they're not completely balanced. If you were using these functions to distribute work, you've built in the prospect that one worker will likely finish well before the others and sit around doing nothing while the rest keep working hard.

For example, the current top answer ends with:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]


I just hate that runt at the end!

Others, like list(grouper(3, xrange(7))) and chunk(xrange(7), 3), both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The Nones are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.

Why can't we divide these better?

My Solution(s)

Here's a balanced solution, adapted from a function I've used in production (note: in Python 3, replace xrange with range):

    def baskets_from(items, maxbaskets=25):
        baskets = [[] for _ in xrange(maxbaskets)]   # in Python 3 use range
        for i, item in enumerate(items):
            baskets[i % maxbaskets].append(item)     # round-robin distribution
        return filter(None, baskets)                 # drop any empty baskets


And I created a generator that does the same if you put it into a list:

    def iter_baskets_from(items, maxbaskets=3):
        '''generates evenly balanced baskets from indexable iterable'''
        item_count = len(items)
        baskets = min(item_count, maxbaskets)
        for x_i in xrange(baskets):
            yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]


And finally, since the functions above return elements in a non-contiguous, round-robin order, here is one that keeps the contents contiguous (in the order they were given):

    def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
        '''
        generates balanced baskets from iterable, contiguous contents
        provide item_count if providing an iterator that doesn't support len()
        '''
        item_count = item_count or len(items)
        baskets = min(item_count, maxbaskets)
        items = iter(items)
        floor = item_count // baskets
        ceiling = floor + 1
        stepdown = item_count % baskets
        for x_i in xrange(baskets):
            length = ceiling if x_i < stepdown else floor
            yield [items.next() for _ in xrange(length)]


Output

To test them out:

    print(baskets_from(xrange(6), 8))
    print(list(iter_baskets_from(xrange(6), 8)))
    print(list(iter_baskets_contiguous(xrange(6), 8)))
    print(baskets_from(xrange(22), 8))
    print(list(iter_baskets_from(xrange(22), 8)))
    print(list(iter_baskets_contiguous(xrange(22), 8)))
    print(baskets_from('ABCDEFG', 3))
    print(list(iter_baskets_from('ABCDEFG', 3)))
    print(list(iter_baskets_contiguous('ABCDEFG', 3)))
    print(baskets_from(xrange(26), 5))
    print(list(iter_baskets_from(xrange(26), 5)))
    print(list(iter_baskets_contiguous(xrange(26), 5)))


Which prints out:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]


Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one can divide a list of discrete elements.

senderle
30#
I'm surprised nobody has thought of using iter's two-argument form:

    from itertools import islice

    def chunk(it, size):
        it = iter(it)
        return iter(lambda: tuple(islice(it, size)), ())

Demo:

    >>> list(chunk(range(14), 3))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

    from itertools import islice, chain, repeat

    def chunk_pad(it, size, padval=None):
        it = chain(iter(it), repeat(padval))
        return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

    >>> list(chunk_pad(range(14), 3))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
    >>> list(chunk_pad(range(14), 3, 'a'))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

    _no_padding = object()

    def chunk(it, size, padval=_no_padding):
        if padval == _no_padding:
            it = iter(it)
            sentinel = ()
        else:
            it = chain(iter(it), repeat(padval))
            sentinel = (padval,) * size
        return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

    >>> list(chunk(range(14), 3))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
    >>> list(chunk(range(14), 3, None))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
    >>> list(chunk(range(14), 3, 'a'))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.