Python splitting list to sublists at given start/end keywords

Leo Whitehead
1#
Leo Whitehead Published in 2018-02-14 09:40:36Z

If I were to have a list, say

lst = ['hello', 'foo', 'test', 'world', 'bar', 'idk']

I'd like to split it into a sublist with 'foo' and 'bar' as start and end keywords, so that I would get

lst = ['hello', ['foo', 'test', 'world', 'bar'], 'idk']

The way I am currently doing this is as follows.

def findLoop(t):   
    inds = [index for index, item in enumerate(t) if item in ["FOO", "BAR"]]
    centre = inds[(len(inds)/2)-1:(len(inds)/2)+1]
    newCentre = t[centre[0]:centre[1]+1]
    return t[:centre[0]] + [newCentre] + t[centre[1]+1:]

def getLoops(t):
    inds = len([index for index, item in enumerate(t) if item in ["FOO", "BAR"]])
    for i in range(inds):
        t = findLoop(t)
    return t

This looks a bit messy, but it works well for nested start/end keywords, so sublists can be formed inside other sublists. It does not work when there are multiple start/end pairs that are not inside each other. Nesting is not important yet, so any help would be appreciated.

Eric Duminil
2#
Eric Duminil Reply to 2018-02-14 12:57:22Z

One creative way would be to dump your list to a JSON string, add [ and ] where needed, and parse your JSON string back to a Python nested list:

import json
lst = ['hello', 'foo', 'test', 'world', 'bar', 'idk']
start_keywords = ['world', 'foo', 'test']
end_keywords = ['bar', 'idk', 'foo']
dump = json.dumps(lst)

for k in start_keywords:
    dump = dump.replace(f'"{k}"', f'["{k}"')

for k in end_keywords:
    dump = dump.replace(f'"{k}"', f'"{k}"]')

json.loads(dump)
# ['hello', ['foo'], ['test', ['world', 'bar'], 'idk']]
json.loads(dump)[2][1][0]
# 'world'

The advantage is that it's easy to follow, it works fine for arbitrary nested lists and it detects if the structure isn't correct. You need to make sure your words don't contain ", though.
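As a rough sketch of the two points above, the same idea can be wrapped in a function. The name nest_by_keywords is made up for illustration and is not part of the original answer. json.loads raises json.JSONDecodeError when the inserted brackets do not balance, which is how an incorrect structure shows up:

```python
import json

def nest_by_keywords(items, start_keywords, end_keywords):
    """Hypothetical helper: nest a flat list of strings via a JSON round trip."""
    dump = json.dumps(items)
    for k in start_keywords:
        dump = dump.replace(f'"{k}"', f'["{k}"')   # open a sublist before k
    for k in end_keywords:
        dump = dump.replace(f'"{k}"', f'"{k}"]')   # close the sublist after k
    # json.loads raises json.JSONDecodeError if the brackets do not balance
    return json.loads(dump)

lst = ['hello', 'foo', 'test', 'world', 'bar', 'idk']
print(nest_by_keywords(lst, ['foo'], ['bar']))
# ['hello', ['foo', 'test', 'world', 'bar'], 'idk']

try:
    nest_by_keywords(lst, ['foo'], [])   # 'foo' is opened but never closed
except json.JSONDecodeError as exc:
    print('unbalanced keywords:', exc)
```

The sketch still assumes that no word contains a double quote, as noted above.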

Mark Tolonen
3#
Mark Tolonen Reply to 2018-02-14 09:59:33Z

One way using slicing:

>>> lst = ['hello', 'foo', 'test', 'world', 'bar', 'idk']
>>> a=lst.index('foo')
>>> b=lst.index('bar')+1
>>> lst[a:b] = [lst[a:b]]
>>> lst
['hello', ['foo', 'test', 'world', 'bar'], 'idk']

entropiae
4#
entropiae Reply to 2018-02-14 09:47:19Z

Using slicing, without support for nested lists:

>>> lst = ['hello', 'foo', 'test', 'world', 'bar', 'idk']
>>> start_idx = lst.index('foo')
>>> end_idx = lst.index('bar')
>>> lst[:start_idx] + [lst[start_idx:end_idx+1]] + lst[end_idx+1:]
['hello', ['foo', 'test', 'world', 'bar'], 'idk']

Anton vBR
5#
Anton vBR Reply to 2018-02-14 10:40:01Z

Multiple start/end pairs (based on Mark Tolonen's answer):

lst = ['hello', 'foo', 'test', 'world', 'bar', 'idk','am']
t = [('foo','test'),('world','idk')]

def sublists(lst, t):
    for start,end in t:
        a=lst.index(start)
        b=lst.index(end)+1
        lst[a:b] = [lst[a:b]]
    return lst

print(sublists(lst,t)) 

Returns:

['hello', ['foo', 'test'], ['world', 'bar', 'idk'], 'am']
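
Note that list.index only finds the first occurrence of a keyword, and the function above modifies lst in place. If the same start/end pair can occur several times side by side (the case the question says is not handled), a single pass over the list is one alternative. The sketch below is only an illustration: group_runs is a hypothetical name, it returns a new list, it does not handle nesting, and it treats an end keyword outside a run as an ordinary item.

```python
def group_runs(items, start, end):
    """Hypothetical helper: wrap every start..end run in its own sublist."""
    result = []
    current = None  # the sublist being filled, or None when outside a run
    for item in items:
        if current is None:
            if item == start:
                current = [item]        # open a new run
            else:
                result.append(item)     # ordinary item outside any run
        else:
            current.append(item)
            if item == end:
                result.append(current)  # close the run
                current = None
    if current is not None:
        raise ValueError(f"{start!r} without a matching {end!r}")
    return result

lst = ['hello', 'foo', 'test', 'bar', 'idk', 'foo', 'x', 'bar']
print(group_runs(lst, 'foo', 'bar'))
# ['hello', ['foo', 'test', 'bar'], 'idk', ['foo', 'x', 'bar']]
```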
birdcolour
6#
birdcolour Reply to 2018-02-14 13:28:31Z

To get your code to achieve the desired results, you need to make the following changes:

  1. Slice indices must be integers. In Python 3, / always returns a float, so the second line of your findLoop function raises a TypeError. Coerce the slice indices to int to round down (as is required here), or use floor division (//).

    centre = inds[int(len(inds)/2)-1:int(len(inds)/2)+1]
    
  2. The in operator is case-sensitive, so the lowercase words in your list never match "FOO" or "BAR".

    >>> 'foo' in ['FOO', 'BAR']
    False
    
  3. In getLoops, you only need to count the first keyword of your pair, because findLoop builds one sublist from a pair of keywords on each call.

    inds = len([index for index, item in enumerate(t) if item in ['foo']])
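
Putting the three changes together gives something like the following. This is a sketch assembled from the points above, not a tested copy of anyone's code; it uses floor division (//), which rounds down the same way as int() for these non-negative lengths.

```python
def findLoop(t):
    # positions of the lowercase keywords at the top level of t
    inds = [index for index, item in enumerate(t) if item in ["foo", "bar"]]
    # the two middle indices are the innermost remaining foo/bar pair
    centre = inds[len(inds) // 2 - 1:len(inds) // 2 + 1]
    newCentre = t[centre[0]:centre[1] + 1]
    return t[:centre[0]] + [newCentre] + t[centre[1] + 1:]

def getLoops(t):
    # one findLoop call per start keyword
    pairs = len([item for item in t if item == "foo"])
    for _ in range(pairs):
        t = findLoop(t)
    return t

print(getLoops(['hello', 'foo', 'test', 'world', 'bar', 'idk']))
# ['hello', ['foo', 'test', 'world', 'bar'], 'idk']
```

As the question already says, this handles nested pairs but still groups side-by-side pairs incorrectly.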
    

However, as you've noticed, your code is quite messy, and the other answers show how you can use list.index() to better effect.

If you'd like to further this to find nested sublists, that will require some more clarification on how you'd like this to behave. Consider the following problems:

  • sublisting ['foo', 'bar'], then ['test', 'world']

    • Should sublisting occur only on the initial list, or inside sublists too?
  • sublisting ['foo', 'world'], then ['test', 'bar']

    • How should matches on different levels of the list behave?
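
One possible set of answers to these questions, purely as an assumption for illustration: sublisting applies at every depth, each start keyword opens a new level, and a level is closed only by its own end keyword. Under those assumptions a stack is enough. The function name nest and the pairs dictionary are hypothetical.

```python
def nest(items, pairs):
    """Hypothetical helper: pairs maps each start keyword to its end keyword."""
    root = []
    stack = [(root, None)]  # (open list, end keyword that closes it)
    for item in items:
        current, expected_end = stack[-1]
        if item in pairs:
            child = [item]              # open a new level inside the current one
            current.append(child)
            stack.append((child, pairs[item]))
        elif len(stack) > 1 and item == expected_end:
            current.append(item)        # close the innermost open level
            stack.pop()
        else:
            current.append(item)
    if len(stack) > 1:
        raise ValueError(f"unclosed start keyword: {stack[-1][0][0]!r}")
    return root

lst = ['hello', 'foo', 'test', 'world', 'bar', 'idk']
print(nest(lst, {'foo': 'bar', 'test': 'world'}))
# ['hello', ['foo', ['test', 'world'], 'bar'], 'idk']
```

With crossing pairs such as {'foo': 'world', 'test': 'bar'}, this sketch raises ValueError, because 'world' is seen while the 'test' level is open and 'foo' is never closed. Whether that is the right behaviour depends on the clarification asked for above.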