
# How do I list all files of a directory?

duhhunjonn
1#
duhhunjonn Published in 2010-07-08 19:31:22Z
 How can I list all files of a directory in Python and add them to a list?
csano
2#
csano Reply to 2016-07-13 19:05:33Z

    import os
    os.listdir("somedirectory")

will return a list of all files and directories in "somedirectory".
Martin Thoma
3#
Martin Thoma Reply to 2015-11-22 06:56:17Z
os.listdir() will get you everything that's in a directory - files and directories. If you want just files, you could either filter this down using os.path:

    from os import listdir
    from os.path import isfile, join
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

or you could use os.walk() which will yield two lists for each directory it visits - splitting into files and dirs for you. If you only want the top directory you can just break the first time it yields

    from os import walk

    f = []
    for (dirpath, dirnames, filenames) in walk(mypath):
        f.extend(filenames)
        break

And lastly, as that example shows, adding one list to another you can either use .extend() or

    >>> q = [1, 2, 3]
    >>> w = [4, 5, 6]
    >>> q = q + w
    >>> q
    [1, 2, 3, 4, 5, 6]

Personally, I prefer .extend()
kame
4#
kame Reply to 2017-09-16 16:49:20Z
I prefer using the glob module, as it does pattern matching and expansion.

    import glob
    print(glob.glob("/home/adam/*.txt"))

This returns a list of the matching files:

    ['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
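One thing the pattern alone does not control is whether a match is a file or a directory. The sketch below filters the matches down to files and optionally recurses with `**` (Python 3.5+). The helper name `glob_files` and its parameters are made up for illustration.

```python
import glob
import os


def glob_files(directory, pattern="*", recursive=False):
    """Return sorted paths under `directory` that match `pattern`, files only."""
    # glob.escape() keeps special characters in the directory name literal.
    base = glob.escape(directory)
    if recursive:
        # "**" matches zero or more nested directories when recursive=True.
        full_pattern = os.path.join(base, "**", pattern)
    else:
        full_pattern = os.path.join(base, pattern)
    matches = glob.glob(full_pattern, recursive=recursive)
    # glob also returns directories whose names match, so drop them here.
    return sorted(p for p in matches if os.path.isfile(p))
```

For example, `glob_files("/home/adam", "*.txt")` would correspond to the call above, minus any directory that happens to end in `.txt`. Note that glob skips names starting with a dot unless the pattern asks for them.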
shaji
5#
shaji Reply to 2012-07-25 10:25:54Z

    # Python 2 only: the dircache module was removed in Python 3.
    import dircache
    list = dircache.listdir(pathname)
    i = 0
    check = len(list[0])
    temp = []
    count = len(list)
    while count != 0:
        if len(list[i]) != check:
            temp.append(list[i-1])
            check = len(list[i])
        else:
            i = i + 1
            count = count - 1

    print temp
Vallentin
6#
Vallentin Reply to 2017-04-24 01:57:49Z
Getting Full File Paths From a Directory and All Its Subdirectories

    import os

    def get_filepaths(directory):
        """
        Return the full paths of all files in a directory tree.
        os.walk() generates the file names by walking the tree either
        top-down or bottom-up. For each directory in the tree rooted at
        directory top (including top itself), it yields a 3-tuple
        (dirpath, dirnames, filenames).
        """
        file_paths = []  # List which will store all of the full filepaths.

        # Walk the tree.
        for root, directories, files in os.walk(directory):
            for filename in files:
                # Join the two strings in order to form the full filepath.
                filepath = os.path.join(root, filename)
                file_paths.append(filepath)  # Add it to the list.

        return file_paths  # Self-explanatory.

    # Run the above function and store its results in a variable.
    full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

The path I provided in the above function contained 3 files: two of them in the root directory, and another in a subfolder called "SUBFOLDER." You can now do things like:

    print(full_file_paths)

which will print the list:

    ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you'd like, you can open and read the contents, or focus only on files with the extension ".dat" like in the code below:

    for f in full_file_paths:
        if f.endswith(".dat"):
            print(f)

    /Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat
Al Lelopath
7#
Al Lelopath Reply to 2015-01-14 18:25:57Z
A one-line solution to get only a list of files (no subdirectories):

    import os
    filenames = next(os.walk(path))[2]

or full pathnames (absolute if path itself is absolute):

    paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]
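One caveat with the one-liner: os.walk() ignores listing errors by default, so for a missing or unreadable directory it yields nothing and a bare next() raises StopIteration. A minimal sketch of a more forgiving variant follows; the helper name `top_level_files` is hypothetical.

```python
import os


def top_level_files(path):
    """Return the file names directly inside `path`, or [] if it cannot be walked."""
    # The second argument to next() is returned when the generator is empty,
    # which happens when os.walk() could not list `path` at all.
    _, _, filenames = next(os.walk(path), (path, [], []))
    return filenames
```

Whether an empty list or an exception is the better answer for a missing directory depends on the caller; this sketch simply shows where the choice is made.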
Cristian Ciupitu
8#
Cristian Ciupitu Reply to 2014-12-28 03:25:50Z

    # -*- coding: utf-8 -*-
    # Python 2 code (print statements).
    import os
    import traceback

    print '\n\n'

    def start():
        address = "/home/ubuntu/Desktop"
        try:
            Folders = []
            Id = 1
            for item in os.listdir(address):
                endaddress = address + "/" + item
                Folders.append({'Id': Id, 'TopId': 0, 'Name': item, 'Address': endaddress })
                Id += 1

                state = 0
                for item2 in os.listdir(endaddress):
                    state = 1
                if state == 1:
                    Id = FolderToList(endaddress, Id, Id - 1, Folders)
            return Folders
        except:
            print "___________________________ ERROR ___________________________\n" + traceback.format_exc()

    def FolderToList(address, Id, TopId, Folders):
        for item in os.listdir(address):
            endaddress = address + "/" + item
            Folders.append({'Id': Id, 'TopId': TopId, 'Name': item, 'Address': endaddress })
            Id += 1

            state = 0
            for item in os.listdir(endaddress):
                state = 1
            if state == 1:
                Id = FolderToList(endaddress, Id, Id - 1, Folders)
        return Id

    print start()
Peter Mortensen
9#
Peter Mortensen Reply to 2017-05-28 23:17:27Z
If you are looking for a Python implementation of find, this is a recipe I use rather frequently:

    from findtools.find_files import (find_files, Match)

    # Recursively find all *.sh files in **/usr/bin**
    sh_files_pattern = Match(filetype='f', name='*.sh')
    found_files = find_files(path='/usr/bin', match=sh_files_pattern)

    for found_file in found_files:
        print found_file

So I made a PyPI package out of it and there is also a GitHub repository. I hope that someone finds it useful.
Apogentus
10#
Apogentus Reply to 2014-10-07 18:30:34Z

    import os

    def list_files(path):
        # returns a list of names (with extension, without full path) of all files
        # in folder path
        files = []
        for name in os.listdir(path):
            if os.path.isfile(os.path.join(path, name)):
                files.append(name)
        return files
Cristian Ciupitu
11#
Cristian Ciupitu Reply to 2014-12-28 03:27:49Z
Returning a list of absolute filepaths, does not recurse into subdirectories

    import os

    L = [os.path.join(os.getcwd(), f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(), f))]
Community
12#
Community Reply to 2017-05-23 11:47:32Z
I really liked adamk's answer, suggesting that you use glob(), from the module of the same name. This allows you to have pattern matching with *s.

But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module, as well.

As examples:

    from glob import glob

    # Return everything under C:\Users\admin that contains a folder called wlp.
    glob('C:\Users\admin\*\wlp')

The above is terrible - the path has been hardcoded and will only ever work on Windows, because of the drive name and the backslashes hardcoded into the path. (In Python 3 the unescaped \U in this string literal is even a syntax error.)

    from glob import glob
    from os.path import join

    # Return everything under Users, admin, that contains a folder called wlp.
    glob(join('Users', 'admin', '*', 'wlp'))

The above works better, but it relies on the folder name Users which is often found on Windows and not so often found on other OSs. It also relies on the user having a specific name, admin.

    from glob import glob
    from os.path import expanduser, join

    # Return everything under the user directory that contains a folder called wlp.
    glob(join(expanduser('~'), '*', 'wlp'))

This works perfectly across all platforms.

Another great example that works perfectly across platforms and does something a bit different:

    from glob import glob
    from os import getcwd
    from os.path import join

    # Return everything under the current directory that contains a folder called wlp.
    glob(join(getcwd(), '*', 'wlp'))

Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.
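On Python 3.4+ the same idea can be sketched with pathlib, which accepts forward slashes in glob patterns on every platform. This is only an illustration of the approach above, not part of the original answer; the helper name `find_in_children` is hypothetical.

```python
from pathlib import Path


def find_in_children(base, name):
    """Return paths called `name` that sit exactly one level below `base`."""
    # pathlib splits the pattern into components itself, so the "/" here
    # is not tied to the operating system's own separator.
    return sorted(Path(base).glob("*/" + name))
```

With this, `find_in_children(Path.home(), "wlp")` plays the role of the expanduser() example and `find_in_children(Path.cwd(), "wlp")` the role of the getcwd() one.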
SzieberthAdam
13#
SzieberthAdam Reply to 2017-02-03 18:08:27Z
Since version 3.4 there are built-in iterators for this which are a lot more efficient than os.listdir():

pathlib: New in version 3.4.

    >>> import pathlib
    >>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

os.scandir(): New in version 3.5.

    >>> import os
    >>> [entry for entry in os.scandir('.') if entry.is_file()]

Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5, and its speed increased by 2-20 times according to PEP 471.

Let me also recommend reading ShadowRanger's comment below.
Rajat Garg
14#
Rajat Garg Reply to 2015-07-07 10:12:33Z

    import os
    lst = os.listdir(path)

os.listdir returns a list containing the names of the entries in the directory given by path.
worenga
15#
worenga Reply to 2015-09-14 13:03:04Z
List all files in a directory:

    import os
    from os import path

    files = [x for x in os.listdir(directory_path) if path.isfile(directory_path+os.sep+x)]

Here, you get a list of all files in a directory.
enedil
16#
enedil Reply to 2016-01-17 18:17:07Z
Python 3.5 introduced a new, faster method for walking through a directory - os.scandir().

Example:

    import os

    for file in os.scandir('/usr/bin'):
        line = ''
        if file.is_file():
            line += 'f'
        elif file.is_dir():
            line += 'd'
        elif file.is_symlink():
            line += 'l'
        line += '\t'
        print("{}{}".format(line, file.name))
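One detail worth knowing about the example above: DirEntry.is_file() and DirEntry.is_dir() follow symbolic links by default, so a link to a regular file is reported as 'f' and the 'l' branch is only reached for links that point at neither a file nor a directory (for instance broken ones). A small sketch that tests for links first; the names `entry_kind` and `list_kinds` are made up here.

```python
import os


def entry_kind(entry):
    """Return 'l', 'd', 'f' or '?' for an os.DirEntry."""
    # Check for a link before anything else, because is_dir() and
    # is_file() would otherwise answer for the link's target.
    if entry.is_symlink():
        return "l"
    if entry.is_dir():
        return "d"
    if entry.is_file():
        return "f"
    return "?"


def list_kinds(path="."):
    """Return sorted (kind, name) pairs for the entries directly in `path`."""
    # Using scandir() as a context manager requires Python 3.6+.
    with os.scandir(path) as entries:
        return sorted((entry_kind(e), e.name) for e in entries)
```

Printing the pairs, for example with `for kind, name in list_kinds('/usr/bin'): print(kind, name, sep='\t')`, gives output in the same shape as the loop above.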
coanor
17#
coanor Reply to 2016-03-12 09:31:46Z
If you care about performance, try scandir. For Python 2.x, you may need to install it manually. Examples:

    # python 2.x
    import scandir
    import sys

    de = scandir.scandir(sys.argv[1])
    while 1:
        try:
            d = de.next()
            print d.path
        except StopIteration as _:
            break

This saves a lot of time when you need to scan a huge directory: you do not need to buffer a huge list, you just fetch entries one by one. You can also do it recursively:

    def scan_path(path):
        de = scandir.scandir(path)
        while 1:
            try:
                e = de.next()
                if e.is_dir():
                    scan_path(e.path)
                else:
                    print e.path
            except StopIteration as _:
                break
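For Python 3.6+, where scandir is part of the os module, the same recursive idea might look like the sketch below. It yields paths instead of printing them so the caller decides what to do with each one; the name `iter_files` is hypothetical.

```python
import os


def iter_files(path):
    """Lazily yield the path of every non-directory entry below `path`."""
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False keeps the walk from descending into
            # linked directories, which could loop on a cyclic link.
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            else:
                yield entry.path
```

Because it is a generator, `for p in iter_files(sys.argv[1]): print(p)` still fetches entries one by one rather than building a large list first.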
Harun Ergül
18#
Harun Ergül Reply to 2016-03-23 10:09:45Z
You should use the os module for listing directory contents. os.listdir(".") returns all the contents of the directory. We iterate over the result and append to the list.

    import os

    content_list = []

    for content in os.listdir("."): # "." means current directory
        content_list.append(content)

    print(content_list)
Sankar
19#
Sankar Reply to 2016-10-15 16:29:55Z
By using the os library.

    import os

    for root, dirs, files in os.walk("your dir path", topdown=True):
        for name in files:
            print(os.path.join(root, name))
neouyghur
20#
neouyghur Reply to 2016-11-11 12:48:24Z
Use this function if you want to filter by file type or get the full paths.

    import os

    def createList(foldername, fulldir = True, suffix=".jpg"):
        file_list_tmp = os.listdir(foldername)
        #print len(file_list_tmp)
        file_list = []
        if fulldir:
            for item in file_list_tmp:
                if item.endswith(suffix):
                    file_list.append(os.path.join(foldername, item))
        else:
            for item in file_list_tmp:
                if item.endswith(suffix):
                    file_list.append(item)
        return file_list
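If more than one file type is needed, str.endswith() also accepts a tuple of suffixes. The following is a compact sketch along the same lines, not the answer's own code: `list_by_suffix` is a hypothetical name, and the case-insensitive comparison is an added assumption.

```python
import os


def list_by_suffix(folder, suffixes=(".jpg", ".png"), full_path=True):
    """Return sorted entries of `folder` whose names end with any of `suffixes`."""
    # Lower-casing both sides makes ".JPG" match ".jpg" (an assumption of
    # this sketch; remove it for case-sensitive matching).
    wanted = tuple(s.lower() for s in suffixes)
    names = sorted(n for n in os.listdir(folder) if n.lower().endswith(wanted))
    return [os.path.join(folder, n) for n in names] if full_path else names
```

Like the function above, this looks only at names, so a directory called `photos.jpg` would be included too.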
shantanoo
21#
shantanoo Reply to 2017-05-17 15:35:49Z
Using generators

    import os

    def get_files(search_path):
        for (dirpath, _, filenames) in os.walk(search_path):
            for filename in filenames:
                yield os.path.join(dirpath, filename)

    list_files = get_files('.')
    for filename in list_files:
        print(filename)
Giovanni Gianni
22#
Giovanni Gianni Reply to 2018-02-14 05:34:57Z

# Get a list with the files


os.listdir(): get files in current dir (Python 3)

The simplest way to get the files in the current directory in Python 3 is the listdir() function of the os module. It returns the files in that directory (and any folders it contains), but not the files inside subdirectories; for those you can use walk, which I will talk about later.

    >>> import os
    >>> arr = os.listdir()
    >>> arr
    ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Getting the full path name

As you noticed, you don't have the full path of the file in the code above. If you need the absolute path, you can use another function of the os.path module called _getfullpathname (a private, Windows-only helper; os.path.abspath is the portable equivalent), passing the names that you get from os.listdir() as an argument. There are other ways to get the full path, as we will check later.

    >>> import os
    >>> files_path = [os.path._getfullpathname(x) for x in os.listdir()]
    >>> files_path
    ['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']

os.listdir(): get files in current dir (Python 2)

    >>> import os
    >>> arr = os.listdir('.')
    >>> arr
    ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']


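To go up in the directory tree

    >>> # method 1: the parent of the current directory
    >>> x = os.listdir('..')

    >>> # method 2: the root of the filesystem (of the current drive on Windows)
    >>> x = os.listdir('/')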
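The parent can also be computed from any path instead of relying on '..' relative to the current directory. A small sketch, with `list_parent` as a hypothetical helper name:

```python
import os


def list_parent(path="."):
    """Return (parent_directory, sorted entries) for the directory above `path`."""
    # abspath() resolves "." and ".." against the current directory,
    # dirname() then drops the last path component.
    parent = os.path.dirname(os.path.abspath(path))
    return parent, sorted(os.listdir(parent))
```

Called without arguments it lists the same directory as `os.listdir('..')`; at the filesystem root the parent is the root itself.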


get files: os.listdir() in a particular directory (Python 2 and 3)

    >>> import os
    >>> arr = os.listdir('F:\\python')
    >>> arr
    ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Get files of a particular subdirectory with os.listdir()

    import os

    x = os.listdir("./content")

os.walk('.') - current directory

    >>> import os
    >>> arr = next(os.walk('.'))[2]
    >>> arr
    ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

glob module - all files

    import glob
    print(glob.glob("*"))

    out:['content', 'start.py']

next(os.walk('.')) and os.path.join('dir','file')

    >>> import os
    >>> arr = []
    >>> root, dirs, files = next(os.walk("F:\\_python"))
    >>> for file in files:
    ...     arr.append(os.path.join(root, file))
    ...
    >>> for f in arr:
    ...     print(f)
    ...
    F:\_python\dict_class.py
    F:\_python\programmi.txt

next(os.walk('F:\\')) - get the full path - list comprehension

    >>> [os.path.join(r,file) for r,d,f in [next(os.walk("F:\\_python"))] for file in f]
    ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk - get full path - all files in sub dirs

    >>> x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
    >>> x
    ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() - get only txt files

    >>> arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
    >>> print(arr_txt)
    ['work.txt', '3ebooks.txt']

glob - get only txt files

    >>> import glob
    >>> x = glob.glob("*.txt")
    >>> x
    ['ale.txt', 'alunni2015.txt', 'assenze.text.txt', 'text2.txt', 'untitled.txt']

Using glob to get the full path of the files

If I should need the absolute path of the files:

    >>> from path import path  # third-party module, not part of the standard library
    >>> from glob import glob
    >>> x = [path(f).abspath() for f in glob("F:\\*.txt")]
    >>> for f in x:
    ...     print(f)
    ...
    F:\acquistionline.txt
    F:\acquisti_2018.txt
    F:\bootstrap_jquery_ecc.txt

Other use of glob

If I want all the files in the directory:

    >>> x = glob.glob("*")

Using os.path.isfile to avoid directories in the list

    import os.path
    listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
    print(listOfFiles)

    > output
    ['a simple game.py', 'data.txt', 'decorator.py']

Using pathlib (from Python 3.4)

    >>> import pathlib
    >>> flist = []
    >>> for p in pathlib.Path('.').iterdir():
    ...     if p.is_file():
    ...         print(p)
    ...         flist.append(p)
    ...
    error.PNG
    exemaker.bat
    guiprova.mp3
    setup.py
    speak_gui2.py
    thumb.PNG

If you want to use list comprehension

    >>> flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

Get all and only files with os.walk

    import os
    x = [i[2] for i in os.walk('.')]
    y = []
    for t in x:
        for f in t:
            y.append(f)

    >>> y
    ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

Get only files with next and walk in a directory

    >>> import os
    >>> x = next(os.walk('F://python'))[2]
    >>> x
    ['calculator.bat','calculator.py']

Get only directories with next and walk in a directory

    >>> import os
    >>> next(os.walk('F://python'))[1] # for the current dir use ('.')
    ['python3','others']

Get all the subdir names with walk

    >>> for r,d,f in os.walk("F:\\_python"):
    ...     for dirs in d:
    ...         print(dirs)
    ...
    .vscode
    pyexcel
    pyschool.py
    subtitles
    _metaprogramming
    .ipynb_checkpoints

os.scandir() from python 3.5 on

    >>> import os
    >>> x = [f.name for f in os.scandir() if f.is_file()]
    >>> x
    ['calculator.bat','calculator.py']

    # Another example with scandir (a little variation from docs.python.org)
    # This one is more efficient than os.listdir.
    # In this case, it shows the files only in the current directory
    # where the script is executed.

    >>> import os
    >>> with os.scandir() as i:
    ...     for entry in i:
    ...         if entry.is_file():
    ...             print(entry.name)
    ...
    ebookmaker.py
    error.PNG
    exemaker.bat
    guiprova.mp3
    setup.py
    speakgui4.py
    speak_gui2.py
    speak_gui3.py
    thumb.PNG
    >>>

## Ex. 1: How many files are there in the subdirectories?

In this example, we count the files contained in a directory and all of its subdirectories.

    import os

    def count(dir, counter=0):
        "returns number of files in dir and subdirs"
        for pack in os.walk(dir):
            for f in pack[2]:
                counter += 1
        return dir + " : " + str(counter) + " files"

    print(count("F:\\python"))

    > output
    F:\python : 12057 files

## Ex.2: How to copy all files from a dir to another?

A script to tidy up your computer by finding all files of a type (default: pptx) and copying them to a new folder.

    import os
    import shutil

    destination = "F:\\file_copied"
    # os.makedirs(destination)

    def copyfile(dir, filetype='pptx', counter=0):
        "Searches for pptx (or other - pptx is the default) files and copies them"
        for pack in os.walk(dir):
            for f in pack[2]:
                if f.endswith(filetype):
                    fullpath = pack[0] + "\\" + f
                    print(fullpath)
                    shutil.copy(fullpath, destination)
                    counter += 1
        if counter > 0:
            print("------------------------")
            print("\t==> Found in: " + dir + " : " + str(counter) + " files\n")

    for dir in os.listdir():
        "searches for folders that starts with _"
        if dir[0] == '_':
            # copyfile(dir, filetype='pdf')
            copyfile(dir, filetype='txt')

    > Output
    _compiti18\Compito Contabilità 1\conti.txt
    _compiti18\Compito Contabilità 1\modula4.txt
    _compiti18\Compito Contabilità 1\moduloa4.txt
    ------------------------
    ==> Found in: _compiti18 : 3 files

kenorb
23#
kenorb Reply to 2017-05-26 12:39:03Z
Here is a simple example:

    import os
    root, dirs, files = next(os.walk('.'))
    for file in files:
        print(file) # In Python 3 use: file.encode('utf-8') in case of error.

Note: Change . to your path value or variable.

Here is the example returning a list of files with absolute paths:

    import os
    path = '.' # Change this as you need.
    abspaths = []
    for fn in os.listdir(path):
        abspaths.append(os.path.abspath(os.path.join(path, fn)))
    print("\n".join(abspaths))

Documentation: os and os.path for Python 2, os and os.path for Python 3.

Pang
24#
Pang Reply to 2017-07-06 02:49:59Z

    ls -a

This will list even the hidden stuff.

Ashiq Imran
25#
Ashiq Imran Reply to 2017-10-18 06:46:14Z

    import os
    os.listdir(path)

This will return a list of all files and directories in path.

    filenames = next(os.walk(path))[2]

This will return only a list of files, not subdirectories.

Joseph K.
26#
Joseph K. Reply to 2017-10-22 04:08:16Z
Referring to the answer by @adamk, here is my os detection method in response to the slash inconsistency comment by @Anti Earth

    import sys
    import os
    from pathlib import Path
    from glob import glob

    platformtype = sys.platform
    if platformtype == 'win32':
        slash = "\\"
    else:  # 'darwin', 'linux', ...
        slash = "/"

    # TODO: How can I list all files of a directory in Python and add them to a list?

    # Step 1 - List all files of a directory

    # Method 1: Find only pre-defined filetypes (.txt) and no subfiles, answer provided by @adamk
    dir1 = "%sfoo%sbar%s*.txt" % (slash, slash, slash)
    _files = glob(dir1)

    # Method 2: Find all files and no subfiles
    dir2 = "%sfoo%sbar%s" % (slash, slash, slash)
    _files = (x for x in Path(dir2).iterdir() if x.is_file())

    # Method 3: Find all files and all subfiles
    dir3 = "%sfoo%sbar" % (slash, slash)
    _files = (x for x in Path(dir3).glob('**/*') if x.is_file())

    # Step 2 - Add them to a list

    files_list = []
    for eachfiles in _files:
        files_basename = os.path.basename(eachfiles)
        files_list.append(files_basename)

    print(files_list)
    ['file1.txt', 'file2.txt', .... ]

I'm assuming that you want just the basenames in the list.

Refer to this post for pre-defining multiple file formats for Method 1.

MarredCheese
27#
MarredCheese Reply to 2017-12-07 20:10:58Z
Here's my general-purpose function for this. It returns a list of file paths rather than filenames since I found that to be more useful.
It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.

    import os
    import fnmatch

    def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
        """Return a list of the file paths matching the pattern in the specified
        folder, optionally including files inside subfolders.
        """
        match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
        walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
        return [os.path.join(root, f)
                for root, dirnames, filenames in walked
                for f in filenames if match(f, pattern)]

Vinodh Krishnaraju
28#
Vinodh Krishnaraju Reply to 2017-12-12 05:30:53Z
I will provide a sample one-liner where the source path and file type can be provided as input. The code returns a list of filenames with the csv extension. Use . in case all files need to be returned. This will also recursively scan the subdirectories.

    [y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify file extensions and source path as needed.

CristiFati
29#
CristiFati Reply to 2018-02-18 13:58:23Z

# Part One

2018 / 02 / 18: Trying to assemble a comprehensive answer...

## Preliminary notes

• Although there's a clear differentiation between file and directory terms in the question text, some may argue that directories are actually special files
• The statement: "all files of a directory" can be interpreted in 2 ways:
  1. All direct (or level 1) descendants only
  2. All descendants in the whole directory tree (including the ones in sub-directories)
• When the question was asked, I imagine that Python 2 was the LTS version; however, the code samples will be run by Python 3(.5) (I'll keep them as Python 2 compliant as possible; also, any code belonging to Python that I'm going to post is from v3.5.4, unless otherwise specified).
That has consequences related to another keyword in the question: "add them into a list":
  • In pre Python 2.2 versions, sequences (iterables) were mostly represented by lists (tuples, sets, ...)
  • In Python 2.2, the concept of generator ([Python]: Generators), courtesy of [Python]: The yield statement, was introduced. As time passed, generator counterparts started to appear for functions that returned/worked with lists
  • In Python 3, generator is the default behavior
  • Now, I don't know if returning a list is still mandatory (or a generator would do as well), but passing a generator to the list constructor will create a list out of it (and also consume it). The example below illustrates the differences on [Python]: map(function, iterable, ...)

    Python 2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> m = map(lambda x: x, [1, 2, 3]) # Just a dummy lambda func
    >>> m, type(m)
    ([1, 2, 3], <type 'list'>)
    >>> len(m)
    3

    Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> m = map(lambda x: x, [1, 2, 3])
    >>> m, type(m)
    (<map object at 0x000001B4257342B0>, <class 'map'>)
    >>> len(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'map' has no len()
    >>> lm0 = list(m) # Construct a list out of the generator
    >>> lm0, type(lm0)
    ([1, 2, 3], <class 'list'>)
    >>>
    >>> lm1 = list(m) # Construct a list out of the same generator
    >>> lm1, type(lm1) # Empty list this time - generator already consumed
    ([], <class 'list'>)

• The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I have duplicated the folder tree for Ux(Lnx) as well):

    E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
    Folder PATH listing for volume Work
    Volume serial number is 00000029 3655:6FED
    E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
    │   file0
    │   file1
    │
    ├───dir0
    │   ├───dir00
    │   │   │   file000
    │   │   │
    │   │   └───dir000
    │   │           file0000
    │   │
    │   ├───dir01
    │   │       file010
    │   │       file011
    │   │
    │   └───dir02
    │       └───dir020
    │           └───dir0200
    ├───dir1
    │       file10
    │       file11
    │       file12
    │
    ├───dir2
    │   │   file20
    │   │
    │   └───dir20
    │           file200
    │
    └───dir3

## Solutions

### Programmatic approaches:

1. [Python]: os.listdir(path='.')

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' ...

    >>> import os
    >>> root_dir = "root_dir" # Path relative to current dir (os.getcwd())
    >>>
    >>> os.listdir(root_dir) # List all the items in root_dir
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))] # Filter the items and only keep files (strip out directories)
    ['file0', 'file1']

Here's a more elaborate example (code_os_listdir.py):

    import os
    from pprint import pformat


    def _get_dir_content(path, include_folders, recursive):
        entries = os.listdir(path)
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    yield entry_with_path
                if recursive:
                    for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                        yield sub_entry
            else:
                yield entry_with_path


    def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        for item in _get_dir_content(path, include_folders, recursive):
            yield item if prepend_folder_name else item[path_len:]


    def _get_dir_content_old(path, include_folders, recursive):
        entries = os.listdir(path)
        ret = list()
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    ret.append(entry_with_path)
                if recursive:
                    ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
            else:
                ret.append(entry_with_path)
        return ret


    def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]


    def main():
        root_dir = "root_dir"
        ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
        lret0 = list(ret0)
        print(ret0, len(lret0), pformat(lret0))
        ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
        print(len(ret1), pformat(ret1))


    if __name__ == "__main__":
        main()

Notes:

• There are 2 implementations:
  • One that uses generators (of course in this example it seems useless, since I convert the result to a list immediately)
  • The classic one (function names ending in _old)
• Recursion is used (to get into subdirs)
• For each implementation there are 2 functions:
  • One that starts with an underscore (_): "private" (should not be called directly) - that does all the work
  • The public one (wrapper over previous): it just strips off the initial path (if required) from the returned entries. It's an ugly implementation, but it's the only idea that I could come up with at this point
• In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I didn't test them in recursive functions, and also I am iterating inside the function over inner generators - don't know how performance friendly that is
• Play with the arguments to get different results

Output:

    (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
    <generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
     'root_dir\\dir0\\dir00',
     'root_dir\\dir0\\dir00\\dir000',
     'root_dir\\dir0\\dir00\\dir000\\file0000',
     'root_dir\\dir0\\dir00\\file000',
     'root_dir\\dir0\\dir01',
     'root_dir\\dir0\\dir01\\file010',
     'root_dir\\dir0\\dir01\\file011',
     'root_dir\\dir0\\dir02',
     'root_dir\\dir0\\dir02\\dir020',
     'root_dir\\dir0\\dir02\\dir020\\dir0200',
     'root_dir\\dir1',
     'root_dir\\dir1\\file10',
     'root_dir\\dir1\\file11',
     'root_dir\\dir1\\file12',
     'root_dir\\dir2',
     'root_dir\\dir2\\dir20',
     'root_dir\\dir2\\dir20\\file200',
     'root_dir\\dir2\\file20',
     'root_dir\\dir3',
     'root_dir\\file0',
     'root_dir\\file1']
    11 ['dir0\\dir00\\dir000\\file0000',
     'dir0\\dir00\\file000',
     'dir0\\dir01\\file010',
     'dir0\\dir01\\file011',
     'dir1\\file10',
     'dir1\\file11',
     'dir1\\file12',
     'dir2\\dir20\\file200',
     'dir2\\file20',
     'file0',
     'file1']

2. [Python]: os.scandir(path='.') (!!! Python 3.5+ !!! although I think that for earlier versions it was a separate module (also ported to Python2))

Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.

    >>> import os
    >>> root_dir = os.path.join(".", "root_dir") # Explicitly prepending current directory
    >>> root_dir
    '.\\root_dir'
    >>>
    >>> scandir_iterator = os.scandir(root_dir)
    >>> scandir_iterator
    <nt.ScandirIterator object at 0x00000268CF4BC140>
    >>> [item.path for item in scandir_iterator]
    ['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
    >>>
    >>> [item.path for item in scandir_iterator] # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
    []
    >>>
    >>> scandir_iterator = os.scandir(root_dir) # Reinitialize the generator
    >>> for item in scandir_iterator :
    ...     if os.path.isfile(item.path):
    ...         print(item.name)
    ...
    file0
    file1

Notes:

• It's similar to os.listdir
• But it's also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)

3. [Python]: os.walk(top, topdown=True, onerror=None, followlinks=False)

Generate the file names in a directory tree by walking the tree either top-down or bottom-up.
For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

    >>> import os
    >>> root_dir = os.path.join(os.getcwd(), "root_dir") # Specify the full path
    >>> root_dir
    'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
    >>>
    >>> walk_generator = os.walk(root_dir)
    >>> root_dir_entry = next(walk_generator) # First entry corresponds to the root dir (that was passed as an argument)
    >>> root_dir_entry
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
    >>>
    >>> root_dir_entry[1] + root_dir_entry[2] # Display the dirs and the files (that are direct descendants) in a single list
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]] # Display all the entries in the previous list by their full path
    ['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
    >>>
    >>> for entry in walk_generator: # Display the rest of the elements (corresponding to every subdir)
    ...     print(entry)
    ...
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])

Notes:

• Behind the scenes, it uses os.listdir (os.scandir where available)
• It does the heavy lifting by recursing into subfolders

4. [Python]: glob.glob(pathname, *, recursive=False) ([Python]: glob.iglob(pathname, *, recursive=False))

Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell). ...
Changed in version 3.5: Support for recursive globs using “**”.
>>> import glob, os
>>> wildcard_pattern = "*"
>>> root_dir = os.path.join("root_dir", wildcard_pattern) # Match every file/dir name
>>> root_dir
'root_dir\\*'
>>>
>>> glob_list = glob.glob(root_dir)
>>> glob_list
['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
>>>
>>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list] # Strip the dir name and the path separator from the beginning
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> for entry in glob.iglob(root_dir + "*", recursive=True):
...     print(entry)
...
root_dir\
root_dir\dir0
root_dir\dir0\dir00
root_dir\dir0\dir00\dir000
root_dir\dir0\dir00\dir000\file0000
root_dir\dir0\dir00\file000
root_dir\dir0\dir01
root_dir\dir0\dir01\file010
root_dir\dir0\dir01\file011
root_dir\dir0\dir02
root_dir\dir0\dir02\dir020
root_dir\dir0\dir02\dir020\dir0200
root_dir\dir1
root_dir\dir1\file10
root_dir\dir1\file11
root_dir\dir1\file12
root_dir\dir2
root_dir\dir2\dir20
root_dir\dir2\dir20\file200
root_dir\dir2\file20
root_dir\dir3
root_dir\file0
root_dir\file1


Notes:

• Uses os.listdir
• For large trees (especially if recursive is on), iglob is preferred
• Allows advanced filtering based on name (due to the wildcard)

1. [Python]: class pathlib.Path(*pathsegments) (!!! Python3+ !!! don't know if backported)

>>> import os, pathlib
>>> root_dir = "root_dir"
>>> root_dir_instance = pathlib.Path(root_dir)
>>> root_dir_instance
WindowsPath('root_dir')
>>> root_dir_instance.name
'root_dir'
>>> root_dir_instance.is_dir()
True
>>>
>>> [item.name for item in root_dir_instance.glob("*")] # Wildcard searching for all direct descendants
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()] # Display paths (including parent) for files only
['root_dir\\file0', 'root_dir\\file1']


Notes:

• This is one way of achieving our goal
• It's the OOP style of handling paths
• Offers lots of functionality

1. [Python]: dircache.listdir(path) (!!! removed in Python3 !!!)

• But, according to ${PYTHON_SRC_DIR}/Lib/dircache.py: ~#20+ (from v2.7.14), it's just a (thin) wrapper over os.listdir

def listdir(path):
    """List directory contents, using cache."""
    try:
        cached_mtime, list = cache[path]
        del cache[path]
    except KeyError:
        cached_mtime, list = -1, []
    mtime = os.stat(path).st_mtime
    if mtime != cached_mtime:
        list = os.listdir(path)
        list.sort()
    cache[path] = mtime, list
    return list
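
Since dircache is gone in Python 3, the same idea can be approximated in a few lines. This is only a minimal sketch of the caching technique shown above, not a drop-in replacement: the names cached_listdir and _cache are made up for illustration, and it assumes the directory's modification time changes whenever its content does.

```python
import os

# Hypothetical module-level cache: path -> (mtime, sorted entry names).
_cache = {}

def cached_listdir(path):
    """Return the sorted entries of path, re-reading only when its mtime changes."""
    mtime = os.stat(path).st_mtime
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        # Directory looks unchanged: reuse the list built earlier.
        return cached[1]
    entries = sorted(os.listdir(path))
    _cache[path] = (mtime, entries)
    return entries
```

Calling cached_listdir("root_dir") twice in a row should hit the disk listing only once. On filesystems with a coarse mtime resolution, a change made right after the first call might go unnoticed.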


1. [man]: OPENDIR(3) / [man]: READDIR(3) / [man]: CLOSEDIR(3) via [Python]: ctypes — A foreign function library for Python (!!! Ux specific !!!)

ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

code_ctypes.py:

#!/usr/bin/env python3

import sys
from ctypes import Structure, \
    c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, c_void_p, \
    CDLL, POINTER, \
    create_string_buffer, get_errno, set_errno, cast, sizeof

DT_DIR = 4
DT_REG = 8

char256 = c_char * 256

class LinuxDirent64(Structure):
    _fields_ = [
        ("d_ino", c_ulonglong),
        ("d_off", c_longlong),
        ("d_reclen", c_ushort),
        ("d_type", c_ubyte),
        ("d_name", char256),
    ]

LinuxDirent64Ptr = POINTER(LinuxDirent64)

libc_dll = CDLL(None)
opendir = libc_dll.opendir
readdir = libc_dll.readdir
closedir = libc_dll.closedir
# DIR* and dirent* are pointers: declare them, as the default int restype may truncate them on 64bit
opendir.restype = c_void_p
readdir.argtypes = [c_void_p]
readdir.restype = c_void_p
closedir.argtypes = [c_void_p]
libc_dll.__errno_location.restype = POINTER(c_int)
errno_loc_func = libc_dll.__errno_location

def _get_errno():
    return "errno: {:d}({:d})".format(get_errno(), errno_loc_func().contents.value)

def get_dir_content(path):
    ret = [path, list(), list()]
    dir_stream = opendir(create_string_buffer(path.encode()))
    if not dir_stream:  # NULL pointer
        print("opendir returned NULL ({:s})".format(_get_errno()))
        return ret
    set_errno(0)
    dirent_addr = readdir(dir_stream)
    while dirent_addr:
        dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
        dirent = dirent_ptr.contents
        name = dirent.d_name.decode()
        if dirent.d_type == DT_DIR:  # d_type holds a single value, not bit flags
            if name not in (".", ".."):
                ret[1].append(name)
        elif dirent.d_type == DT_REG:
            ret[2].append(name)
        dirent_addr = readdir(dir_stream)
    if get_errno() or errno_loc_func().contents.value:
        print("readdir returned NULL ({:s})".format(_get_errno()))
    closedir(dir_stream)
    return ret

def main():
    print("{:s} on {:s}\n".format(sys.version, sys.platform))
    root_dir = "root_dir"
    entries = get_dir_content(root_dir)
    print(entries)

if __name__ == "__main__":
    main()


Notes:

• It loads the 3 funcs from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists using Python? (CristiFati's answer) - last notes from item #2.). That would place this approach very close to the Python / C edge
• LinuxDirent64 is the ctypes representation of struct dirent64 from dirent.h (so are the DT_* constants) from my machine: Ubuntu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
• errno_loc_func (and everything related to it) is there because the funcs set errno in case of error, and I need to check its value. Apparently, get_errno doesn't work (with an invalid name, opendir returns NULL, but get_errno still returns 0). The most likely reason is that ctypes only keeps its private errno copy in sync for libraries loaded with use_errno=True, which CDLL(None) above doesn't do
• It returns data in the os.walk's format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
• Everything is doable on Win as well; only the data (libraries, functions, structs, constants, ...) differs
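
As mentioned in the notes, making this recursive is a small step. Below is a rough sketch of how that could look. To keep it runnable on any platform, get_dir_content is replaced here by an os.scandir based stand-in that returns the same [path, dirs, files] shape (that swap is my assumption for the example, and walk_dir_content is a made-up name).

```python
import os

def get_dir_content(path):
    """Stand-in for the ctypes version: return [path, dir names, file names]."""
    ret = [path, [], []]
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            ret[1].append(entry.name)
        elif entry.is_file(follow_symlinks=False):
            ret[2].append(entry.name)
    return ret

def walk_dir_content(path):
    """Yield get_dir_content() results for path and every directory below it."""
    entry = get_dir_content(path)
    yield entry
    for dir_name in entry[1]:
        # Recurse into each subdirectory, top-down like os.walk.
        yield from walk_dir_content(os.path.join(path, dir_name))
```

Iterating over walk_dir_content("root_dir") should give one [path, dirs, files] item per directory, in the same spirit as os.walk.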

Output (code_ctypes.py):

cfati@testserver:~/work/stackoverflow/q003207219$ ./code_ctypes.py
3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux

['root_dir', ['dir3', 'dir2', 'dir0', 'dir1'], ['file0', 'file1']]


1. [ActiveState]: win32file.FindFilesW (!!! Win specific !!!)

Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the API FindFirstFileW/FindNextFileW/Find close functions.

>>> import os, win32file, win32con
>>> root_dir = "root_dir"
>>> wildcard = "*"
>>> root_dir_wildcard = os.path.join(root_dir, wildcard)
>>> entry_list = win32file.FindFilesW(root_dir_wildcard)
>>> len(entry_list) # Don't display the whole content as it's too long
8
>>> [entry[-2] for entry in entry_list] # Only display the entry names
['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")] # Filter entries and only display dir names (except self and parent)
['dir0', 'dir1', 'dir2', 'dir3']
>>>
>>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)] # Only display file "full" names
['root_dir\\file0', 'root_dir\\file1']


Notes:

• win32file.FindFilesW is part of [SourceForge]: Python for Windows Extensions (pywin32), which is a Python wrapper over WINAPIs
• The documentation link is from https://www.activestate.com, as I didn't find any pywin32 official doc

1. Install some (other) 3rdParty package that does the trick

• Most likely, will rely on one (or more) of the above (maybe with slight customizations)

Notes (about the stuff above):

• Code is meant to be portable (except places that target a specific area - which are marked) or cross:

• platform (Ux, Win, )
• Python version (2, 3, )

• Multiple path styles (absolute, relatives) were used across the above variants, to illustrate the fact that the "tools" used are flexible in this direction
• os.listdir and os.scandir use opendir / readdir / closedir ([MSDN]: FindFirstFile function / [MSDN]: FindNextFile function / [MSDN]: FindClose function) (via "${PYTHON_SRC_DIR}/Modules/posixmodule.c")
• win32file.FindFilesW uses those (Win specific) functions as well (via "${PYWIN32_SRC_DIR}/win32/src/win32file.i")
• get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)
• Some advanced filtering (instead of just file vs. dir) could be done. E.g. the include_folders argument could be replaced by another one (e.g. filter_func), a function that takes a path as an argument: filter_func=lambda x: True (this doesn't strip out anything). Inside get_dir_content there would be something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, that entry is skipped). But the more complex the code becomes, the longer it takes to execute
• Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range, I got StackOverflow :) (Python's default recursion limit, see sys.getrecursionlimit(), is 1000). If the directory tree exceeds that limit (I am not an FS expert, so I don't know if that is even possible), that could be a problem (I must also mention that I didn't try to increase the stack size at OS level)
• The code samples are for demonstrative purposes only. That means that I didn't take into account error handling (I don't think there's any try / except / else / finally block), so the code is not robust (the reason is: to keep it as simple and short as possible). For production, error handling should be added as well
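
The filter_func idea and the recursion concern from the notes above can be combined in one small sketch. It is only an illustration under my own assumptions: iter_paths is a made-up name, it relies on os.scandir rather than on any of the get_dir_content variants, and it walks the tree with an explicit stack so that the depth of the tree never touches Python's recursion limit.

```python
import os

def iter_paths(root, filter_func=lambda path: True):
    """Yield full paths under root accepted by filter_func, without recursion."""
    pending = [root]  # Explicit stack of directories still to visit
    while pending:
        current = pending.pop()
        for entry in os.scandir(current):
            if entry.is_dir(follow_symlinks=False):
                # Always descend, even if the directory itself gets filtered out
                pending.append(entry.path)
            if not filter_func(entry.path):
                continue
            yield entry.path
```

For example, list(iter_paths("root_dir", filter_func=lambda p: p.endswith(".txt"))) would keep only the .txt paths, while the default filter_func lets everything through. The order of the results depends on the filesystem, so sort them if that matters.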

# End of Part One

Due to the fact that SO's post (question / answer) limit is 30000 chars ([Meta.SE]: Knowing Your Limits: What is the maximum length of a question title, post, image and links used?),
this answer is "To be continued..." at
[SO]: How do I list all files of a directory? (CristiFati's answer - Part Two)

CristiFati
30#
CristiFati Reply to 2018-02-18 13:56:06Z

Due to the fact that SO's post (question / answer) limit is 30000 chars ([Meta.SE]: Knowing Your Limits: What is the maximum length of a question title, post, image and links used?),
this answer is a continuation of
[SO]: How do I list all files of a directory? (CristiFati's answer - Part One)

# Part Two

## Solutions (continued)

### Other approaches:

1. Use Python only as a wrapper

• Everything is done using another technology
• That technology is invoked from Python (and sometimes its output is parsed - which is lame since if some command slightly changes its output format between OS versions, the code should be adapted as well; not to mention non EN locales)
• The most famous flavor that I know is what I call the sysadmin approach:

• Use Python (or any programming language for that matter) in order to execute shell commands
• Some consider this a neat hack
• I consider it more like a lame workaround (gainarie), as the action per se is performed from shell (cmd in this case), and thus doesn't have anything to do with Python
• Filtering (grep / findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also I deliberately used os.system instead of subprocess.Popen
(py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
dir0
dir1
dir2
dir3
file0
file1
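
For completeness, here is roughly what the same "wrapper" approach looks like when the output is captured instead of just printed. It is a sketch only: list_dir_via_shell is a made-up name, the commands (dir /b on Win, ls -1A elsewhere) are my choice for the example, and it inherits all the drawbacks listed above (output parsing, locale and OS differences, names containing newlines).

```python
import os
import subprocess

def list_dir_via_shell(path):
    """List entry names in path by running the platform's own listing command."""
    if os.name == "nt":
        cmd = ["cmd", "/c", "dir", "/b", path]
    else:
        # -1: one name per line, -A: include hidden entries except "." and ".."
        cmd = ["ls", "-1A", "--", path]
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.splitlines()
```

With check=True, a failing command raises subprocess.CalledProcessError instead of silently returning an empty list.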


Final note(s):

• I will try to keep it up to date; any suggestions are welcome, and I will incorporate anything useful that comes up into the answer(s)