How to use the os.walk() function

July 1, 2019 / Last modified August 18, 2019

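The walk() function in Python's os module does what its name suggests: it walks a directory tree. Given a root path, it visits every subdirectory beneath that root together with the files in each one. Python 2 also had an os.path.walk function; it was removed in Python 3. (See: Interfaces of the os.path module)

Interface and return value of os.walk()

Let's first look at the docstring help for os.walk():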

>>> import os
>>> help(os.walk)
Help on function walk in module os:

walk(top, topdown=True, onerror=None, followlinks=False)
    Directory tree generator.
    
    For each directory in the directory tree rooted at top (including top
    itself, but excluding '.' and '..'), yields a 3-tuple
    
        dirpath, dirnames, filenames
    
    dirpath is a string, the path to the directory.  dirnames is a list of
    the names of the subdirectories in dirpath (excluding '.' and '..').
    filenames is a list of the names of the non-directory files in dirpath.
    Note that the names in the lists are just names, with no path components.
    To get a full path (which begins with top) to a file or directory in
    dirpath, do os.path.join(dirpath, name).
    
    If optional arg 'topdown' is true or not specified, the triple for a
    directory is generated before the triples for any of its subdirectories
    (directories are generated top down).  If topdown is false, the triple
    for a directory is generated after the triples for all of its
    subdirectories (directories are generated bottom up).
    
    When topdown is true, the caller can modify the dirnames list in-place
    (e.g., via del or slice assignment), and walk will only recurse into the
    subdirectories whose names remain in dirnames; this can be used to prune the
    search, or to impose a specific order of visiting.  Modifying dirnames when
    topdown is false is ineffective, since the directories in dirnames have
    already been generated by the time dirnames itself is generated. No matter
    the value of topdown, the list of subdirectories is retrieved before the
    tuples for the directory and its subdirectories are generated.
    
    By default errors from the os.scandir() call are ignored.  If
    optional arg 'onerror' is specified, it should be a function; it
    will be called with one argument, an OSError instance.  It can
    report the error to continue with the walk, or raise the exception
    to abort the walk.  Note that the filename is available as the
    filename attribute of the exception object.
    
    By default, os.walk does not follow symbolic links to subdirectories on
    systems that support them.  In order to get this functionality, set the
    optional argument 'followlinks' to true.

    Caution:  if you pass a relative pathname for top, don't change the
    current working directory between resumptions of walk.  walk never
    changes the current directory, and assumes that the client doesn't
    either.

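The four parameters mean the following:

top: the required top-level path. Everything walk visits lies under this path. An absolute path is preferable. A relative path also works, provided the interpreter's current working directory (the value returned by os.getcwd()) does not change during the walk.

topdown: defaults to True, meaning the walk starts at the top and works downward, so each directory is yielded before its subdirectories. If it is False, the walk runs bottom up, and each directory is yielded after all of its subdirectories.

onerror: by default, walk ignores errors raised during the traversal (errors from the os.scandir() calls that walk makes). To handle them, set onerror to a function that takes one argument, an OSError instance. In most cases this parameter can be left at its default.

followlinks: by default, walk does not descend into subdirectories that are symbolic links. Such a link is still listed in dirnames, and a symbolic link to a file is listed in filenames whatever the value of this parameter. With followlinks=True, walk also descends into directories reached through symbolic links.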
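To make the onerror parameter more concrete, here is a minimal sketch. The names walk_with_error_log and remember are made up for this illustration and are not part of the os module; the only library behaviour relied on is that onerror receives an OSError whose filename attribute names the directory that could not be listed.

```python
import os


def walk_with_error_log(top):
    """Walk `top` and collect the directories that could not be listed.

    Hypothetical helper for illustration only.
    """
    errors = []

    def remember(err):
        # err is an OSError instance; err.filename is the failing directory.
        errors.append((err.filename, err.strerror))

    # Without onerror, these failures would be skipped silently.
    visited = [dirpath for dirpath, _dirnames, _filenames
               in os.walk(top, onerror=remember)]
    return visited, errors
```

Calling it on a path that does not exist returns an empty list of visited directories and one recorded error, instead of the silent empty walk that the default gives.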

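Return value of os.walk():

walk is a Python generator function and is normally driven by a for ... in ... loop. On each iteration it yields a 3-tuple: (dirpath, dirnames, filenames)

dirpath: a string, the path of the directory that this tuple describes. Both dirnames and filenames belong to this dirpath.

dirnames: a list. Each item is the name of a subdirectory directly under dirpath (symbolic links to directories are included).

filenames: a list. Each item is the name of a non-directory file directly under dirpath.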
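Because the two lists hold bare names, a full path has to be built with os.path.join(dirpath, name), as the docstring points out. A small sketch of that pattern follows; the function name iter_file_paths is an assumption made for this example.

```python
import os


def iter_file_paths(top):
    """Yield the full path of every non-directory entry under `top`.

    Hypothetical helper for illustration only.
    """
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            # Names carry no path components, so join them with dirpath.
            yield os.path.join(dirpath, name)
```

Every yielded path begins with whatever was passed as top, so an absolute top gives absolute paths and a relative top gives relative ones.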

Code examples

We use the following directory tree to test os.walk().

$ pwd
/home/xinlin/test
$ tree
.
├── a.txt
├── b.txt
├── c.txt -> a.txt
├── f1
├── f2
├── f3
├── f4
├── test2
│   ├── dir_in_test2
│   └── in_test2.txt
└── test3
    └── in_test3.txt

3 directories, 9 files

The tree above was printed with the tree command.

Walk every directory and file in the tree:

>>> for dirpath,dirnames,filenames in os.walk('/home/xinlin/test'):
...     print(dirpath)
...     print('filenames',filenames)
...     print('dirnames',dirnames)
... 
/home/xinlin/test
filenames ['c.txt', 'f4', 'f1', 'f2', 'f3', 'a.txt', 'b.txt']
dirnames ['test2', 'test3']
/home/xinlin/test/test2
filenames ['in_test2.txt']
dirnames ['dir_in_test2']
/home/xinlin/test/test2/dir_in_test2
filenames []
dirnames []
/home/xinlin/test/test3
filenames ['in_test3.txt']
dirnames []

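Looking at the output of this for ... in ... loop: the code starts at the top-level path and, for each directory, prints dirpath, then all the files in dirpath (the symbolic link c.txt is among them), and finally the subdirectories in dirnames. A directory with no subdirectories has an empty dirnames list, and one with no files has an empty filenames list.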
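The docstring also notes that, when topdown is true, the caller may edit dirnames in place to prune the walk or to fix the visiting order. The sketch below shows one way to do that; walk_skipping and its skip_names parameter are names invented for this example.

```python
import os


def walk_skipping(top, skip_names):
    """Yield each dirpath under `top`, never descending into skipped names.

    Hypothetical helper for illustration only.
    """
    for dirpath, dirnames, _filenames in os.walk(top, topdown=True):
        # Slice assignment edits the very list that os.walk recurses over.
        # Rebinding the name (dirnames = [...]) would have no effect.
        # sorted() additionally makes the visiting order alphabetical.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_names)
        yield dirpath
```

On the test tree above, walk_skipping('/home/xinlin/test', {'test2'}) would visit only the top directory and test3. The same edit has no effect with topdown=False, because the subdirectories have already been visited by then.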

Walk the tree bottom up, with followlinks=True:

>>> for dirpath,dirnames,filenames in os.walk('/home/xinlin/test',
...                                           topdown=False,followlinks=True):
...     print(dirpath)
...     print('filenames',filenames)
...     print('dirnames',dirnames)
... 
/home/xinlin/test/test2/dir_in_test2
filenames []
dirnames []
/home/xinlin/test/test2
filenames ['in_test2.txt']
dirnames ['dir_in_test2']
/home/xinlin/test/test3
filenames ['in_test3.txt']
dirnames []
/home/xinlin/test
filenames ['c.txt', 'f4', 'f1', 'f2', 'f3', 'a.txt', 'b.txt']
dirnames ['test2', 'test3']

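Note c.txt, the symbolic link, in the second-to-last line. It appears exactly as it did in the previous run, because links to files are listed in filenames whatever the value of followlinks. The third-to-last line shows that the top-level path is now visited last.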
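Bottom-up order is useful when a directory's result depends on the results of its subdirectories. As a sketch of that idea, the function below totals the size of each subtree; directory_sizes is a name chosen for this example, and counting a symbolic link by its own size (via os.lstat) is a design choice of the sketch, not something os.walk prescribes.

```python
import os


def directory_sizes(top):
    """Map each directory under `top` to the total size of its tree in bytes.

    Hypothetical helper for illustration only.
    """
    totals = {}
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        size = 0
        for name in filenames:
            # lstat() measures a symlink itself rather than its target.
            size += os.lstat(os.path.join(dirpath, name)).st_size
        for name in dirnames:
            # Subdirectories were yielded earlier, so their totals exist.
            # Linked directories are not entered by default and count as 0.
            size += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = size
    return totals
```

With topdown=True the same loop would reach a parent before its children, and the lookups in totals would find nothing.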

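Link loops

os.walk() accepts followlinks=True. If a link points to the directory that contains it, or to any ancestor of that directory, following it creates a loop. Let's see how walk behaves when it meets one.

The directory tree used for the loop test: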

$ pwd
/home/xinlin/test_walk
$ tree
.
├── ad -> /home/xinlin/test_walk
└── file.txt

1 directory, 1 file

The directory /home/xinlin/test_walk contains one regular file and one symbolic link, and the link points to the directory it lives in.

How os.walk() behaves on this loop:

>>> for dirpath,dirnames,filenames in os.walk('/home/xinlin/test_walk',
...                                           followlinks=True):
...     print(dirpath)
...     print('filenames',filenames)
...     print('dirnames',dirnames)
... 
/home/xinlin/test_walk
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt']
dirnames ['ad']
/home/xinlin/test_walk/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad/ad
filenames ['file.txt', 'ad']
dirnames []

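Well, that doesn't look too bad. At least it didn't loop forever!

The walk most likely stopped because of the limit on how many symbolic links may be followed while resolving a single path, a limit imposed by Linux rather than by Python. Once that limit is reached, ad apparently can no longer be identified as a directory, which is why the last tuple lists it in filenames and leaves dirnames empty. Note also that with followlinks=False, ad would still be listed in dirnames and c.txt in filenames (the first example above shows c.txt); walk simply would not descend into ad.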
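Relying on the operating system to break the loop is fragile, so a walk that follows links can instead remember which real directories it has already entered. The following is only a sketch under assumptions: the function name walk_following_links_once is invented here, and identifying a directory by its (st_dev, st_ino) pair assumes a POSIX-style file system where that pair is unique.

```python
import os


def walk_following_links_once(top):
    """Like os.walk(top, followlinks=True), but enter each real directory once.

    Hypothetical helper for illustration only.
    """
    def identity(path):
        st = os.stat(path)  # follows symlinks, so a link matches its target
        return (st.st_dev, st.st_ino)

    seen = {identity(top)}
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        keep = []
        for name in dirnames:
            try:
                key = identity(os.path.join(dirpath, name))
            except OSError:
                continue  # broken link or vanished directory: do not descend
            if key not in seen:
                seen.add(key)
                keep.append(name)
        dirnames[:] = keep  # prune in place so os.walk skips the rest
        yield dirpath, dirnames, filenames
```

On the test_walk tree this yields a single tuple for the top directory. One side effect of the sketch is that an already-seen link such as ad is also removed from the dirnames the caller receives.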

walktree.py

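Suppose you want to visit every link but never follow one. os.walk() with the default followlinks=False already reports links without descending into them: links to directories appear in dirnames and links to files in filenames. This site also provides a piece of code that does the traversal by hand with recursion; see: Recursively traversing a directory tree. Below is a test run that simply prints every file: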

$ python3 walktree.py /home/xinlin/test_walk
get to /home/xinlin/test_walk/file.txt
get to /home/xinlin/test_walk/ad

And here is the run on /home/xinlin/test:

$ python3 walktree.py /home/xinlin/test
get to /home/xinlin/test/c.txt
get to /home/xinlin/test/test2/in_test2.txt
get to /home/xinlin/test/f4
get to /home/xinlin/test/f1
get to /home/xinlin/test/f2
get to /home/xinlin/test/f3
get to /home/xinlin/test/a.txt
get to /home/xinlin/test/test3/in_test3.txt
get to /home/xinlin/test/b.txt

The code provided on this site does not process directories; adapt it to your own needs.
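For comparison, links can also be collected with os.walk() itself in its default mode, which lists them without descending into them. This is a minimal sketch and not the walktree.py code; the function name find_symlinks is an assumption of this example.

```python
import os


def find_symlinks(top):
    """Yield (link_path, target) for every symlink under `top`, never following.

    Hypothetical helper for illustration only.
    """
    for dirpath, dirnames, filenames in os.walk(top, followlinks=False):
        # Links to directories show up in dirnames, links to files (and
        # broken links) in filenames, so both lists must be checked.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                yield path, os.readlink(path)
```

On the two test trees in this article it would report ad in test_walk and c.txt in test, each paired with the target stored in the link.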

That concludes this introduction to os.walk().

-- EOF --

Permalink: https://www.pynote.net/archives/419



©Copyright 麦新杰 Since 2019 Python笔记
