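In Python, the most obvious text-processing tool is the built-in string class (str), but the standard library also provides many other tools that make advanced text processing easy.
I have recently been reading the book 《Python标准库》 (The Python Standard Library), so below I introduce several commonly used tools: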
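string: text constants
Two functions in the string module have still not been removed: capwords() and maketrans() (the examples here use Python 2 syntax). capwords() capitalizes the first letter of every word in a string. For example: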
import string

s = "hello! this is test...."

print s
# capwords() capitalizes the first letter of each word
print string.capwords(s)
The output is:
hello! this is test....
Hello! This Is Test....
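As a rough sketch of what capwords() does internally, assuming Python 3 print() syntax rather than the Python 2 used in this article: it splits the string on whitespace, capitalizes each word, and joins the words with single spaces. The name `manual` is only illustrative.

```python
import string

s = "hello! this is test...."

# Split on whitespace, capitalize each word, rejoin with single spaces.
manual = " ".join(word.capitalize() for word in s.split())

print(string.capwords(s))
print(manual == string.capwords(s))
```

One consequence of this approach is that runs of whitespace in the input are collapsed to single spaces in the result.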
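maketrans() creates a translation table that can be used with the translate() method to change one set of characters into another. This is more efficient than calling replace() repeatedly. For example: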
import string

# Map each of the letters a-i to the digit in the same position
pass_code = string.maketrans("abcdefghi", "123456789")
s = "the quick brown fox jumped over the lazy dog."

print s
print s.translate(pass_code)
The output is:
the quick brown fox jumped over the lazy dog.
t85 qu93k 2rown 6ox jump54 ov5r t85 l1zy 4o7.
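For readers on Python 3, where string.maketrans() no longer exists, the same translation can be sketched with str.maketrans(). The loop below is only an illustration of the "repeated replace()" alternative mentioned above; the names `translated` and `replaced` are not from the original.

```python
# Python 3: str.maketrans() takes the place of Python 2's string.maketrans().
pass_code = str.maketrans("abcdefghi", "123456789")
s = "the quick brown fox jumped over the lazy dog."

# One pass over the string, mapping character to character.
translated = s.translate(pass_code)
print(translated)

# The replace() alternative needs one call (and one pass) per character.
replaced = s
for old, new in zip("abcdefghi", "123456789"):
    replaced = replaced.replace(old, new)
print(replaced == translated)
```

The two results agree here only because none of the replacement digits is itself a key in the mapping; with overlapping characters, chained replace() calls could rewrite text that an earlier call had already changed.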
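In this example, some letters are replaced with their corresponding "Martian script" digits (the letter-to-digit mapping is prepared in advance: "abcdefghi", "123456789").
Note: translate() is a one-to-one character mapping. Every occurrence of a character is replaced with its corresponding character.
replace() performs substring replacement: a substring is replaced as a whole only when it appears in full. The two string arguments of replace() may differ in length.
textwrap: formatting text paragraphs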
The sample text used in the following examples:
sample_text = """
    the textwrap module can be used to format text
    for output in situations where pretty-printing is desired,
    it offers programmatic functionality similar to the paragraph
    wrapping or filling features found in many text editors
    """
The example code is:
import textwrap

sample_text = """
    the textwrap module can be used to format text
    for output in situations where pretty-printing is desired,
    it offers programmatic functionality similar to the paragraph
    wrapping or filling features found in many text editors
    """

print "Test:\n"
# Wrap the whole text into lines of at most 50 characters
print textwrap.fill(sample_text, width=50)
The output is:
Test:

     the textwrap module can be used to format
text     for output in situations where pretty-
printing is desired,     it offers programmatic
functionality similar to the paragraph
wrapping or filling features found in many text
editors
The fill() function takes text as input and produces formatted text as output.
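Although the result is left-justified, only the first line keeps its indentation. The leading spaces of the remaining lines end up embedded inside the paragraph.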
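To see where the embedded spaces come from, here is a small sketch (Python 3 syntax; the two-line `indented` string is a made-up sample, not this article's text). By default fill() turns every whitespace character, newlines included, into a space before wrapping, so the indentation of later lines becomes a run of spaces inside the paragraph.

```python
import textwrap

# Made-up sample: both lines are indented by four spaces.
indented = """
    first line of the paragraph
    second line of the paragraph
    """

filled = textwrap.fill(indented, width=60)

# repr() makes the leading and embedded runs of spaces visible.
for line in filled.splitlines():
    print(repr(line))
```

The first printed line starts with the leading spaces and also contains a run of spaces between "paragraph" and "second", which is the old line break plus the second line's indentation.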
A small change to the code fixes this (the dedent() function removes the indentation):
print textwrap.dedent(sample_text)
The output is:
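Test:


the textwrap module can be used to format text
for output in situations where pretty-printing is desired,
it offers programmatic functionality similar to the paragraph
wrapping or filling features found in many text editors
Besides the approaches above, dedent() and fill() can be combined: the dedented text is passed to fill() with a set of different width values, so the text is displayed at each specified width with its indentation removed. Hanging indents are also possible. The example code for each case follows: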
# Combine dedent() and fill(), wrapping at several widths
import textwrap

sample_text = """
    the textwrap module can be used to format text
    for output in situations where pretty-printing is desired,
    it offers programmatic functionality similar to the paragraph
    wrapping or filling features found in many text editors
    """

dedented_text = textwrap.dedent(sample_text).strip()
for width in [45, 70]:
    print "%d columns:\n" % width
    print textwrap.fill(dedented_text, width=width)
    print
# Hanging indent: lines after the first are indented further than the first
import textwrap

sample_text = """
    the textwrap module can be used to format text
    for output in situations where pretty-printing is desired,
    it offers programmatic functionality similar to the paragraph
    wrapping or filling features found in many text editors
    """

dedented_text = textwrap.dedent(sample_text).strip()
print textwrap.fill(dedented_text,
                    initial_indent=" ",
                    subsequent_indent=" " * 4,
                    width=50)
Note: strip() removes whitespace characters (spaces, newlines, and so on) from the beginning and end of a string.
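Putting the pieces together, the following sketch (Python 3 syntax, with a shortened sample and the illustrative names `sample`, `dedented` and `results`) dedents and strips the text, then fills it at two widths with a hanging indent. It is one possible arrangement under those assumptions, not the book's exact listing.

```python
import textwrap

# Shortened, made-up variant of the sample text, indented by four spaces.
sample = """
    the textwrap module can be used to format text
    for output in situations where pretty-printing is desired
    """

# Remove the common indentation, then the leading/trailing blank lines.
dedented = textwrap.dedent(sample).strip()

results = {}
for width in (45, 70):
    # First line flush left, every following line indented by four spaces.
    results[width] = textwrap.fill(
        dedented,
        initial_indent="",
        subsequent_indent=" " * 4,
        width=width,
    )
    print("%d columns:" % width)
    print(results[width])
    print()
```

The width passed to fill() includes the indent, so the indented continuation lines hold up to four fewer characters of text than the first line.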