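Regular Expressions
A regular expressions library for Swift
Introduction
This library uses NSRegularExpression to perform the actual regular expression pattern matching. However, it provides a much cleaner interface and is designed to take full advantage of Swift. The supported regular expression syntax is the one documented for NSRegularExpression. You should also look at NSRegularExpression.Options and NSRegularExpression.MatchingOptions. I recommend using https://regex101.com/ to test your regular expression patterns.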
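For a sense of what the library wraps, here is a minimal sketch of a single match written directly against Foundation's NSRegularExpression. It is not taken from the library, and the variable names are illustrative. Note the manual conversions between NSRange and Range<String.Index> that the raw API requires.

```swift
import Foundation

let text = "name: Chris Lattner"
let nsRegex = try NSRegularExpression(
    pattern: "name: ([a-z]+) ([a-z]+)",
    options: [.caseInsensitive]
)

// NSRegularExpression works with NSRange, so the Swift range must be converted.
let wholeText = NSRange(text.startIndex..<text.endIndex, in: text)

var firstName: String? = nil
if let result = nsRegex.firstMatch(in: text, options: [], range: wholeText),
   // Range 0 is the full match; range 1 is the first capture group.
   let groupRange = Range(result.range(at: 1), in: text) {
    firstName = String(text[groupRange])
}
print(firstName ?? "no match") // prints "Chris"
```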
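Installation
- In Xcode, open the project that you want to add this package to.
- From the menu bar, select File > Swift Packages > Add Package Dependency...
- Paste the URL of this package's repository into the search field.
- Follow the prompts to add the package.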
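Using a regular expression object
String.regexMatch, String.regexFindAll, String.regexSub, and String.regexSplit all accept an object that conforms to RegexProtocol. This object holds information about a regular expression, including:
var pattern: String { get } - The regular expression pattern.
var regexOptions: NSRegularExpression.Options { get } - The regular expression options (see NSRegularExpression.Options).
var matchingOptions: NSRegularExpression.MatchingOptions { get } - See NSRegularExpression.MatchingOptions.
var groupNames: [String]? { get } - The names of the capture groups.
RegexProtocol also defines several convenience methods:
func asNSRegex() throws -> NSRegularExpression
Converts self to an NSRegularExpression.
func numberOfCaptureGroups() throws -> Int
Returns the number of capture groups in the regular expression.
func patternIsValid() -> Bool
Returns true if the regular expression pattern is valid, otherwise false.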
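The convenience methods above can all be derived from the pattern and its options. The following is a rough sketch of one way to do that, not the library's actual source: PatternHolder and SimplePattern are hypothetical stand-ins invented for this example, and only the Foundation calls are real API.

```swift
import Foundation

// Hypothetical stand-in for RegexProtocol (an assumption, not the library's code).
protocol PatternHolder {
    var pattern: String { get }
    var regexOptions: NSRegularExpression.Options { get }
}

extension PatternHolder {
    // Compile the pattern; this throws if the pattern is invalid.
    func asNSRegex() throws -> NSRegularExpression {
        return try NSRegularExpression(pattern: pattern, options: regexOptions)
    }
    // Foundation already exposes the number of capture groups.
    func numberOfCaptureGroups() throws -> Int {
        return try asNSRegex().numberOfCaptureGroups
    }
    // A pattern is valid exactly when it compiles.
    func patternIsValid() -> Bool {
        return (try? asNSRegex()) != nil
    }
}

// Hypothetical conforming type used only for this demonstration.
struct SimplePattern: PatternHolder {
    let pattern: String
    var regexOptions: NSRegularExpression.Options = []
}

let valid = SimplePattern(pattern: #"(\w+) (\w+)"#)
let invalid = SimplePattern(pattern: "(unclosed")
let groupCount = try valid.numberOfCaptureGroups()
print(groupCount)               // 2
print(valid.patternIsValid())   // true
print(invalid.patternIsValid()) // false
```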
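The Regex type, which is used throughout the examples below, can be created with the following initializers:
init(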
pattern: String,
regexOptions: NSRegularExpression.Options = [],
matchingOptions: NSRegularExpression.MatchingOptions = [],
groupNames: [String]? = nil
) throws {
init(
_ pattern: String,
_ regexOptions: NSRegularExpression.Options = []
) throws {
init(
nsRegularExpression: NSRegularExpression,
matchingOptions: NSRegularExpression.MatchingOptions = [],
groupNames: [String]? = nil
) {
String.regexMatch has the following two overloads. Both return nil when no match is found:
func regexMatch<RegularExpression: RegexProtocol>(
_ regex: RegularExpression,
range: Range<String.Index>? = nil
) throws -> RegexMatch? {
func regexMatch(
_ pattern: String,
regexOptions: NSRegularExpression.Options = [],
matchingOptions: NSRegularExpression.MatchingOptions = [],
groupNames: [String]? = nil,
range: Range<String.Index>? = nil
) throws -> RegexMatch? {
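range - The range of the string in which to search for the pattern.
These methods throw if the pattern is invalid or if the number of group names does not match the number of capture groups (see RegexError). They never throw an error when no match is found.
For information about the RegexMatch returned by these functions, see Extracting the match and capture groups.
Warning: The ranges of the match and capture groups may be invalidated if you mutate the source string. Use String.regexSub to perform multiple replacements.
Examples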
var inputText = "name: Chris Lattner"
// create the regular expression object
let regex = try Regex(
pattern: "name: ([a-z]+) ([a-z]+)",
regexOptions: [.caseInsensitive]
)
if let match = try inputText.regexMatch(regex) {
print("full match: '\(match.fullMatch)'")
print("first capture group: '\(match.groups[0]!.match)'")
print("second capture group: '\(match.groups[1]!.match)'")
// perform a replacement on the first capture group
inputText.replaceSubrange(
match.groups[0]!.range, with: "Steven"
)
print("after replacing text: '\(inputText)'")
}
// full match: 'name: Chris Lattner'
// first capture group: 'Chris'
// second capture group: 'Lattner'
// after replacing text: 'name: Steven Lattner'
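The warning about invalidated ranges matters as soon as more than one replacement is made by hand. As an illustration only, the sketch below uses Foundation's NSRegularExpression directly (not this library) and applies the replacements from the last match to the first, so that the ranges of the earlier matches are still valid when they are used. In practice String.regexSub is the simpler choice.

```swift
import Foundation

var text = "season 8, episode 5; season 5, episode 20"
let digits = try NSRegularExpression(pattern: #"\d+"#)
let wholeText = NSRange(text.startIndex..<text.endIndex, in: text)
let results = digits.matches(in: text, options: [], range: wholeText)

// Replacing from the end backwards leaves the text before each
// remaining match untouched, so its NSRange still points at the match.
for result in results.reversed() {
    if let range = Range(result.range, in: text) {
        text.replaceSubrange(range, with: "#")
    }
}
print(text) // season #, episode #; season #, episode #
```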
let inputText = """
Man selects only for his own good: \
Nature only for that of the being which she tends.
"""
let pattern = #"Man selects ONLY FOR HIS OWN (\w+)"#
let searchRange =
(inputText.startIndex)
..<
(inputText.index(inputText.startIndex, offsetBy: 40))
let match = try inputText.regexMatch(
pattern,
regexOptions: [.caseInsensitive],
matchingOptions: [.anchored], // anchor matches to the beginning of the string
groupNames: ["word"], // the names of the capture groups
range: searchRange // the range of the string in which to search for the pattern
)
if let match = match {
print("full match:", match.fullMatch)
print("capture group:", match.group(named: "word")!.match)
}
// full match: Man selects only for his own good
// capture group: good
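To make the effect of a search range concrete, here is a small sketch using Foundation's NSRegularExpression directly rather than this library. The input string and the offset of 8 are arbitrary choices for the demonstration. Because the search starts after the first number, the first number is never considered.

```swift
import Foundation

let text = "id: 12, id: 34"
let digits = try NSRegularExpression(pattern: #"\d+"#)

// Search only from the ninth character ("id: 34") to the end of the string.
let start = text.index(text.startIndex, offsetBy: 8)
let searchRange = NSRange(start..<text.endIndex, in: text)

var found: String? = nil
if let result = digits.firstMatch(in: text, options: [], range: searchRange),
   let range = Range(result.range, in: text) {
    found = String(text[range])
}
print(found ?? "no match") // 34
```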
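Finding all matches for a regular expression
String.regexFindAll returns all matches for a regular expression in a string, or an empty array if no matches are found. It has exactly the same overloads as String.regexMatch: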
func regexFindAll<RegularExpression: RegexProtocol>(
_ regex: RegularExpression,
range: Range<String.Index>? = nil
) throws -> [RegexMatch] {
func regexFindAll(
_ pattern: String,
regexOptions: NSRegularExpression.Options = [],
matchingOptions: NSRegularExpression.MatchingOptions = [],
groupNames: [String]? = nil,
range: Range<String.Index>? = nil
) throws -> [RegexMatch] {
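Warning: The ranges of the match and capture groups may be invalidated if you mutate the source string. Use String.regexSub to perform multiple replacements.
As with String.regexMatch, range is the range of the string in which to search for the pattern.
These methods throw if the pattern is invalid or if the number of group names does not match the number of capture groups (see RegexError). They never throw an error when no match is found.
For information about the RegexMatch returned by these functions, see Extracting the match and capture groups.
Example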
var inputText = "season 8, EPISODE 5; season 5, episode 20"
// create the regular expression object
let regex = try Regex(
pattern: #"season (\d+), Episode (\d+)"#,
regexOptions: [.caseInsensitive],
groupNames: ["season number", "episode number"]
// the names of the capture groups
)
let results = try inputText.regexFindAll(regex)
for result in results {
print("fullMatch: '\(result.fullMatch)'")
print("capture groups:")
for captureGroup in result.groups {
print(" '\(captureGroup!.name!)': '\(captureGroup!.match)'")
}
print()
}
let firstResult = results[0]
// perform a replacement on the first full match
inputText.replaceSubrange(
firstResult.range, with: "new value"
)
print("after replacing text: '\(inputText)'")
// fullMatch: 'season 8, EPISODE 5'
// capture groups:
// 'season number': '8'
// 'episode number': '5'
//
// fullMatch: 'season 5, episode 20'
// capture groups:
// 'season number': '5'
// 'episode number': '20'
//
// after replacing text: 'new value; season 5, episode 20'
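The rule that the number of group names must equal the number of capture groups can be sketched as follows. This is an assumption about how such a check might look, written against Foundation only. The function namedGroups and the error type GroupCountMismatch are hypothetical and are not part of the library, which reports this condition through RegexError.

```swift
import Foundation

// Hypothetical error type for this sketch.
struct GroupCountMismatch: Error {
    let expected: Int
    let actual: Int
}

// Returns one dictionary per match, mapping each group name to its text.
func namedGroups(
    in text: String, pattern: String, names: [String]
) throws -> [[String: String]] {
    let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    guard names.count == regex.numberOfCaptureGroups else {
        throw GroupCountMismatch(
            expected: regex.numberOfCaptureGroups, actual: names.count
        )
    }
    let wholeText = NSRange(text.startIndex..<text.endIndex, in: text)
    let results = regex.matches(in: text, options: [], range: wholeText)
    return results.map { result -> [String: String] in
        var groups: [String: String] = [:]
        for (offset, name) in names.enumerated() {
            // Range 0 is the full match, so capture groups start at 1.
            // The conversion fails for groups that did not participate.
            if let range = Range(result.range(at: offset + 1), in: text) {
                groups[name] = String(text[range])
            }
        }
        return groups
    }
}

let found = try namedGroups(
    in: "season 8, EPISODE 5; season 5, episode 20",
    pattern: #"season (\d+), episode (\d+)"#,
    names: ["season", "episode"]
)
print(found.count)                           // 2
print(found[1]["episode"] ?? "missing")      // 20
```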
Splitting a string by occurrences of a pattern
String.regexSplit splits a string at each occurrence of a pattern.
func regexSplit(
_ pattern: String,
regexOptions: NSRegularExpression.Options = [],
matchingOptions: NSRegularExpression.MatchingOptions = [],
ignoreIfEmpty: Bool = false,
maxLength: Int? = nil,
range: Range<String.Index>? = nil
) throws -> [String] {
func regexSplit<RegularExpression: RegexProtocol>(
_ regex: RegularExpression,
ignoreIfEmpty: Bool = false,
maxLength: Int? = nil,
range: Range<String.Index>? = nil
) throws -> [String] {
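ignoreIfEmpty - If true, all empty strings are removed from the array. If false (default), they are included.
maxLength - The maximum length of the returned array. If nil (default), the string is split at every occurrence of the pattern.
Returns - An array of the strings produced by splitting at each occurrence of the pattern. If the pattern does not occur, a single-element array containing the entire string is returned.
Examples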
let colors = "red,orange,yellow,blue"
let array = try colors.regexSplit(",")
// array = ["red", "orange", "yellow", "blue"]
let colors = "red and orange ANDyellow and blue"
// create the regular expression object
let regex = try Regex(#"\s*and\s*"#, [.caseInsensitive])
let array = try colors.regexSplit(regex, maxLength: 3)
// array = ["red", "orange", "yellow"]
// note that "blue" is not returned because the length of the
// array was limited to 3 items.
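For readers curious how such a split can be built on top of NSRegularExpression, here is a rough sketch. It is not the library's implementation: splitSketch is a hypothetical function, the pattern is always matched case-insensitively to keep the example short, and the order in which empty strings are dropped and the array is truncated is an assumption.

```swift
import Foundation

func splitSketch(
    _ text: String,
    pattern: String,
    ignoreIfEmpty: Bool = false,
    maxLength: Int? = nil
) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    let wholeText = NSRange(text.startIndex..<text.endIndex, in: text)
    var pieces: [String] = []
    var cursor = text.startIndex
    for result in regex.matches(in: text, options: [], range: wholeText) {
        guard let range = Range(result.range, in: text) else { continue }
        // Keep the text between the previous match and this one.
        pieces.append(String(text[cursor..<range.lowerBound]))
        cursor = range.upperBound
    }
    // Keep whatever follows the last match (the whole string if none matched).
    pieces.append(String(text[cursor...]))
    if ignoreIfEmpty { pieces.removeAll { $0.isEmpty } }
    if let maxLength = maxLength { pieces = Array(pieces.prefix(maxLength)) }
    return pieces
}

let pieces = try splitSketch(
    "red and orange ANDyellow and blue",
    pattern: #"\s*and\s*"#,
    maxLength: 3
)
print(pieces) // ["red", "orange", "yellow"]
```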
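Performing regular expression replacements
String.regexSub and String.regexSubInPlace perform regular expression replacements. They have exactly the same parameters and overloads.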
func regexSub(
_ pattern: String,
with template: String = "",
regexOptions: NSRegularExpression.Options = [],
matchingOptions: NSRegularExpression.MatchingOptions = [],
range: Range<String.Index>? = nil
) throws -> String {
func regexSub<RegularExpression: RegexProtocol>(
_ regex: RegularExpression,
with template: String = "",
range: Range<String.Index>? = nil
) throws -> String {
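with - The template string used to replace matches of the pattern. See Template Matching Format for how to format the template. Defaults to an empty string.
Returns - The new string after the substitutions have been made. If no matches are found, the string is returned unchanged.
Examples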
let name = "Peter Schorn"
// The .anchored matching option only looks for matches
// at the beginning of the string.
// Consequently, only the first word will be matched.
let regexObject = try Regex(
pattern: #"\w+"#,
regexOptions: [.caseInsensitive],
matchingOptions: [.anchored]
)
let replacedText = try name.regexSub(regexObject, with: "word")
// replacedText = "word Schorn"
let name = "Charles Darwin"
let reversedName = try name.regexSub(
#"(\w+) (\w+)"#,
with: "$2 $1"
// $1 and $2 represent the
// first and second capture group, respectively.
// $0 represents the entire match.
)
// reversedName = "Darwin Charles"
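The $1 and $2 template syntax comes from NSRegularExpression itself. The sketch below uses Foundation directly, not this library, to show the same swap and also how NSRegularExpression.escapedTemplate(for:) can be used when a replacement containing a dollar sign should be inserted literally. Whether the library escapes templates for you is not stated here, so treat that part as background on the underlying API.

```swift
import Foundation

let name = "Charles Darwin"
let regex = try NSRegularExpression(pattern: #"(\w+) (\w+)"#)
let wholeName = NSRange(name.startIndex..<name.endIndex, in: name)

// $1 and $2 refer to the first and second capture groups.
let swapped = regex.stringByReplacingMatches(
    in: name, options: [], range: wholeName, withTemplate: "$2, $1"
)
print(swapped) // Darwin, Charles

// Escaping the template turns "$2 $1" into plain text.
let literal = regex.stringByReplacingMatches(
    in: name, options: [], range: wholeName,
    withTemplate: NSRegularExpression.escapedTemplate(for: "$2 $1")
)
print(literal) // $2 $1
```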
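Performing replacements with a custom closure
If you need to further customize how regular expression replacements are performed, you can use the following methods: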
func regexSub<RegularExpression: RegexProtocol>(
_ regex: RegularExpression,
range: Range<String.Index>? = nil,
replacer: (_ matchIndex: Int, _ match: RegexMatch) -> String?
) throws -> String {
func regexSub(
_ pattern: String,
regexOptions: NSRegularExpression.Options = [],
matchingOptions: NSRegularExpression.MatchingOptions = [],
groupNames: [String]? = nil,
range: Range<String.Index>? = nil,
replacer: (_ matchIndex: Int, _ match: RegexMatch) -> String?
) throws -> String {
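replacer - A closure that accepts the index of a regular expression match and the match itself, and returns a new string to replace it with. Return nil from within the closure to indicate that the match should not be changed.
Example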
let inputString = """
Darwin's theory of evolution is the \
unifying theory of the life sciences.
"""
let pattern = #"\w+"# // match each word in the input string
let replacedString = try inputString.regexSub(pattern) { indx, match in
if indx > 5 { return nil } // only replace the first 6 matches (indices 0 through 5)
return match.fullMatch.uppercased() // uppercase the full match
}
// replacedString = """
// DARWIN'S THEORY OF EVOLUTION IS the \
// unifying theory of the life sciences.
// """
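A closure-based replacement like the one above can be built by walking the matches in order and stitching the output together. The following sketch illustrates that idea with Foundation only. replaceEachMatch is a hypothetical function, and passing the match as a Substring instead of a RegexMatch is a simplification made for this example.

```swift
import Foundation

func replaceEachMatch(
    in text: String,
    pattern: String,
    replacer: (_ matchIndex: Int, _ match: Substring) -> String?
) throws -> String {
    let regex = try NSRegularExpression(pattern: pattern)
    let wholeText = NSRange(text.startIndex..<text.endIndex, in: text)
    let results = regex.matches(in: text, options: [], range: wholeText)
    var output = ""
    var cursor = text.startIndex
    for (matchIndex, result) in results.enumerated() {
        guard let range = Range(result.range, in: text) else { continue }
        // Copy the unmatched text that precedes this match.
        output.append(contentsOf: text[cursor..<range.lowerBound])
        let original = text[range]
        // A nil result from the closure keeps the original match.
        output.append(contentsOf: replacer(matchIndex, original) ?? String(original))
        cursor = range.upperBound
    }
    // Copy whatever follows the last match.
    output.append(contentsOf: text[cursor...])
    return output
}

let shouted = try replaceEachMatch(in: "one two three", pattern: #"\w+"#) { index, word in
    index < 2 ? word.uppercased() : nil
}
print(shouted) // ONE TWO three
```

Building a new string from the original ranges avoids the invalidated-range problem entirely, because the source string is never mutated.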
If you need to perform a replacement on each individual capture group, use the replaceGroups method of the RegexMatch struct.
func replaceGroups(
_ replacer: (
_ groupIndex: Int, _ group: RegexGroup
) -> String?
) -> String {
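replacer - A closure that accepts the index of a capture group and the capture group itself, and returns a new string to replace it with. Return nil from within the closure to indicate that the capture group should not be changed.
Example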
let inputText = "name: Peter, id: 35, job: programmer"
let pattern = #"name: (\w+), id: (\d+)"#
let groupNames = ["name", "id"]
let match = try inputText.regexMatch(
pattern, groupNames: groupNames
)!
let replacedMatch = match.replaceGroups { indx, group in
if group.name == "name" { return "Steven" }
if group.name == "id" { return "55" }
return nil
}
// match.fullMatch = "name: Peter, id: 35"
// replacedMatch = "name: Steven, id: 55"
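The same stitching idea works one level down, on the capture groups of a single match. This sketch uses Foundation only and is not how the library is necessarily implemented. replaceGroupsSketch is a hypothetical function, and it assumes the capture groups are not nested; a group that starts before the end of the previous one is skipped.

```swift
import Foundation

func replaceGroupsSketch(
    in text: String,
    pattern: String,
    replacer: (_ groupIndex: Int, _ group: Substring) -> String?
) throws -> String? {
    let regex = try NSRegularExpression(pattern: pattern)
    let wholeText = NSRange(text.startIndex..<text.endIndex, in: text)
    guard let result = regex.firstMatch(in: text, options: [], range: wholeText),
          let fullRange = Range(result.range, in: text) else { return nil }
    var output = ""
    var cursor = fullRange.lowerBound
    // Range 0 is the full match, so capture groups start at 1.
    for number in 1..<result.numberOfRanges {
        guard let range = Range(result.range(at: number), in: text),
              range.lowerBound >= cursor else { continue }
        output.append(contentsOf: text[cursor..<range.lowerBound])
        let original = text[range]
        // A nil result from the closure keeps the original group text.
        output.append(contentsOf: replacer(number - 1, original) ?? String(original))
        cursor = range.upperBound
    }
    // Copy the rest of the full match after the last capture group.
    output.append(contentsOf: text[cursor..<fullRange.upperBound])
    return output
}

let replaced = try replaceGroupsSketch(
    in: "name: Peter, id: 35, job: programmer",
    pattern: #"name: (\w+), id: (\d+)"#
) { index, _ in
    index == 0 ? "Steven" : "55"
}
print(replaced ?? "no match") // name: Steven, id: 55
```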
You can combine the methods above as follows:
let inputString = """
name: sally, id: 26
name: alexander, id: 54
"""
let regexObject = try Regex(
pattern: #"name: (\w+), id: (\d+)"#,
groupNames: ["name", "id"]
)
let replacedText = try inputString.regexSub(regexObject) { indx, match in
if indx == 0 { return nil }
return match.replaceGroups { indx, group in
if group.name == "name" {
return group.match.uppercased()
}
if group.name == "id" {
return "redacted"
}
return nil
}
}
// replacedText = """
// name: sally, id: 26
// name: ALEXANDER, id: redacted
// """
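Checking for a regular expression match in a switch statement
The pattern matching operator ~= has been overloaded to support checking for matches to a regular expression in a switch statement. For example: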
let inputString = #"user_id: "asjhjcb""#
switch inputString {
case try Regex(#"USER_ID: "[a-z]+""#, [.caseInsensitive]):
print("valid user id")
case try? Regex(#"[!@#$%^&]+"#):
print("invalid character in user id")
case try! Regex(#"\d+"#):
print("user id cannot contain numbers")
default:
print("no match")
}
// prints "valid user id"
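try, try?, and try! can all be used to choose how best to handle an error arising from an invalid regular expression pattern. Unfortunately, there is no way to bind the match of the regular expression pattern to a variable.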
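Overloading ~= is a general Swift technique for making a custom type usable in a case pattern. The sketch below shows the idea with a hypothetical PatternCase wrapper around NSRegularExpression. It is an illustration only and does not reproduce the library's overloads. Here a case matches when the pattern is found anywhere in the string, and the patterns are compiled before the switch so that the function itself does not need to throw.

```swift
import Foundation

// Hypothetical wrapper type for this sketch.
struct PatternCase {
    let regex: NSRegularExpression
    init(_ pattern: String, _ options: NSRegularExpression.Options = []) throws {
        regex = try NSRegularExpression(pattern: pattern, options: options)
    }
}

// Swift calls this operator for `case somePatternCase:` in a switch over a String.
func ~= (pattern: PatternCase, value: String) -> Bool {
    let wholeValue = NSRange(value.startIndex..<value.endIndex, in: value)
    return pattern.regex.firstMatch(in: value, options: [], range: wholeValue) != nil
}

let validID = try PatternCase(#"USER_ID: "[a-z]+""#, [.caseInsensitive])
let containsDigits = try PatternCase(#"\d+"#)

func classify(_ input: String) -> String {
    switch input {
    case validID:
        return "valid user id"
    case containsDigits:
        return "user id cannot contain numbers"
    default:
        return "no match"
    }
}

print(classify(#"user_id: "asjhjcb""#)) // valid user id
print(classify("12345"))                // user id cannot contain numbers
```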