
0301: Lexical Analysis

string → tokens → struct

A SQL statement must first be converted from a string into a data type of the programming language before it can be executed. For example:

select a,b from t where c=1;

is represented as:

StmtSelect{
    table: "t",
    cols:  []string{"a", "b"},
    keys:  []NamedCell{{column: "c", value: Cell{Type: TypeI64, I64: 1}}},
}

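SQL resembles English in that it has a vocabulary and a grammar. The "words" of a computer language are usually called tokens. Before parsing the grammar, the string can first be split into tokens; this step is called tokenizing (or lexing), and the code that does it is a tokenizer or lexer.

Tokens in SQL fall into the following categories:

Keywords: select, from, etc.
Names: table names, column names, etc.
Symbols: = ; , etc.
Values: numbers, strings, etc.

Each category follows different rules, and each kind of token is handled by its own function.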

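Parser

Most parsing consists of consuming tokens from left to right while building a data structure, so the parser needs to remember its current position in the string.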

type Parser struct {
    buf string
    pos int
}

func NewParser(s string) Parser {
    return Parser{buf: s, pos: 0}
}

Parsing names (table names, column names)

func (p *Parser) tryName() (string, bool)

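Requirements:

Skip leading whitespace.
The first character is a letter or _; subsequent characters are letters, digits, or _.
On success, return true and advance pos.
On failure, return false and leave pos unchanged.

For example, given Parser{buf: " hi ", pos: 0}, calling tryName() returns "hi" and leaves pos=3.

The following functions can be used to implement these rules: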

func isSpace(ch byte) bool {
    switch ch {
    case '\t', '\n', '\v', '\f', '\r', ' ':
        return true
    }
    return false
}
func isAlpha(ch byte) bool {
    // ch|32 sets bit 5, folding 'A'-'Z' onto 'a'-'z', so one range check covers both cases.
    return 'a' <= (ch|32) && (ch|32) <= 'z'
}
func isDigit(ch byte) bool {
    return '0' <= ch && ch <= '9'
}
func isNameStart(ch byte) bool {
    return isAlpha(ch) || ch == '_'
}
func isNameContinue(ch byte) bool {
    return isAlpha(ch) || isDigit(ch) || ch == '_'
}
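
The requirements above can be sketched as follows. This is one possible implementation, not necessarily the tutorial's own: skipSpaces is a hypothetical helper introduced here for illustration, and the sketch makes no attempt to distinguish names from keywords. It scans with local indexes and only writes p.pos on success, which is what keeps pos unchanged on failure.

```go
package main

import "fmt"

type Parser struct {
    buf string
    pos int
}

func isSpace(ch byte) bool {
    switch ch {
    case '\t', '\n', '\v', '\f', '\r', ' ':
        return true
    }
    return false
}
func isAlpha(ch byte) bool        { return 'a' <= (ch|32) && (ch|32) <= 'z' }
func isDigit(ch byte) bool        { return '0' <= ch && ch <= '9' }
func isNameStart(ch byte) bool    { return isAlpha(ch) || ch == '_' }
func isNameContinue(ch byte) bool { return isAlpha(ch) || isDigit(ch) || ch == '_' }

// skipSpaces (hypothetical helper) returns the index of the first
// non-space byte at or after p.pos. It does not modify p.pos.
func (p *Parser) skipSpaces() int {
    i := p.pos
    for i < len(p.buf) && isSpace(p.buf[i]) {
        i++
    }
    return i
}

// tryName consumes one name token. On failure it returns ("", false)
// and leaves p.pos untouched.
func (p *Parser) tryName() (string, bool) {
    start := p.skipSpaces()
    if start >= len(p.buf) || !isNameStart(p.buf[start]) {
        return "", false
    }
    end := start + 1
    for end < len(p.buf) && isNameContinue(p.buf[end]) {
        end++
    }
    p.pos = end // commit only after a successful match
    return p.buf[start:end], true
}

func main() {
    p := Parser{buf: " hi ", pos: 0}
    name, ok := p.tryName()
    fmt.Println(name, ok, p.pos) // hi true 3
}
```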

Parsing keywords

func (p *Parser) tryKeyword(kw string) bool

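Requirements:

Skip leading whitespace.
Match the given keyword case-insensitively. On success, advance pos and return true.
Otherwise, return false.
A keyword must be delimited by whitespace or a symbol; for example, the keyword select must not match the start of selected.

Use the following function to test for a separator: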

func isSeparator(ch byte) bool {
    return ch < 128 && !isNameContinue(ch)
}
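
A minimal sketch of tryKeyword under the requirements above. Two choices here are assumptions rather than something the text specifies: the case-insensitive comparison uses strings.EqualFold from the standard library, and reaching the end of the input right after the keyword is treated as a valid separation.

```go
package main

import (
    "fmt"
    "strings"
)

type Parser struct {
    buf string
    pos int
}

func isSpace(ch byte) bool {
    switch ch {
    case '\t', '\n', '\v', '\f', '\r', ' ':
        return true
    }
    return false
}
func isAlpha(ch byte) bool        { return 'a' <= (ch|32) && (ch|32) <= 'z' }
func isDigit(ch byte) bool        { return '0' <= ch && ch <= '9' }
func isNameContinue(ch byte) bool { return isAlpha(ch) || isDigit(ch) || ch == '_' }
func isSeparator(ch byte) bool    { return ch < 128 && !isNameContinue(ch) }

// tryKeyword consumes kw if it appears next, ignoring case.
// On failure p.pos is left unchanged.
func (p *Parser) tryKeyword(kw string) bool {
    start := p.pos
    for start < len(p.buf) && isSpace(p.buf[start]) {
        start++
    }
    end := start + len(kw)
    if end > len(p.buf) || !strings.EqualFold(p.buf[start:end], kw) {
        return false
    }
    // Assumption: end of input counts as a separator.
    if end < len(p.buf) && !isSeparator(p.buf[end]) {
        return false // e.g. "selected" must not match "select"
    }
    p.pos = end
    return true
}

func main() {
    p := Parser{buf: "  SELECT a from t", pos: 0}
    fmt.Println(p.tryKeyword("select"), p.pos) // true 8

    q := Parser{buf: "selected", pos: 0}
    fmt.Println(q.tryKeyword("select"), q.pos) // false 0
}
```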

