cfgrammar/
mod.rs

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
#![allow(clippy::cognitive_complexity)]
#![allow(clippy::many_single_char_names)]
#![allow(clippy::new_without_default)]
#![allow(clippy::unnecessary_wraps)]
#![allow(clippy::upper_case_acronyms)]
#![forbid(unsafe_code)]
#![deny(unreachable_pub)]

//! A library for manipulating Context Free Grammars (CFG). It is impractical to fully homogenise
//! all the types of grammars out there, so the aim is for different grammar types
//! to have completely separate implementations. Code that wants to be generic over more than one
//! grammar type can then use an "adapter" to homogenise the particular grammar types of interest.
//! Currently this is a little academic, since only Yacc-style grammars are supported (albeit
//! several variants of Yacc grammars).
//!
//! Unfortunately, CFG terminology is something of a mess. Some people use different terms for the
//! same concept interchangeably; some use different terms to convey subtle differences of meaning
//! (but without complete uniformity). "Token", "terminal", and "lexeme" are examples of this: they
//! are synonyms in some tools and papers, but not in others.
//!
//! In order to make this library somewhat coherent, we therefore use some basic terminology
//! guidelines for major concepts (acknowledging that this will cause clashes with some grammar
//! types).
//!
//!   * A *grammar* is an ordered sequence of *productions*.
//!   * A *production* is an ordered sequence of *symbols*.
//!   * A *rule* maps a name to one or more productions.
//!   * A *token* is the name of a syntactic element.
//!
//! For example, in the following Yacc grammar:
//!
//!   R1: "a" "b" | R2;
//!   R2: "c";
//!
//! the following statements are true:
//!
//!   * There are 3 productions. 1: ["a", "b"] 2: ["R2"] 3: ["c"]`
//!   * There are two rules: R1 and R2. The mapping to productions is {R1: {1, 2}, R2: {3}}
//!   * There are three tokens: a, b, and c.
//!
//! cfgrammar makes the following guarantees about grammars:
//!
//!   * Productions are numbered from `0` to `prods_len() - 1` (inclusive).
//!   * Rules are numbered from `0` to `rules_len() - 1` (inclusive).
//!   * Tokens are numbered from `0` to `toks_len() - 1` (inclusive).
//!   * The StorageT type used to store productions, rules, and token indices can be infallibly
//!     converted into usize (see [`TIdx`](struct.TIdx.html) and friends for more details).
//!
//! For most current uses, the main function to investigate is
//! [`YaccGrammar::new()`](yacc/grammar/struct.YaccGrammar.html#method.new) and/or
//! [`YaccGrammar::new_with_storaget()`](yacc/grammar/struct.YaccGrammar.html#method.new_with_storaget)
//! which take as input a Yacc grammar.

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

mod idxnewtype;
pub mod newlinecache;
pub mod span;
pub mod yacc;

pub use newlinecache::NewlineCache;
pub use span::{Span, Spanned};

/// A type specifically for rule indices.
pub use crate::idxnewtype::{PIdx, RIdx, SIdx, TIdx};

#[derive(Clone, Copy, Debug, Hash, Eq, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum Symbol<StorageT> {
    Rule(RIdx<StorageT>),
    Token(TIdx<StorageT>),
}