cfgrammar/mod.rs
#![allow(clippy::cognitive_complexity)]
#![allow(clippy::many_single_char_names)]
#![allow(clippy::new_without_default)]
#![allow(clippy::unnecessary_wraps)]
#![allow(clippy::upper_case_acronyms)]
#![forbid(unsafe_code)]
#![deny(unreachable_pub)]

//! A library for manipulating Context Free Grammars (CFG). It is impractical to fully homogenise
//! all the types of grammars out there, so the aim is for different grammar types
//! to have completely separate implementations. Code that wants to be generic over more than one
//! grammar type can then use an "adapter" to homogenise the particular grammar types of interest.
//! Currently this is a little academic, since only Yacc-style grammars are supported (albeit
//! several variants of Yacc grammars).
//!
//! Unfortunately, CFG terminology is something of a mess. Some people use different terms for the
//! same concept interchangeably; some use different terms to convey subtle differences of meaning
//! (but without complete uniformity). "Token", "terminal", and "lexeme" are examples of this: they
//! are synonyms in some tools and papers, but not in others.
//!
//! In order to make this library somewhat coherent, we therefore use some basic terminology
//! guidelines for major concepts (acknowledging that this will cause clashes with some grammar
//! types).
//!
//! * A *grammar* is an ordered sequence of *productions*.
//! * A *production* is an ordered sequence of *symbols*.
//! * A *rule* maps a name to one or more productions.
//! * A *token* is the name of a syntactic element.
//!
//! For example, in the following Yacc grammar:
//!
//!   R1: "a" "b" | R2;
//!   R2: "c";
//!
//! the following statements are true:
//!
//! * There are 3 productions: 1: ["a", "b"]; 2: ["R2"]; 3: ["c"].
//! * There are 2 rules: R1 and R2. The mapping from rules to productions is {R1: {1, 2}, R2: {3}}.
//! * There are 3 tokens: a, b, and c.
//!
//! cfgrammar makes the following guarantees about grammars:
//!
//! * Productions are numbered from `0` to `prods_len() - 1` (inclusive).
//! * Rules are numbered from `0` to `rules_len() - 1` (inclusive).
//! * Tokens are numbered from `0` to `toks_len() - 1` (inclusive).
//! * The `StorageT` type used to store production, rule, and token indices can be infallibly
//!   converted into `usize` (see [`TIdx`](struct.TIdx.html) and friends for more details).
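//!
//! As a small, hedged sketch of the last guarantee (the helper below is hypothetical and only
//! illustrates the conversion; it is not part of the library):
//!
//! ```rust,ignore
//! use cfgrammar::TIdx;
//!
//! // Turn a token index into a plain usize, e.g. to index into a Vec of token names.
//! // This relies on the infallible conversion to usize described above.
//! fn tok_idx_as_usize<StorageT>(tidx: TIdx<StorageT>) -> usize
//! where
//!     usize: From<TIdx<StorageT>>,
//! {
//!     usize::from(tidx)
//! }
//! ```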
//!
//! For most current uses, the main functions to investigate are
//! [`YaccGrammar::new()`](yacc/grammar/struct.YaccGrammar.html#method.new) and/or
//! [`YaccGrammar::new_with_storaget()`](yacc/grammar/struct.YaccGrammar.html#method.new_with_storaget),
//! both of which take a Yacc grammar as input.
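//!
//! As a brief, hedged sketch of the above (the exact signature of `new` has varied between
//! releases; this assumes a form which takes a `YaccKind` and the grammar source as a `&str`,
//! and which returns a `Result`):
//!
//! ```rust,ignore
//! use cfgrammar::yacc::{YaccGrammar, YaccKind, YaccOriginalActionKind};
//!
//! // The grammar from the example above, with a %start declaration added.
//! let src = r#"
//!     %start R1
//!     %%
//!     R1: "a" "b" | R2;
//!     R2: "c";
//! "#;
//! let grm = YaccGrammar::new(
//!     YaccKind::Original(YaccOriginalActionKind::GenericParseTree),
//!     src,
//! )
//! .expect("grammar should be valid");
//!
//! // Rules and productions are numbered densely from 0, so their counts convert to usizes.
//! let n_rules = usize::from(grm.rules_len());
//! let n_prods = usize::from(grm.prods_len());
//! ```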

#[cfg(feature = "bincode")]
use bincode::{Decode, Encode};
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

#[doc(hidden)]
pub mod header;
mod idxnewtype;
#[doc(hidden)]
pub mod markmap;
pub mod newlinecache;
pub mod span;
pub mod yacc;

pub use newlinecache::NewlineCache;
pub use span::{Location, Span, Spanned};

/// Index newtypes for productions (`PIdx`), rules (`RIdx`), symbols (`SIdx`), and tokens
/// (`TIdx`).
pub use crate::idxnewtype::{PIdx, RIdx, SIdx, TIdx};

/// A symbol within a production: either a reference to a rule or to a token.
#[derive(Clone, Copy, Debug, Hash, Eq, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "bincode", derive(Encode, Decode))]
pub enum Symbol<StorageT> {
    /// A reference to the rule with the given index.
    Rule(RIdx<StorageT>),
    /// A reference to the token with the given index.
    Token(TIdx<StorageT>),
}