probabilities over n-grams (non-markovian)

State Space (S): A set of all possible, mutually exclusive states S = {s₁, s₂, ...}.

State Trajectory (σ): A sequence of states over time, σ = (s[0], s[1], s[2], ...), where s[t] is the state at time t.

Predicates (L, R): A pair of conditions, each of which must be implementable as a function mapping a state s into the codomain {true, false}.

Vanilla Transition Probability: idiomatically, P(p|q). formally: given some state space S, S[s=q] denotes the truth set of all states within S where q is true (almost exclusively used to express 'equals some exact 1-tuple of states', e.g. state='11', state='banana'), and S[s=p] denotes the truth set of all states within S where p is true (also almost exclusively used to select a 1-tuple of states). thereafter, P(p|q) denotes the probability, within our state trajectory σ at time t, that the successor state `s' = s[t+1]` falls within our 'left-hand' predicate truth set `S[s=p]`, given that the starting state `s = s[t]` falls within our 'right-hand' predicate truth set `S[s=q]`. if we want to be terse, this reads like: Pr(s' ∈ S[s=p] | s ∈ S[s=q]). (i hope that even a single example of the terse notation dissuades us from ever seriously using the terse notation again.) a minimal empirical estimator for this quantity is sketched at the end of this section.

Historicity-Augmented Transition Probability: we want to describe predicates which make statements about sequences of states, such as 'there is a 2-tuple (0, 11) in our state trajectory', or 'all state trajectories containing both the 1-tuple (banana) and the 1-tuple (cheese)'. this isn't very complicated! starting from our state space definition s[t] ∈ S, we may describe augmented state spaces s'[t] ∈ S', where s'[t] = (s[t], s[t-1], ..., s[t-m+1]). each state s'[t] is the tuple containing an m-length 'window' of the atomic states sequentially preceding (and including) a particular state s'[t][0]. S', our state space of history-tuples, is necessarily a subset of the m-fold product of S, or, tersely, S' ⊆ Sᵐ. this reformulation lets us consider predicate conditional probabilities over n-grams as if they were markovian state transitions between unique states s'[t], s'[t+1] within a *stupefyingly* and intractably big state space Sᵐ, where, of course, S ⊆ S' ⊆ Sᵐ. (a windowing sketch also appears at the end of this section.)

a preferred notation emerges: R'(s'[t]) denotes the application of the 'right-hand' predicate to our history-tuple s'[t], and L'(s'[t]) denotes the application of the 'left-hand' predicate to our history-tuple s'[t]. without substantial differences in the entailed calculations, we may now write a convenient and tidy P(L'[t+1] | R'[t]) to denote semantics like: "given that the current history window s'[t] satisfies our predicate R, what is the probability that the advanced history window s'[t+1], which now includes s[t+1], satisfies our predicate L?" this is not very useful by itself yet, even if it is compact and composable.

Rudimentary Temporal Logic: let us presume some predicate Φ.

X Φ ("Next"): denotes the predicate Φ is true in the next state.
F Φ ("Finally"): denotes the predicate Φ will be true at some point in the trailing sequence of states.
G Φ ("Globally"): denotes the predicate Φ is true for all subsequent states.
Φ U Ψ ("Until"): denotes the predicate Φ must remain true until the predicate Ψ becomes true.

a suggested notation appears: P(X L | R) entails "given the current state satisfies predicate R, what is the probability the successor state satisfies predicate L?" this will feel familiar insofar as it is exactly the 'standard language modeling objective' from the GPT-1 paper.
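to make the 'predicates as functions into {true, false}' framing concrete: a minimal sketch in Python, under the assumption that states are plain values and predicates are ordinary callables (the name `transition_probability` is my own invention, not anyone's library). it estimates the vanilla P(p|q), equivalently P(X L | R), empirically from a single observed trajectory:

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")
Predicate = Callable[[S], bool]

def transition_probability(
    trajectory: Sequence[S],
    left: Predicate,   # L: applied to the successor state s[t+1]
    right: Predicate,  # R: applied to the current state s[t]
) -> float:
    """empirical estimate of P(X L | R): among all times t where s[t]
    satisfies R, the fraction whose successor s[t+1] satisfies L."""
    hits = total = 0
    for s, s_next in zip(trajectory, trajectory[1:]):
        if right(s):
            total += 1
            if left(s_next):
                hits += 1
    if total == 0:
        raise ValueError("no state in the trajectory satisfies R")
    return hits / total

# e.g. P(s'='1' | s='1') over the trajectory 0,1,1,0,1,1,0 -> 0.5
p = transition_probability(list("0110110"),
                           left=lambda s: s == "1",
                           right=lambda s: s == "1")
```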
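the augmented state space S' is just as mechanical: slide an m-length window over the atomic trajectory and hand the resulting history-tuples to the same estimator. another sketch under the same assumptions (`windowed` is likewise a hypothetical helper name):

```python
from collections import deque
from typing import Iterator, Sequence, Tuple, TypeVar

S = TypeVar("S")

def windowed(trajectory: Sequence[S], m: int) -> Iterator[Tuple[S, ...]]:
    """lift the atomic trajectory into the augmented trajectory of
    history-tuples s'[t] = (s[t], s[t-1], ..., s[t-m+1]); index 0 of
    each tuple is the newest state, matching the s'[t][0] convention."""
    window: deque = deque(maxlen=m)
    for s in trajectory:
        window.appendleft(s)
        if len(window) == m:
            yield tuple(window)

# predicates now speak about history-tuples; reusing transition_probability
# from the previous sketch: "given the last two states were (1, 1), what is
# the probability the next atomic state is 0?"
aug = list(windowed(list("0110110"), m=2))
p = transition_probability(aug,
                           left=lambda w: w[0] == "0",
                           right=lambda w: w == ("1", "1"))
```

note the design choice: `appendleft` keeps index 0 as the newest state, so tuple predicates line up with the s'[t][0] indexing used in the prose.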
P(F L | R) entails "given the current state satisfies predicate R, what is the probability that any future state satisfies predicate L?" this will feel familiar to the topic of reinforcement learning!

some fun further extensions:

P(F (L[1] ∧ X L[2]) | R): "given the current state satisfies predicate R, what is the probability of hitting any future state which satisfies predicate L[1], and which is then immediately followed by a state satisfying predicate L[2]?" this will feel *very* familiar to the topic of answer-verification reward learning, as we consider predicates like 'L[1]: state matches ', 'L[2]: state is the correct answer to the question asked in the states selected by predicate R', and 'R: state matches a training dataset item'.

P(G (R → F L) | s[0]): "given some starting state s[0], what is the probability that it is always true that whenever the system enters a state satisfying predicate R, it will eventually hit a state satisfying predicate L?"
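the temporal operators compose just as mechanically. a monte-carlo-style sketch, again with invented names (`finally_probability`, `finally_pair_probability`), that estimates P(F L | R) and the compound P(F (L[1] ∧ X L[2]) | R) over a set of finite rollouts. one caveat worth stating: a finite rollout can only lower-bound a true 'finally' probability, since L might still be satisfied beyond the horizon:

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")
Predicate = Callable[[S], bool]

def finally_probability(
    trajectories: Sequence[Sequence[S]],
    left: Predicate,
    right: Predicate,
) -> float:
    """empirical P(F L | R): among all positions whose state satisfies R,
    the fraction for which some strictly later state in the same (finite)
    trajectory satisfies L."""
    hits = total = 0
    for traj in trajectories:
        for t, s in enumerate(traj):
            if right(s):
                total += 1
                if any(left(s_future) for s_future in traj[t + 1:]):
                    hits += 1
    if total == 0:
        raise ValueError("no state satisfies R in any trajectory")
    return hits / total

def finally_pair_probability(
    trajectories: Sequence[Sequence[S]],
    l1: Predicate,
    l2: Predicate,
    right: Predicate,
) -> float:
    """empirical P(F (L1 ∧ X L2) | R): as above, but the future state must
    satisfy L1 and be immediately followed by a state satisfying L2 (the
    answer-verification pattern from the prose)."""
    hits = total = 0
    for traj in trajectories:
        for t, s in enumerate(traj):
            if right(s):
                total += 1
                if any(l1(traj[u]) and l2(traj[u + 1])
                       for u in range(t + 1, len(traj) - 1)):
                    hits += 1
    if total == 0:
        raise ValueError("no state satisfies R in any trajectory")
    return hits / total
```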