Fast substring search, tree serialization order, playing cartpole by hand
Week 5, day 4 at Recurse F2’25
substring search is naively O(mn), but can be done in O(m + n) with O(1) space
Say:
- len(haystack) = n
- len(needle) = m
Naively, O(mn):
- for each position in the haystack as a start
- check whether each character of the needle matches
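A minimal sketch of the naive version (function name is mine, not Python's actual implementation):

```python
def naive_find(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle, or -1. O(mn) time."""
    n, m = len(haystack), len(needle)
    for start in range(n - m + 1):               # each possible start in the haystack
        if haystack[start:start + m] == needle:  # compare up to m characters
            return start
    return -1
```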
Fancier techniques precompute information into tables, jump around cleverly, and/or use hashing.
Exhaustive resource:
http://www-igm.univ-mlv.fr/~lecroq/string/index.html
Wikipedia has a whole page, here’s the table: https://en.wikipedia.org/wiki/String-searching_algorithm#Single-pattern_algorithms
python’s implementation has a short note about their approach: https://github.com/python/cpython/blob/main/Objects/stringlib/fastsearch.h#L5-L8
… and a much longer writeup with lots of examples: https://github.com/python/cpython/blob/main/Objects/stringlib/stringlib_find_two_way_notes.txt
This is a cool instance where Python’s built-in approach:
- looks at the size of your inputs
- guesses the fastest algorithm to use for your case
- implements sophisticated algorithm(s) to do so
- and provides a writeup of how they did it
If I ever wanted to implement one, it seems like Rabin-Karp might be the simplest. It uses a rolling hash:
- O(m) - hash needle
- O(m) - hash first m of haystack
- O(n) - slide a window of size m over the haystack (really O(n - m + 1) window positions)
  - O(1) per slide: remove the oldest character from the hash and add the new one
- O(m) on a hash match, check all characters to verify (ideally just once)
So worst case O(nm) time with a bad hash and lots of collisions, though it ought to be O(n + m) in the expected case.
O(1) space, as each hash is a single number.
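If I did implement it, a sketch might look like this (the base and modulus are arbitrary picks, not from any particular library):

```python
def rabin_karp(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1
    base, mod = 256, (1 << 61) - 1       # arbitrary rolling-hash parameters
    top = pow(base, m - 1, mod)          # weight of the character leaving the window

    h_needle = h_window = 0
    for i in range(m):                   # O(m): hash the needle and the first window
        h_needle = (h_needle * base + ord(needle[i])) % mod
        h_window = (h_window * base + ord(haystack[i])) % mod

    for start in range(n - m + 1):       # O(n - m + 1) window positions
        if h_window == h_needle and haystack[start:start + m] == needle:
            return start                 # O(m) verification on a hash match
        if start + m < n:                # O(1): roll the hash forward one character
            h_window = ((h_window - ord(haystack[start]) * top) * base
                        + ord(haystack[start + m])) % mod
    return -1
```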
tree serialization order matters
Assume you are serializing a binary tree, and you render the nulls. You can do pre-order, in-order, or post-order traversals. Which one(s) will give you an unambiguous representation of the tree?
| order | unambiguous serialization with nulls? |
|---|---|
| pre | yes |
| in | no |
| post | yes |
This was surprising to me. And I still have a rough intuition of parts of it, but not of the whole thing.
It makes sense that pre- and post-order traversals make unambiguous serializations, because they can be used to unambiguously represent pre/post-fix math expressions.
- In an expression, operators are internal nodes, and numbers are leaves, which tells you when to pop in/out.
- In a tree, nulls tell you where leaves are, which also tells you the structure.
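To make that concrete, here's a sketch of pre-order serialization with nulls and its inverse (my own helper names): the round trip is unambiguous because each recursive call consumes exactly the tokens for its own subtree.

```python
from typing import Optional

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def serialize_preorder(node: Optional[Node], out: list) -> list:
    """Pre-order traversal, rendering nulls as None."""
    if node is None:
        out.append(None)
    else:
        out.append(node.val)
        serialize_preorder(node.left, out)
        serialize_preorder(node.right, out)
    return out

def deserialize_preorder(tokens: list) -> Optional[Node]:
    """Rebuild the tree by consuming tokens left to right."""
    it = iter(tokens)
    def build():
        val = next(it)
        if val is None:
            return None
        return Node(val, build(), build())
    return build()

# e.g. B with left child A serializes to ['B', 'A', None, None, None]
```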
With in-order traversal, a node’s position depends on its subtrees, which makes serialization ambiguous. Both of the following two trees have an in-order traversal of [null, A, null, B, null]:
Tree 1:    B      Tree 2:  A
          /                 \
         A                   B
So while I can explain why pre and post should work, I still only have a vague intuition for why exactly they do, and likewise for why in-order does not.
(The difference between saying true words which correctly explain something, and having the actual intuition click, remains interesting for me.)
What if you don’t have nulls? Pre- and post-order traversals no longer unambiguously serialize a tree. But if you have pre + in, or post + in, you can do it. There are even leetcode problems for this:
w/o nulls: pre + in → tree
https://leetcode.com/problems/construct-binary-tree-from-preorder-and-inorder-traversal/description/
w/o nulls: in + post → tree
https://leetcode.com/problems/construct-binary-tree-from-inorder-and-postorder-traversal/description/
(I may try these tomorrow if I still have the enthusiasm.)
What about pre + post (still no nulls)? It’s not enough. When a node has one child, you can’t tell where it is.
Tree 1:    1      Tree 2:  1
          /                 \
         2                   2
For both: pre: [1, 2], post: [2, 1].
cart pole
Me attempting to play by hand, with modified environment.
summary of tweaks / learnings:
- increased angle before failure
- drew x guides
- drew score
- track turn
- save best score
- subclassed env & registered custom
- learned about registering, rendering, randomness, the basic API, and the bogus reward_threshold
- made humans play. it is hard.
human performance
| who | score |
|---|---|
| Chris | 277 |
| Dan | 93 |
| Alex | 83 |
| me | 188 |
| Ben | pretty good, wasn’t tracking yet |
(this was actually with render() accidentally being called twice: once explicitly, plus the call I didn't realize step() makes automatically. it is WAY harder when you only render once.)
registering custom env
subclassing an env (cart pole) to tweak it. (previously I was just editing the package contents directly.)
gymnasium envs are registered in:
# .venv/lib/python3.12/site-packages/gymnasium/
envs/__init__.py
like:
register(
id="CartPole-v0",
entry_point="gymnasium.envs.classic_control.cartpole:CartPoleEnv",
vector_entry_point="gymnasium.envs.classic_control.cartpole:CartPoleVectorEnv",
max_episode_steps=200,
reward_threshold=195.0,
)
register(
id="CartPole-v1",
entry_point="gymnasium.envs.classic_control.cartpole:CartPoleEnv",
vector_entry_point="gymnasium.envs.classic_control.cartpole:CartPoleVectorEnv",
max_episode_steps=500,
reward_threshold=475.0,
)
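A sketch of how subclassing and registering might look (the tweak to theta_threshold_radians is my example of a looser failure angle; check that attribute name against your gymnasium version):

```python
import math
import gymnasium as gym
from gymnasium.envs.classic_control.cartpole import CartPoleEnv

class EasierCartPoleEnv(CartPoleEnv):
    """CartPole with a more forgiving failure angle."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # stock CartPole fails at 12 degrees; allow 24 (assumes the attribute name hasn't changed)
        self.theta_threshold_radians = 24 * 2 * math.pi / 360

gym.register(
    id="EasierCartPole-v0",
    entry_point=EasierCartPoleEnv,  # register also accepts a class, not just a "module:Class" string
    max_episode_steps=500,
)

env = gym.make("EasierCartPole-v0", render_mode="human")
```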
reward_threshold is strange.
reward_threshold
reward_threshold doesn’t appear to actually do anything! It’s just metadata that’s saved into the spec (the EnvSpec), and is available for querying.
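For example, reading it back off the spec (a quick sketch):

```python
import gymnasium as gym

print(gym.spec("CartPole-v1").reward_threshold)  # 475.0

env = gym.make("CartPole-v1")
print(env.spec.reward_threshold)                 # same value, via the env's own spec
```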
Claude was very emphatic after its little grepping:
- Does NOT stop training
- Does NOT cap rewards
- Does NOT send signals anywhere
- Does NOT trigger any behavior
I guess it makes sense as metadata an environment creator would provide to agent builders to communicate what counts as success. For cart pole it seems trivial: the reward is +1 per step and episodes are capped at 500 steps, so the reward threshold is closely tied to the step cap (and really ought to be about 500). But for other environments it's not necessarily obvious.
Actually, there's a bug. This reward_threshold is not correct, because they've added a sutton_barto_reward option (passed when you make the env), which gives 0 per step and -1 on failure. Under that reward, an agent of any skill ends the episode with a total reward of -1. (It's interesting that the total reward under Sutton & Barto's formulation isn't a sign of whether the agent has learned!) In other words, an option that changes the reward scheme isn't compatible with a fixed, upfront, static reward_threshold. (Good thing it… doesn't do anything.)
Other arg notes:
- it's nice that `order_enforce` defaults to `True`, to make sure you aren't calling things wrong
randomness
how is randomness controlled? the base Env’s reset() takes a seed, and then sets two attrs
# in Env (base environment class)
self._np_random, self._np_random_seed = seeding.np_random(seed)
this is stateful; their docs say:
> Therefore, :meth:`reset` should (in the typical use case) be called with a seed right after initialization and then never again. For Custom environments, the first line of :meth:`reset` should be `super().reset(seed=seed)` which implements the seeding correctly.
the original cartpole accesses this with
# in CartPoleEnv (original)
self.state = self.np_random.uniform(low=low, high=high, size=(4,))
… which is actually implemented as a property
# in Env (base class)
@property
def np_random(self) -> np.random.Generator:
"""Returns the environment's internal :attr:`_np_random` that
if not set will initialise with a random seed.
Returns:
Instances of `np.random.Generator`
"""
if self._np_random is None:
self._np_random, self._np_random_seed = seeding.np_random()
return self._np_random
What I’m trying to figure out is: by just always calling reset like this:
for episode in range(episodes):
env.reset()
# ...
… am I doing it wrong? Both (a) calling it multiple times, (b) with no explicit seed ever?
It seems like this is fine:
- it's normal to repeatedly call `reset()`
- when they say "never again," they only meant passing the seed
- not passing a seed just means a seed will be organically chosen, so the run won't be repeatable
if you call it with the same seed each time, you get the exact same starting observation
# episodes 1, 2, 3
Starting observation: [ 0.01823519 -0.0446179 -0.02796401 -0.03156282]
Starting observation: [ 0.01823519 -0.0446179 -0.02796401 -0.03156282]
Starting observation: [ 0.01823519 -0.0446179 -0.02796401 -0.03156282]
...
… if you instead pass the seed only on the first episode, then None on subsequent ones, you get the same sequence of starting observations across runs:
# episodes 1, 2, 3
Starting observation: [ 0.01823519 -0.0446179 -0.02796401 -0.03156282]
Starting observation: [-0.03240941 0.03120945 0.0423345 -0.02234256]
Starting observation: [ 0.03197546 0.03898927 0.00129705 -0.02550354]
...
# next run, episodes 1, 2, 3
Starting observation: [ 0.01823519 -0.0446179 -0.02796401 -0.03156282]
Starting observation: [-0.03240941 0.03120945 0.0423345 -0.02234256]
Starting observation: [ 0.03197546 0.03898927 0.00129705 -0.02550354]
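For reference, roughly the loops that produce those two patterns (seed value and episode count are arbitrary):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

# same seed every episode -> identical starting observation each time
for episode in range(3):
    obs, info = env.reset(seed=42)
    print("Starting observation:", obs)

# seed only the first episode, None afterwards -> same *sequence* of starts across runs
for episode in range(3):
    obs, info = env.reset(seed=42 if episode == 0 else None)
    print("Starting observation:", obs)
```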
I got into all this because there’s a nondeterministic flag when registering the environment, which made me wonder whether CartPole actually implemented deterministic seeded “randomness.” And it does.
rendering
Rendering is basically:
- either don't render, or
- draw with pygame, then either:
  - render to the window, or
  - render to pixels & return them
def render(self):
    if self.render_mode is None:
        return
    # (omitted: draw with pygame)
    if self.render_mode == "human":
        # update pygame window
        return None
    else:
        np_pixels = ...  # pygame surface -> numpy pixel array
        return np_pixels
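Usage-wise, the mode is chosen at make() time; a quick sketch of the two rendering paths:

```python
import gymnasium as gym

env = gym.make("CartPole-v1", render_mode="rgb_array")
obs, info = env.reset()
frame = env.render()      # numpy array of pixels, shape (height, width, 3)

env = gym.make("CartPole-v1", render_mode="human")
obs, info = env.reset()   # "human" mode draws to a pygame window; render() returns None
```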
all methods?
custom envs have a pretty minimal API!
note that:
- the first return of `step()` is "observations", but for now that's just the next state
- `step()` calls `render()`
class CustomEnv(Env):
    def __init__(self): ...
    def reset(self, seed=None) -> (obs, info): ...
    def step(self, action) -> (obs, reward, terminated, truncated, info): ...
    def render(self) -> Optional[pixels]: ...
    def close(self): ...
vector env
TBD, but seems important for speed. there’s CartPoleVectorEnv(VectorEnv)