2-Param Cartpole, TTMH Talk, Trusting the Creator, When Is A Thing That Thing
Week 6, day 3 at Recurse F2’25.
Busy and amazing day!
- programming puzzle group
  - first time I’ve gotten totally stumped! (more below)
- walking coffee chat
  - much delightful musing on “the 9%”, habits and preferences of the upper-middle social class
- built cartpole heuristic solutions
  - 1-param (almost) and 2-param (complete) (more below)
- attended the ML/RL in games group, cartpole rodeo edition
  - it was awesome how many approaches we had! PPOs with different networks, KNN + genetic algorithm action sequence, and two different heuristics
- rehearsed my Talk to Me Human (TTMH) presentation
  - oh boy, do I always make too many slides and talk too much. Self-timing is so helpful.
- gave TTMH presentation at the Wednesday non-technical talks
  - heart so warmed by everyone’s reception and interest in trying the game! (more below)
- discussions on presented and related topics
  - what makes an art or activity maintain its essence? (more below) Plus, architecture design patterns.
Duplicates in n + 1 integers within [1, n]
Constraints: constant space, no array modification. Best solution: linear time.
This one has only very clever solutions. I didn’t get it.
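The linear (O(n)) solution I’m aware of uses Floyd’s cycle detection algorithm. It treats each value as a pointer to another index, so the array becomes an imaginary linked list where index i points to index nums[i]. Because every value is in [1, n], nothing points back to index 0, so following the pointers from index 0 must eventually enter a cycle. The node where that cycle begins has two incoming pointers, which means its index is a value that appears twice. Floyd’s algorithm (a slow pointer taking one step at a time, a fast pointer taking two) finds a meeting point inside the cycle. Finding the cycle’s entrance can then cleverly be done by resetting one pointer to the start and advancing both pointers one step at a time; they meet exactly at the entrance.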
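A minimal Python sketch of the Floyd approach as described above (the function name is my own), treating index i as a node whose next pointer is nums[i]:

```python
def find_duplicate_floyd(nums: list[int]) -> int:
    """Return a duplicated value from n + 1 ints, each in [1, n].

    O(n) time, O(1) extra space, and nums is never modified.
    """
    # Phase 1: the slow pointer takes one step, the fast pointer two,
    # until they meet somewhere inside the cycle.
    slow = fast = nums[0]
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    # Phase 2: restart one pointer from the beginning; moving both one step
    # at a time, they meet at the cycle's entrance, i.e. the duplicate.
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow
```

For example, `find_duplicate_floyd([1, 3, 4, 2, 2])` returns 2.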
An O(n log n) solution that’s barely more intuitive is to binary search over the value range [1, n] rather than over the array. At each step, pick the midpoint value mid and count how many array elements are <= mid. If the values in [1, mid] were all distinct, there would be at most mid of them; so if the count is more than mid, the pigeonhole principle puts a duplicate in [1, mid], and otherwise it must be in [mid + 1, n]. Each step scans the whole array (O(n)) and halves the range (O(log n) steps).
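The counting version as a sketch (again, the name is mine). Note that the search runs over candidate values, not array positions:

```python
def find_duplicate_binary_search(nums: list[int]) -> int:
    """Return a duplicated value from n + 1 ints, each in [1, n].

    O(n log n) time, O(1) extra space, and nums is never modified.
    """
    lo, hi = 1, len(nums) - 1  # candidate value range [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        # How many elements have a value in [1, mid]?
        count = sum(1 for x in nums if x <= mid)
        if count > mid:
            hi = mid  # more than mid values crowd into [1, mid]
        else:
            lo = mid + 1  # so the crowding must be in [mid + 1, hi]
    return lo
```

The loop keeps two facts true: at most lo - 1 elements are <= lo - 1, and more than hi elements are <= hi. When lo == hi, the value lo must therefore appear at least twice.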
Both, of course, heavily use the fact that the numbers fall in [1, n].
My favorite non-solution, which uses O(n) bits of extra space instead of constant space, is to keep a bitset with one bit per possible value, and return as soon as you try to set a bit that’s already 1.
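A sketch of the bitset version, using a `bytearray` that packs eight values per byte (my own choice of container; any bit array would do):

```python
def find_duplicate_bitset(nums: list[int]) -> int:
    """Return the first repeated value, using one bit per possible value.

    O(n) time but O(n) bits of extra space, so it doesn't satisfy the
    constant-space constraint.
    """
    seen = bytearray(len(nums) // 8 + 1)  # bits for values 0..len(nums) - 1
    for value in nums:
        byte_index, bit_offset = divmod(value, 8)
        mask = 1 << bit_offset
        if seen[byte_index] & mask:
            return value  # bit is already 1: value was seen before
        seen[byte_index] |= mask
    raise ValueError("no duplicate found")
```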
I’m not going to stress about learning this one too deeply because it’s a “clever trick,” and I’ve only got the time and headspace to memorize clever tricks I personally enjoy thinking about.
Heuristic Cartpole
In cartpole, as an observation, you get four pieces of state:
- cart_position: float, x coordinate in (-4.8, 4.8), center = 0
- cart_velocity: float, x velocity coordinate in (-inf, inf), still = 0
- pole_angle: float in radians, in (-1/2 pi, 1/2 pi), upright = 0
  - 90 degrees ~= 1.57 radians
  - 12 degrees ~= 0.21 radians
- pole_angular_velocity: float, in (-inf, inf), still = 0
  - (actually not sure of the exact units b/c physics. Maybe rad/s?)
Your only actions are to move left or right. (Surprisingly, you can’t stay still!)
The simplest thing you might try is to move the cart left/right purely based on whether the pole is leaning left/right. This results in an increasing oscillation pattern until failure.
Attempt 1: Move the cart left/right to try to correct where the pole is currently leaning. Results in overcorrection, which oscillates till failure.
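Attempt 1 as code is a one-liner. This sketch uses the same state order and action encoding (0 = left, 1 = right) as the full policy later in this post; the function name is hypothetical:

```python
from typing import Sequence


def policy_naive(state: Sequence[float]) -> int:
    """Attempt 1: push toward whichever side the pole leans (0 = left, 1 = right)."""
    _cart_p, _cart_v, pole_angle, _pole_ang_v = state
    return 0 if pole_angle < 0 else 1
```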
I tried including random actions that tweened their randomness based on how far the pole was from center. The idea was that if the pole was vertical, choose random actions, and as it gets to be tipped further over, prioritize correcting it. This did not work at all, and just added noise.
Attempt 2: Taking the oscillating movements from attempt 1, and trying to alleviate overcorrection by introducing randomness near the balanced state. This only makes the results worse.
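A hypothetical reconstruction of attempt 2 (not the original code): the chance of a random action is 1 when the pole is upright and fades linearly to 0 at `max_angle`. The linear fade, the 0.21 rad (about 12 degrees) default, and all names are assumptions:

```python
import random
from typing import Optional, Sequence


def policy_noisy(
    state: Sequence[float],
    max_angle: float = 0.21,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical attempt 2: attempt 1 plus randomness near upright.

    The probability of a random action is 1.0 when the pole is exactly
    upright and falls linearly to 0.0 once |pole_angle| >= max_angle.
    """
    if rng is None:
        rng = random.Random()
    pole_angle = state[2]
    p_random = max(0.0, 1.0 - abs(pole_angle) / max_angle)
    if rng.random() < p_random:
        return rng.randint(0, 1)  # random push, left or right
    return 0 if pole_angle < 0 else 1  # otherwise correct toward the lean
```

Past `max_angle` this behaves exactly like attempt 1; near upright it is a coin flip, which is the extra noise described above.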
I asked Claude for ideas on a better heuristic. It suggested what I think amounts to integrating one step forward in time and moving the cart based on the pole’s expected next position. This is the classic p' = p + v*Δt explicit Euler method. It works extremely well.
Attempt 3: Predict where the pole will be by stepping forward in time based on the pole’s angular velocity. This balances the pole, but the cart drifts out of bounds extremely slowly because we’re totally ignoring its position.
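Attempt 3 as a sketch: the predicted angle is one explicit Euler step, angle + dt * angular_velocity. Using dt = 0.1 mirrors VELOCITY_WEIGHT in the final policy below; the function name is hypothetical:

```python
from typing import Sequence


def policy_predictive(state: Sequence[float], dt: float = 0.1) -> int:
    """Attempt 3: act on the pole angle predicted one Euler step ahead.

    predicted_angle = angle + angular_velocity * dt  (p' = p + v * dt)
    Cart position is ignored entirely, which is why the cart slowly drifts.
    """
    _cart_p, _cart_v, pole_angle, pole_ang_v = state
    predicted_angle = pole_angle + dt * pole_ang_v
    return 0 if predicted_angle < 0 else 1
```

For a pole tilted slightly right (0.05 rad) but swinging left at 1 rad/s, the prediction is about -0.05, so this policy already pushes left where attempt 1 would still push right.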
The one issue is that because this doesn’t take the cart’s position into account, it slowly slides out of bounds. We can fix this simply by adding a region where the pole is nearly vertical (the “deadband”), and taking cart-modifying actions there when appropriate. The first thing I tried was intuitive: if the cart is on the right, move it left (& vice versa).
Attempt 4: Trying to correct the cart’s position while the pole is nearly vertical has the opposite effect: the cart slides even further in the direction it’s already headed.
To my surprise, to make the cart move where you want, you actually trigger it to move in the opposite direction (in the deadband region), and the balancing (outside of the deadband region) will then correct it. For example, if the cart is to the right of center, you have it move right. Then, it will naturally move left.
My guess for what is happening here is like when turning a bicycle. If you want to turn left, what you actually do is steer right, then lean left. Steering the opposite direction (right) puts the weight out towards the direction you want to go (left), and you lean into that direction (left) and then end up turning there (left).
Attempt 5: By correcting the cart’s position the opposite direction you’d expect, it stays balanced in the center. (Cutoff at 1k steps, but it survives the full 5k episode.)
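To poke at the bicycle guess offline, here is a hypothetical, self-contained sketch of the classic cart-pole dynamics. The constants and equations are my understanding of what Gymnasium's CartPoleEnv uses (explicit Euler steps of 0.02 s), not the library's code; the helper names, the x limit of 2.4, and the angle limit of pi/2 (matching the modified range above) are assumptions. A single push to the right from rest already shows the counter-steering effect: the cart gains rightward velocity while the pole gains negative (leftward) angular velocity.

```python
import math
import random
from typing import Callable, Sequence

# Classic cart-pole constants (assumed to match Gymnasium's defaults).
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
TOTAL_MASS = CART_MASS + POLE_MASS
HALF_LENGTH = 0.5  # half the pole's length
POLE_MASS_LENGTH = POLE_MASS * HALF_LENGTH
FORCE = 10.0  # magnitude of each left/right push
TAU = 0.02  # seconds per step

State = tuple[float, float, float, float]


def step(state: Sequence[float], action: int) -> State:
    """Advance (x, x_dot, theta, theta_dot) by one explicit Euler step."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t**2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Euler: positions use the old velocities, velocities the new accelerations.
    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


def run_episode(
    policy: Callable[[State], int],
    max_steps: int = 5000,
    x_limit: float = 2.4,
    angle_limit: float = math.pi / 2,
    seed: int = 0,
) -> int:
    """Count steps taken (including the one that leaves bounds), capped at max_steps."""
    rng = random.Random(seed)
    state: State = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    for t in range(max_steps):
        state = step(state, policy(state))
        if abs(state[0]) > x_limit or abs(state[2]) > angle_limit:
            return t + 1
    return max_steps
```

Any of the policies in this post can be passed to `run_episode`; results from this sketch may not match Gymnasium exactly.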
Here’s the complete 2-parameter policy that solves cartpole:
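```python
import numpy as np


def policy_predictive_deadband_demo(state: np.ndarray) -> int:
    """2-param cartpole solution. (actions: 0 = left, 1 = right)"""
    VELOCITY_WEIGHT = 0.1
    DEADBAND = 0.01
    cart_p, _cart_v, pole_angle, pole_ang_v = state
    # Predict the pole angle one explicit Euler step ahead.
    effective_angle = pole_angle + VELOCITY_WEIGHT * pole_ang_v
    # Pole nearly upright: push the cart *away* from center; the balancing
    # branch below then swings it back toward center.
    if abs(effective_angle) < DEADBAND:
        return 1 if cart_p > 0 else 0  # steer reverse!
    # Otherwise, push toward the side the pole is about to lean.
    return 0 if effective_angle < 0 else 1
```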
From my source repo.
As a final follow-up, I at first thought I was maybe being silly increasing the max steps from 500 to 5000 for the challenge. But the 2-param solution, which takes the cart’s position into account, only really becomes necessary after 500 steps! (I also stand by the claim that letting the pole fall all the way down is funnier and a worthy modification.)
Talk to Me Human Interest
So many folks interested in playing! I am thrilled!
Also, booked travel home to Seattle over the holidays.
It was also validating to hear the game’s core idea resonate with people. This isn’t “LLMs bolted onto something”; it’s a game mechanic you simply couldn’t do without LLMs.
LLMs & Slop vs Trusting the Creator
One interesting point someone made is that they generally have very low trust in LLMs and no interest in slop. So having some personal connection to the creator of something, like seeing their presentation, is essential for them to be curious about actually trying an LLM-powered application.
One anecdote in support of this: before I attended Recurse, a friend shared TTMH (plus discount codes for a free copy) on the Recurse chat. There was barely any interest. But after an in-person presentation, there was much more.
This isn’t just personal connection, though; it’s a combination of:
- personal connection
- pitch (contextualization)
- demo
- deal (free copy)
- call for help (wanting more players, and help playtesting the Windows+Steam version)
Regardless, it’s a good reminder that going in person and talking about TTMH in more places would be helpful.
One big thing that’s been a barrier for me is that the games industry is super anti-AI, almost in a universal, blanket, “there must be absolutely no AI involved in any way ever during any part of development or deployment” kind of way.[1] The mechanic of TTMH is clearly LLM-native, in that without LLMs, the game doesn’t exist. But the fact that I did use AI-generated images for the background and characters still gives me pause. I fear being literally booed off a stage. This is another lesson: have a budget! Even a few thousand dollars to hire an artist would have limited the AI content to the core mechanic and kept it out of the artwork, which is a much touchier subject.[2]
When Is A Thing That Thing?
A presenter mentioned how wushu (crudely: martial arts dancing) gets critiqued for not being really combat-oriented. More performance than fight. I asked, why isn’t the response from wushu practitioners just, “who cares? We do it because we like it.” They replied, “Then what actually makes wushu wushu? Is break dancing wushu?”
It’s easy to dismiss this as semantics. (What is a sandwich? When is a repaired boat no longer that boat?) But for human culture, semantics matters.
They connected that to other ideas. In my lifetime, programming has gone from being written in assembly (Roller Coaster Tycoon) all the way to prompting in natural language (vibe coding). Is it still programming? What does it mean to program?
Another interesting connection is the “soulless” modern American workout class. There always seems to be some vaguely implied purpose, plus half-hearted gestures at spiritual components. But at least for me, while I do find them fun, there’s a fundamental emptiness to them, like an activity shell missing a meaning core. Contrast that with, for example, any competitive sport, or even a solo activity like climbing, where training is in service of the thing.[3]
Footnotes
[1] I’m empathetic to the general sentiment. But since we’re early on the acceptance curve, it’s a rough time to be doing what I’m doing, because most folks will dismiss projects before even looking at them.
[2] My counterpoint to this, of course, is that TTMH was supposed to be a quick experiment to gauge interest, not a full game project scoped for years of development, hired artists, and an actual (small) budget in the thousands of dollars. So it’s hard to make the call on when the project has organically crossed that line and now deserves it.
[3] Identity-wise, too, sports evolve and have their own crises. Climbing competitions have moved from emulating outdoor climbing to featuring acrobatic jumps.