Alex Gajewski

alex at alexgajewski dot me, @apagajewski

I'm one of the cofounders of Pantograph, an independent research lab building robots that amplify the agency of individuals.
Before that I was one of the cofounders of SF Compute, a market for compute.
Before that I was one of the founders of Exa, a generative search engine.
I also helped organize the first batch of AI Grant.
On the weekends, I sporadically host a textbook reading club called Cafe Calculate. Everybody reads their own thing, which I've found to be a good format.
I love talking to people working on startups and projects that they are excited by. If you'd like a somewhat eclectic set of opinions to bounce ideas off of, send me an email!

Some things I believe:

General-purpose robotics is going to have a transformative impact on the world, possibly even bigger than language models, and it's difficult to imagine everything that's going to change as a result of it.
- So much of the world is built assuming that most things around us are static.
- When there are cheap, generally intelligent machines that get deeply integrated into everyday life, a lot of things that have been static forever are going to become dynamic and flexible.
- I think it's important that this shift be driven intrinsically, by people in their own lives, rather than imposed by external forces.
- I'm most excited by technologies that amplify people's agency, rather than removing it. Open-source software, 3D printers, and the internet as a source of information all amplify what people can do. I want general-purpose robotics to fall into this category as well.
There should be a lot more exploration, creativity, and heterogeneity in the world. Optimization algorithms benefit a lot from terms that encourage novelty and diversity, and I think the real world is similar.
I'm excited about small groups of people working on unusual ideas in relative isolation, without worrying too much about what other groups are working on.
In the very distant future, I think inhabiting other stars will produce a lot more exploration in the ways that people organize themselves. The speed of light is a natural barrier to homogeneity.
Currently, a lot of aspects of the world need to be rethought in light of powerful deep learning systems.

Some speculations about the near future:

I think a lot more companies are going to start doing pre-training soon.
- I think the ideas are starting to diffuse out of the big labs, and there are a lot of second-order benefits to vertical integration, especially in unusual domains.
I'm worried about there being a pretty extreme compute crunch over the coming years.
- Nvidia already accounts for a high double-digit percentage of TSMC's most advanced process capacity.
- I expect inference demand for coding models to rise a lot. I think 10x is pretty conservative; 50x might be more realistic.
- It doesn't seem likely to me that nearly enough new fabs will be built to meet this demand.
I think access to compute is going to be a basic human right, like literacy, electricity, and the internet.
- I think it's possible a lot more individuals will start trying to set up their own private compute clusters.

Some things I think would be interesting for people to work on (if you're working on any of these things, I'd love to hear from you!):

Intelligent compilers
- Traditional compilers use static, hand-written code to optimize the code that gets passed into them.
- Language models are quite smart now. Might it be possible to use them to optimize input code in a much more flexible and powerful way?
- You may need to be able to formally verify that the resulting code produces the same output as the original code; this may also benefit from smart language models producing the verification artifacts.
- The dream is to be able to do away with manually coded threads, processes, channels, vectors, files, anything at all having to do with the implementation of the program, and to only worry about expressing the output that you want.
- A nice thought experiment here: what is the shortest code that can express the desired output of the program?
- I'm particularly interested in this in the context of languages for deep learning, because many deep learning ideas are in fact extremely simple, even though it may take a lot of code to implement them performantly.
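
As a small illustration of the gap between what a deep learning program says and how it gets implemented, here is a sketch in Python. All function names are hypothetical, and values are scalars to keep it short. `attend_spec` is the direct definition of single-query softmax attention; `attend_streaming` computes the same thing in one pass with the online-softmax trick (the rescaling idea that FlashAttention-style kernels rely on), the kind of rewrite an intelligent compiler might produce. The check at the end is randomized differential testing, which is a much weaker guarantee than the formal equivalence proof mentioned in the list.

```python
import math
import random

def attend_spec(q, keys, values):
    """Reference definition: softmax(q . k_i)-weighted sum of scalar values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def attend_streaming(q, keys, values):
    """Same result in a single pass, keeping a running max, normalizer, and accumulator."""
    running_max = -math.inf
    normalizer = 0.0
    acc = 0.0
    for k, v in zip(keys, values):
        s = sum(a * b for a, b in zip(q, k))
        new_max = max(running_max, s)
        # Rescale the partial sums to the new max (exp(-inf) == 0.0 on the first step).
        scale = math.exp(running_max - new_max)
        normalizer = normalizer * scale + math.exp(s - new_max)
        acc = acc * scale + math.exp(s - new_max) * v
        running_max = new_max
    return acc / normalizer

def agree_on_random_inputs(trials=100, dim=4, n=16, tol=1e-9, seed=0):
    """Differential test: compare the two versions on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        q = [rng.gauss(0, 1) for _ in range(dim)]
        keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
        values = [rng.gauss(0, 1) for _ in range(n)]
        if abs(attend_spec(q, keys, values) - attend_streaming(q, keys, values)) > tol:
            return False
    return True

print(agree_on_random_inputs())
```

The specification is a few lines. Nearly everything a real kernel adds (tiling, fusion, memory layout) is implementation detail of the kind the compiler would ideally own.
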
Measuring positive externalities
- There are a lot of things in the world that produce value that they don't get to capture (nonprofits, open-source software, scientific research, the printing press).
- The free market only naturally incentivizes things that are able to capture large amounts of that value.
- If there were an effective way of measuring externalities (say, in terms of the number of dollars of GDP that something produces), it might be possible to create a new mechanism to directly incentivize positive externalities.
- This is a hard problem: you can't run the world forward in time twice, once with the printing press and once without it, to see which world ends up with a higher GDP.
- Reinforcement learning algorithms somehow manage to assign credit for outcomes back to specific actions, even when they can only run a single rollout. So there's some hope that it's possible.
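
For a concrete sense of what credit assignment from a single rollout looks like, here is a minimal sketch of the discounted reward-to-go computation used in REINFORCE-style policy gradients. Each action is credited only with the rewards that arrive after it, so one trajectory already yields a per-action signal. The function name and the discount factor `gamma` are illustrative. This is the simplest estimator; practical algorithms add baselines or learned value functions to reduce its variance.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Discounted return from each timestep onward: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# One rollout where only the final step is rewarded:
print(rewards_to_go([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

Earlier actions receive exponentially less of the final reward. That is a crude, built-in assumption about how causal influence decays over time, and any externality-measuring scheme would need something much better.
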
Decentralized compute markets
- I'm curious whether it's possible to set up a compute market that is fully decentralized.
- Some desiderata that I think are important:
  - Accelerator agnosticism: I think the buyer shouldn't need to worry about what accelerator their code is ultimately getting compiled down to. It should be up to the seller how to efficiently run workloads on the hardware they choose to buy.
  - Universality: There should be a rich enough set of primitives that most deep learning operations can be expressed.
    - StableHLO could be a good candidate for this?
  - Determinism: A buyer should get the same output if they run the same program twice, regardless of which seller ran it.
    - Numerical error is one potential obstacle here.
  - Permissionless supply: A seller should be able to start serving workloads without obtaining approval from any centralized authority.
- It's possible that in order to have these desiderata, you need to give up on privacy for the buyer.
  - This may not be the end of the world: there are a lot of cases where it's OK for machine learning workloads to be plaintext, e.g. scientific research, or contributing to open-source software.
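
One concrete source of the numerical error that threatens determinism is that floating-point addition is not associative. Two sellers that reduce the same numbers in a different order (for example, because their hardware splits a sum into tiles differently) can return different bits. A tiny Python demonstration follows, together with one possible mitigation, offered only as an illustrative assumption: exact accumulation with `math.fsum`, which tracks partial sums exactly and rounds once, so the result does not depend on input order.

```python
import math

# The same three numbers, reduced in two different orders.
xs = [0.1, 0.2, 0.3]
left_to_right = (xs[0] + xs[1]) + xs[2]
right_to_left = xs[0] + (xs[1] + xs[2])
print(left_to_right, right_to_left, left_to_right == right_to_left)
# 0.6000000000000001 0.6 False

# math.fsum keeps exact partial sums and rounds once at the end,
# so reversing the inputs does not change the result.
print(math.fsum(xs) == math.fsum(reversed(xs)))
# True
```

Exact summation is much slower than a hardware reduction and only covers sums. Matmuls, fused kernels, and nondeterministic atomics would each need their own treatment for cross-seller determinism.
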
Modality-agnostic reasoning
- They say that learning and search are the two things that scale.
- We know how to do learning in a domain-agnostic way: deep learning seems to be very flexible and to scale well anywhere there is enough data for it.
- Right now, the best way to do search seems to be chain-of-thought, but this is pretty specific to language.
- Are there ways of doing search that are independent of the type of data that we are modeling?
- A very old example is MCMC, which, if you squint, is a modality-agnostic way of using inference-time compute.
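
To make the MCMC point concrete, here is a minimal random-walk Metropolis sampler in Python. The only things it needs from the domain are an unnormalized log-density (a score) and a symmetric proposal that perturbs a state. Nothing in the loop knows whether states are text, images, or numbers, and running more steps directly spends more inference-time compute. The standard-normal target and Gaussian proposal below are toy assumptions chosen for illustration.

```python
import math
import random

def metropolis(log_score, propose, init, steps, seed=0):
    """Random-walk Metropolis: needs only an unnormalized log-density and a symmetric proposal."""
    rng = random.Random(seed)
    state, state_score = init, log_score(init)
    samples = []
    for _ in range(steps):
        candidate = propose(state, rng)
        cand_score = log_score(candidate)
        # Accept with probability min(1, p(candidate) / p(state)).
        accept_prob = math.exp(min(0.0, cand_score - state_score))
        if rng.random() < accept_prob:
            state, state_score = candidate, cand_score
        samples.append(state)
    return samples

def log_score(x):
    return -0.5 * x * x  # standard normal, up to an additive constant

def propose(x, rng):
    return x + rng.gauss(0.0, 1.0)  # symmetric Gaussian random walk

samples = metropolis(log_score, propose, init=0.0, steps=20_000)
mean = sum(samples) / len(samples)
print(round(mean, 2))  # should land close to 0
```

In principle, a `log_score` from a trained model over some other data type, plus a `propose` that makes small edits to such data, would reuse the same loop unchanged. Whether that scales as well as chain-of-thought is exactly the open question.
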
Pixel-space language modeling
- Current video modeling methods are very bad at learning about language through pixels.
- Similarly, audio modeling methods are bad at learning language through waveforms.
- Humans are very good at doing both.
- What's missing from our models? Just more scale, or do we need new methods?
Ultra-long context
- How can we train models that can operate for years within a single context window?
- Similarly, how can we blur the line between in-context learning and gradient descent?
What is the right programming model for language models?
- Prompting is most common and cheapest, but there is a limit to the number of bits of information you can put in a prompt.
- On the other extreme, a full post-training program is very expensive, but it can put a lot of bits into the responses that a model gives in different situations.
- Is there something in the middle? How much can you specify about the function of a language model with $10k worth of post-training?
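
A rough upper bound makes the limit on prompting concrete: a prompt of L tokens drawn from a vocabulary of V tokens can carry at most L·log2(V) bits. The context length and vocabulary size below (128k tokens, 100k-token vocabulary) are illustrative assumptions, not figures for any particular model.

```python
import math

def max_prompt_bits(context_tokens: int, vocab_size: int) -> float:
    """Upper bound on a prompt's information: each token selects one of vocab_size options."""
    return context_tokens * math.log2(vocab_size)

bits = max_prompt_bits(context_tokens=128_000, vocab_size=100_000)
print(f"{bits:,.0f} bits, about {bits / 8 / 1e6:.2f} MB")
# roughly 2.1 million bits, about 0.27 MB
```

This is a loose ceiling. Natural-language prompts carry far less than log2(V) bits per token, so the practical budget is smaller still.
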

I wrote up these ideas in 2022, but I'll keep them up here in case anyone still finds them interesting.