Alex Gajewski
alex at alexgajewski dot me, @apagajewski
Right now I'm the CEO of SF Compute, where we're working on automating the scaling of neural networks. I generally think AI will turn out best if there are as few barriers as possible to making a state-of-the-art model, and if there are lots and lots of companies training large models of different types.
Before that, I founded Metaphor, where we trained a big contrastive model over a billion pages on the internet to make a neural search engine. I also helped out with the first batch of AI Grant.
(If you're interested in funding an open source GPT-4, let me know!)
Here are some things I think would be cool to build (let me know if you'd like to build any of them :D):
- a more abstract programming language for deep learning
- most languages are designed to make things like binary operations short, but now we mostly care about matrix/vector operations
- it doesn't take a lot of words to tell someone the main idea of a deep learning paper, but it sure takes a lot of code
- can you build a compiler to go from an abstract description to the specific details?
- e.g. you usually don't care about the details of a u-net architecture, you just want to set the number of parameters and for the compiler to make reasonable choices for the rest
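As a toy sketch of what "state a parameter budget, let the compiler pick the rest" could look like (the `configure_unet` name, the doubling-channels heuristic, and the parameter-count formula are all made up for illustration, not a real cost model):

```python
def configure_unet(param_budget: int, depth: int = 4) -> dict:
    """Toy 'compiler' pass: given only a parameter budget, pick
    channel widths for a u-net-style model. Channels double at each
    level; the cost model (two 3x3 convs per level) is a crude
    stand-in for whatever a real compiler would use."""
    # Params per level scale like 2 * 3*3 * (base * 2**l)**2, so the
    # total is quadratic in the base width; solve for the base.
    per_unit = sum(2 * 9 * (2 ** l) ** 2 for l in range(depth))
    base = max(8, int((param_budget / per_unit) ** 0.5))
    return {
        "depth": depth,
        "widths": [base * 2 ** l for l in range(depth)],
        "approx_params": per_unit * base ** 2,
    }

print(configure_unet(10_000_000))
```

The point is the interface: the user states intent (a parameter budget) and everything else is a default the compiler is free to choose.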
- AlphaZero for math, theorem proving with no human data
- in 2015, Go was too hard of a problem for then-current deep RL methods on their own, so AlphaGo had to combine them with MCTS and self-play to solve it
- today, theorem proving is in a similar place, so it's a good problem to sharpen our methods against
- AI characters that authors can put months/years of work into
- right now, the only input you provide to an AI character is a prompt, which limits how interesting the characters can end up being
- you should be able to put way more information into them, either via fine-tuning or just longer contexts
- maybe they should be goal-directed with RL, trying to get you to say a certain thing or to move the story in a certain direction
- really these are just a new form of literature
- voice to voice models that think in sound
- right now people are doing voice with a Whisper->GPT-4->ElevenLabs pipeline
- really it should be one big end-to-end model, so that the part that does the thinking knows how you said something
- you want it to be able to interrupt you, talk over you, etc.
- write a history book automatically
- it feels like language models are almost at the point that they could do the research, organize it, and put it into book form
- it's possible you want to fine-tune the whole process on actual history books with RL, so it learns how to do the latent research behind existing books
- easier if you have models with super long context windows
- language models with true billion token context windows
- feels like it's probably possible
- you need to store 1B keys and values in GPU memory; this is about 16TB with hidden dim 4096, which easily fits if you're training on 2048 A100s
- then you just need to figure out a way to select which ones to attend over for any given token that doesn't cost too much computation
- also need datasets with sequences 1B tokens long (a book is maybe 200k tokens)
- could construct these sequences by concatenating a bunch of related documents together, so the model has an incentive to find the parts of the context that are most useful for its current prediction
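The memory arithmetic above, written out. This assumes fp16 and one key/value pair per token, which is what the 16TB figure implies; a full multi-layer KV cache would multiply this by the layer count:

```python
# Back-of-the-envelope check on the 16TB figure above.
tokens = 1_000_000_000   # 1B-token context
hidden = 4096            # hidden dimension
bytes_per_value = 2      # fp16
kv = 2                   # one key vector + one value vector per token

total_bytes = tokens * hidden * bytes_per_value * kv
print(total_bytes / 1e12, "TB")  # ~16.4 TB

# Sharded across 2048 A100s (80 GB each):
per_gpu_gb = total_bytes / 2048 / 1e9
print(per_gpu_gb, "GB per GPU")  # 8 GB, comfortably under 80 GB
```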
- language models with some kind of tree search?
- lots of people are trying to build variants of language models that do some kind of tree search in the solution space of a specific problem, usually programming or math
- I think it would be cool to try to do this in a very general way for arbitrary language modeling, where you just have some kind of learned value function that's trying to predict whether the generated sequence is real or not
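A minimal sketch of that search structure, with toy stand-ins for both the language model and the learned value function (everything here is fabricated for illustration; a real version would use actual model log-probs and a trained value head scoring whether the text looks real):

```python
VOCAB = ["a", "b", "c"]

def lm_logprobs(seq):
    # stand-in "language model": mildly prefers repeating the last token
    last = seq[-1] if seq else "a"
    return {t: (0.0 if t == last else -1.0) for t in VOCAB}

def value(seq):
    # stand-in "learned value function": penalizes implausible
    # (here: repetitive) partial sequences
    runs = sum(1 for i in range(1, len(seq)) if seq[i] == seq[i - 1])
    return -float(runs)

def tree_search(steps=5, beam=2):
    frontier = [([], 0.0)]  # (partial sequence, log-prob so far)
    for _ in range(steps):
        expanded = [
            (seq + [tok], lp + tok_lp)
            for seq, lp in frontier
            for tok, tok_lp in lm_logprobs(seq).items()
        ]
        # rank candidates by log-prob plus the value estimate, keep the best
        expanded.sort(key=lambda c: c[1] + value(c[0]), reverse=True)
        frontier = expanded[:beam]
    return frontier[0][0]

print(tree_search())
```

The interesting part is that the value function is generic: it scores any partial sequence, not solutions to one specific problem.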
- end-to-end chip design
- basically an optimization problem, just one that's sort of difficult to write down
- you want to be able to do it as one big RL task, where the model is just trying to make some distribution of programs run as fast as possible
- might even want it to output a gate layout directly
- general purpose robotics
- robotics is getting very good!
- this was trained in simulation and then transferred zero-shot to the real world
- general approach might be to put a ton of effort into making a super realistic simulator with thousands of different pretraining tasks, and scaling up a lot
- you might also need to make thousands of physical robots to train on if the simulator isn't good enough
- may help to pretrain on the internet, probably language and video
- AGI self-driving
- if we want self-driving cars to be able to go anywhere in any conditions, we need them to be able to generalize
- probably the thing to do is to pretrain on giant text and video datasets, and then specialize them to driving
- people who work on self-driving say you'll never get the level of reliability you need that way, but I think you probably can; someone just needs to try it
- (I don't think Cruise/Waymo are doing this, so it could be a startup)
- a big predictive medical model
- train on all the sequences of patient records, scans, tests, treatments, and outcomes
- this is effectively a very big offline RL dataset
- then, like a decision transformer, you can predict which treatment will lead to the best outcome for any given patient
- the hardest part is probably getting access to all the medical records, but if you can coordinate everyone, you could probably do a lot better than human doctors
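A toy version of the "predict which treatment leads to the best outcome" step, with made-up records and simple per-group averaging standing in for a real sequence model over full patient histories:

```python
from collections import defaultdict

# Fabricated offline dataset of (patient condition, treatment, outcome).
# A real version would be sequences of records, scans, tests, and
# treatments, modeled by a large transformer.
offline_data = [
    ("high_bp", "drug_a", 0.6),
    ("high_bp", "drug_b", 0.8),
    ("high_bp", "drug_b", 0.7),
    ("diabetic", "drug_a", 0.9),
]

stats = defaultdict(list)
for patient, treatment, outcome in offline_data:
    stats[(patient, treatment)].append(outcome)

def best_treatment(patient):
    # pick the treatment with the best mean predicted outcome
    options = {t: sum(o) / len(o) for (p, t), o in stats.items() if p == patient}
    return max(options, key=options.get)

print(best_treatment("high_bp"))  # drug_b, mean outcome 0.75 vs 0.6
```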
- are there methods that look very different from deep learning but have the same general shape?
- neural networks are basically just a class of universal function approximators plus a way of optimizing them (SGD)
- the basic building block is just alternating layers of linear and nonlinear functions
- you could imagine applying SGD to different classes of differentiable functions that don't look like MLPs
- evolutionary algorithms are maybe another class of optimizer, but they're usually pretty inefficient
- you would think there would be lots of classes of functions and optimizers that can scale, and neural networks aren't necessarily the most compute efficient
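A tiny illustration of the "different function class, same optimizer" point: gradient descent applied to a two-parameter sinusoid rather than an MLP, with hand-derived gradients (the target function and hyperparameters are chosen arbitrarily):

```python
import numpy as np

# Fit f(x) = a * sin(w * x) -- a differentiable function class that
# isn't a neural net -- to samples of 1.5 * sin(1.2 * x).
x = np.linspace(-2.0, 2.0, 200)
y = 1.5 * np.sin(1.2 * x)

a, w = 1.0, 1.0  # initial guess
lr = 0.05
for _ in range(2000):
    err = a * np.sin(w * x) - y
    # gradients of the mean squared error, written out by hand
    grad_a = np.mean(2.0 * err * np.sin(w * x))
    grad_w = np.mean(2.0 * err * a * x * np.cos(w * x))
    a -= lr * grad_a
    w -= lr * grad_w

loss = float(np.mean((a * np.sin(w * x) - y) ** 2))
print(a, w, loss)
```

Same recipe as deep learning (a differentiable family plus a gradient-based optimizer), just a different family; the open question is which families actually scale.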