The model I used followed pretty much the same architecture as the NVIDIA model. I used a scaled-down version while testing on my local machine, but scaled it up when I trained it on the TPU. The data the model was trained on was created by driving around in the Udacity Self-Driving Car Sim (collected by Zhenye Na). I initially thought I could use Cosine Annealing LR decay, as I was most familiar with it. However, when I trained the model, the validation loss went up and plateaued at around 4 (yikes)! I reran the job about 5 times before I realized that something was up. I turned to the good souls of Reddit and asked for help. I literally put dropout after every single layer, but to no avail. Then I thought, "What if I try Piecewise Constant decay?" I tried this, and voilà, the validation loss began to rapidly fall. After this, I decided that I had to implement the simulator for model inference.
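For reference, piecewise constant decay is simple enough to sketch in a few lines (Keras also ships it as `tf.keras.optimizers.schedules.PiecewiseConstantDecay`). The boundaries and rates below are illustrative, not my exact values:

```python
def piecewise_constant_lr(step, boundaries, values):
    """Piecewise constant decay: values[i] applies until step reaches
    boundaries[i]; values[-1] applies after the last boundary."""
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]

# Illustrative schedule: drop the learning rate by 10x at steps 1000 and 3000.
boundaries = [1000, 3000]
values = [1e-3, 1e-4, 1e-5]

print(piecewise_constant_lr(0, boundaries, values))     # 0.001
print(piecewise_constant_lr(2000, boundaries, values))  # 0.0001
print(piecewise_constant_lr(5000, boundaries, values))  # 1e-05
```

Unlike cosine annealing, the rate here stays flat between drops, which in my case let the loss settle before each reduction.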

For the first few hours of Day 3, I was trying to download the Udacity self-driving car simulator, to no avail. For some reason, Rosetta 2 wouldn't work with the game, and I couldn't get the outdated version of Unity it used to work on my Mac. So, I looked at other implementations of similar algorithms and studied how they handled inference. I then modified that code to use my own model, and it worked.

The next day, I scaled up my model to match the specification set by NVIDIA's paper. Then, I put the training code on a TPU (as with many of my projects, provided by the TensorFlow Research Cloud) and ran it. Training was really fast (about 7 minutes for 30 epochs, if my memory is correct). In the end, the validation loss was very low, indicating success.
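As a sketch of what that full-scale model looks like, here is the NVIDIA architecture in Keras, with layer sizes taken from the paper ("End to End Learning for Self-Driving Cars"); my actual training code may have differed in details like normalization:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model():
    """Sketch of the NVIDIA PilotNet architecture: five conv layers
    followed by three fully connected layers, outputting a steering angle."""
    return models.Sequential([
        layers.Input(shape=(66, 200, 3)),            # input size from the paper
        layers.Rescaling(1.0 / 127.5, offset=-1.0),  # normalize pixels to [-1, 1]
        layers.Conv2D(24, 5, strides=2, activation="relu"),
        layers.Conv2D(36, 5, strides=2, activation="relu"),
        layers.Conv2D(48, 5, strides=2, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(100, activation="relu"),
        layers.Dense(50, activation="relu"),
        layers.Dense(10, activation="relu"),
        layers.Dense(1),  # steering command
    ])

model = build_model()
print(model.output_shape)  # (None, 1)
```

The scaled-down local version I tested with simply used fewer filters and smaller dense layers in the same shape.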

This project taught me a lot and was very fun to develop. It not only helped me learn more about behavior cloning and self-driving cars, but also refreshed my memory on TensorFlow and Keras (at this point, I hadn't used those tools in over a year).

`python3-venv` from Deadsnakes (I think) and created a new environment. Now, pip finally worked and I could install Levanter. However, I ran into another problem. My system kept OOMing whenever I tried to finetune Phind 34b (we will get to Llama 2 later). I filed an issue on the Levanter repo and turned off my computer for the night. When I woke up the next morning, I found that David Hall (the author of the repo) had suggested that I lower the batch size. I lowered it down to 4 (1 × 4 TPU devices) and it still OOMed. After a few weeks of back and forth, however, I was able to get it running and finetune Phind 34b to write Go code. Later, I was watching an episode of the Lex Fridman podcast when I heard Yann LeCun (the guest speaker) say that the IT firm Infosys was finetuning Llama 2 70b on over 20 languages. This gave me the idea to do a similar thing on a smaller scale with the TPU. Later that day, I did just that, finetuning Llama 2 on over 4 languages. You can find my models here. I want to thank David Hall, Stanford's CRFM research lab, the Llama 2 team, and the TensorFlow Research Cloud for making this possible.

- Marcus Morris Sr.
- 2028 Unprotected First Round Pick (via the Clippers)
- 2029 First Round Swap with the Clippers

- Lauri Markkanen

Lauri Markkanen averages 8.7 boards, helping to boost the mediocre rebounding numbers, and makes 3 3-pointers per game (with 39% accuracy).

- Marcus Morris Sr.
- Nic Batum
- Furkan Korkmaz
- 2028 Clippers Unprotected

- Karl-Anthony Towns

Karl-Anthony Towns grabs 8.8 rebounds per night, significantly boosting the team’s rebounding numbers. Also, he makes 2 3-pointers a game with 43% accuracy.

- Marcus Morris Sr.
- 2028 Philadelphia Second Round Pick

- Tim Hardaway Jr.

While this trade doesn’t help out much on rebounding, it certainly helps out on shooting. He makes 3 3-pointers a night on 36% from behind the arc.

- Paul Reed
- 2028 Clippers Unprotected First

- Grayson Allen

This will significantly improve the Sixers' 3-point shooting by adding a member of the 50-40-90 club (as of the time of writing this) to the roster. He can also grab (some) rebounds, bringing down 4 per game.

Whatever happens, I hope the Sixers at least make *a* move before the deadline. I haven't really heard their names in trade rumors much, which scares me. If Daryl Morey thinks the Sixers can win it all with this roster, he is wrong. They may be able to beat the Bucks in the playoffs if the Bucks flame out in Doc Rivers fashion, but they won't get past Boston. Also, if he thinks he can rebuild via expiring contracts, he is wrong there too. Most of the cap space will be spent on re-signing Maxey, and they will waste another MVP-caliber season from Joel Embiid.

Anyways, Samir out.

On this day, a random question popped into my head: "How does Wolfram|Alpha work?" I pulled out my phone and Googled that very question. Many of the results talked about the Newton-Raphson method, which I couldn't understand very well. Others talked about how Matlab uses LAPACK, which will become an important part of the story later on.

I decided that the only way for me to understand how Wolfram|Alpha worked was to implement a *budget* version myself. Here is how the code worked:

- Split the equation across the = sign.
- Simplify each side.
- Try to isolate x on the left side of the equation and the constants on the right side.
- Divide both sides by the coefficient of x.

This solver worked fairly well for linear equations, but didn't work for really anything else.
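The steps above can be sketched as a tiny solver. The parsing here (a regex over signed terms) is my own simplification, not the original Codegebra code:

```python
import re

def solve_linear(equation):
    """Solve a linear equation in x, e.g. "2x + 3 = 7 - x".

    Follows the steps described above: split on '=', collect the x
    coefficients and constants on each side, move x terms left and
    constants right, then divide by the coefficient of x.
    """
    left, right = equation.replace(" ", "").split("=")

    def collect(side):
        coeff, const = 0.0, 0.0
        # Match signed terms like "2x", "-x", or "+3.5".
        for term in re.findall(r"[+-]?[^+-]+", side):
            if term.endswith("x"):
                num = term[:-1]
                if num in ("", "+"):
                    num = "1"
                elif num == "-":
                    num = "-1"
                coeff += float(num)
            else:
                const += float(term)
        return coeff, const

    lc, lk = collect(left)
    rc, rk = collect(right)
    # x terms to the left, constants to the right, then divide through.
    return (rk - lk) / (lc - rc)

print(solve_linear("2x + 3 = 7"))       # 2.0
print(solve_linear("5x - 4 = 2x + 8"))  # 4.0
```

Like the original, this falls over on anything non-linear (or on an equation with no x at all, where the final division blows up).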

Solving linear equations was cool, but I wanted to be able to solve other kinds of equations. The quadratic and exponential solvers are fairly similar (the only difference being the equation parser). They simplified the equations and got the terms into their respective standard forms. Then they plugged those values into a formula to calculate the answer. This architecture is also the reason why I didn't add solvers for polynomials of higher degrees. The cubic and quartic formulas are way too complex to implement quickly (the quartic formula is the size of a whole wall, according to one of my friends).
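The quadratic case reduces to a one-liner once the parser has produced the standard-form coefficients. This sketch skips the parsing step entirely and just shows the plug-into-the-formula part:

```python
import cmath

def solve_quadratic(a, b, c):
    """Solve ax^2 + bx + c = 0 via the quadratic formula.

    Assumes the parser has already reduced the equation to standard
    form; cmath.sqrt handles a negative discriminant gracefully.
    """
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

print(solve_quadratic(1, -5, 6))  # roots 3 and 2 (as complex numbers)
```

This is exactly why higher degrees were off the table: the same plug-in pattern with the quartic formula would mean transcribing a wall of radicals by hand.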

After I implemented the solvers, I decided that I would need to implement more features. After all, Wolfram|Alpha doesn't *just* solve equations. I started out by implementing the power rule (for differentiation) and its reverse (for integration). Then, I hard-coded certain rules (the derivative of sin x is cos x) and used simple string replacement to execute them. After this, I still felt that I hadn't coded enough (you will see that this is a common theme of the story). So, I implemented a Fourier Transform. After that, I turned to linear algebra, implementing transpose, vector scaling, and dot product functions myself, and deferring to LAPACK for eigenvalues and vector addition. Implementing these linear algebra functions signaled a shift, from modeling Codegebra off Wolfram|Alpha to modeling it off Matlab.
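The calculus features can be sketched along those lines. The helper names and the `c*x^n` term format here are my own, not Codegebra's actual parser:

```python
import re

TERM = re.compile(r"(-?\d+)\*x\^(-?\d+)")  # matches terms like "3*x^4"

def power_rule_derivative(term):
    # d/dx of c*x^n = (c*n)*x^(n-1)
    c, n = map(int, TERM.fullmatch(term.replace(" ", "")).groups())
    return f"{c * n}*x^{n - 1}"

def power_rule_integral(term):
    # Reverse power rule: integral of c*x^n = c/(n+1) * x^(n+1)
    c, n = map(int, TERM.fullmatch(term.replace(" ", "")).groups())
    return f"{c / (n + 1)}*x^{n + 1}"

# Hard-coded derivative rules applied by simple string replacement,
# in the spirit of the approach described above.
TRIG_RULES = {"sin(x)": "cos(x)", "cos(x)": "-sin(x)"}

def rule_derivative(expr):
    # Return on the first match so a replacement isn't itself rewritten
    # (otherwise sin -> cos would immediately become -sin again).
    for pattern, deriv in TRIG_RULES.items():
        if pattern in expr:
            return expr.replace(pattern, deriv)
    return expr

print(power_rule_derivative("3*x^4"))  # 12*x^3
print(power_rule_integral("3*x^2"))    # 1.0*x^3
print(rule_derivative("sin(x)"))       # cos(x)
```

String replacement works for single hard-coded rules like these, but it has no notion of the chain rule, which is the main limit of the approach.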