Blog

My “Oh Shit” Moment for LLMs

I’ve been playing with LLMs for a year plus, but I’ve held the opinion that they’re great, yet it’s unclear what the use will be. Over the past few weeks, I’ve stumbled into my “oh shit, this is huge” moment.

Why a year delay?

Let’s start with some context on how I got here; then I’ll discuss what changed my mind.

GitHub Copilot

It’s definitely the best product of 2023, and it’s worth every penny. For non-programmers: would you use your cell phone without autocomplete? You could, but you’re much faster with it. Copilot has definitely sped up my development; generally I know what I need to do, and the slowdown is typing it out.

New Job, New Language, New Frameworks

In September, I started a new role and returned to being an individual contributor. I joined a company using a different tech stack than what I was comfortable with: I went from Go, Gin, React, Mongo, and Faktory to Node, TypeScript, Nest.js, React, Temporal, and Postgres. Compared to past experiences joining a team, ChatGPT got me up to speed that much faster. I continue to use it to ask “my dumb questions” or to save time thinking through specific syntax, e.g. “How do I do a GROUP BY on these two columns for XYZ?”
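To make that concrete, here’s the kind of answer I’m after - a minimal sketch using the pg client against a hypothetical orders table (the table and column names are made up for illustration):

```typescript
import { Pool } from "pg";

// Hypothetical table/columns, just to illustrate "group by two columns".
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function revenueByCustomerAndMonth() {
  const { rows } = await pool.query(`
    SELECT customer_id,
           date_trunc('month', created_at) AS month,
           SUM(amount_cents)               AS revenue_cents
    FROM orders
    GROUP BY customer_id, date_trunc('month', created_at)
    ORDER BY month, customer_id
  `);
  return rows;
}
```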

It’s definitely sped up my development and onboarding experience.

LangChain

Feb + March 2023 - I started using LangChain for a new product at my previous place of employment. It’s overrated - it’s trying to do it all: give LangChain the problem and it will solve it. I think that’s the wrong approach. Most teams should be leaning on LLMs in more scoped areas of their codebase. It should be thought of as another tool in the toolbox, not the only one.

I think that in the next few years, either LLMs drastically approach AGI and all tasks get completed by them, or LangChain won’t look anything like it did in 2023. More programmers will be building small features with fine-tuned models, feedback loops built into their products, and deterministic code to train those models.

Crypto + Hype

I hesitate when everyone runs into an area, says “it’s the next big thing”, then tells you how to make money with it. Sound familiar? Same thing happened with crypto. I noticed that a lot of people who were doing crypto things from 2017 to 2022 suddenly pivoted to “AI”. However, if you asked them how TensorFlow worked, they would look at you with a blank face.

I try to keep myself detached from these cycles of “hype” because they make you stop thinking from first principles. For example: why do I work for a crypto company? Because USDC is the easiest way to move money across borders, the world will continue to get more globalized, and those companies need to manage their books.

The same thing should apply to “AI”, and really to LLMs. Why would this be 10x better? What specific problem is it solving? Are people incentivized to use it (especially without telling them they’ll get rich or playing on FOMO)?

What changed?

Browse the Internet

It’s become a tool in my scripting toolbox. What do I mean? Often I solve problems by writing one-off scripts. Ex 1: fixing customer data. Ex 2: automating away some simple task. I ran into a problem where I wanted to read API documentation from a script.

Typically, if I pulled the HTML down, it would be massive in terms of text, full of extra styling and JavaScript, and a model would struggle to understand the page as it’s presented to a human. So I decided to use the GPT-4 Vision API.

The GPT-4 Vision API operates like I do: it sees the rendered page and can parse and read it with the same visual context I have.
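Here’s a minimal sketch of what that looks like with the OpenAI Node SDK - the model name, prompt, and screenshot path are my assumptions, not tied to any particular project:

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Send a screenshot of the rendered docs page and ask for what I actually care about.
async function readDocsScreenshot(path: string): Promise<string | null> {
  const base64 = fs.readFileSync(path).toString("base64");
  const response = await openai.chat.completions.create({
    model: "gpt-4-vision-preview", // assumed; use whichever vision-capable model you have access to
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "List the endpoints and required parameters shown on this API docs page." },
          { type: "image_url", image_url: { url: `data:image/png;base64,${base64}` } },
        ],
      },
    ],
  });
  return response.choices[0].message.content;
}
```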

The next step was to pair it with navigating web pages. That’s where I slightly modified BrowserGPT to help with this. BrowserGPT takes a prompt, converts it into Playwright code, and performs an action (manipulating the page or navigating).
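Roughly, that pattern looks like the sketch below. This is not BrowserGPT’s actual code - the prompt and the eval-style execution are my own simplifications:

```typescript
import { chromium } from "playwright";
import OpenAI from "openai";

const openai = new OpenAI();

// Sketch: ask the model for one Playwright statement, then run it against the live page.
async function runInstruction(instruction: string, url: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);

  const completion = await openai.chat.completions.create({
    model: "gpt-4", // assumed
    messages: [
      {
        role: "user",
        content:
          `Return one line of Playwright code (using a variable named "page") that does: ${instruction}. ` +
          `Return only the code, no explanation.`,
      },
    ],
  });

  const code = completion.choices[0].message.content ?? "";
  // Evaluating generated code is risky in general; acceptable for a personal one-off script.
  await new Function("page", `return (async () => { ${code} })();`)(page);

  await browser.close();
}
```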

My final step was to combine the two - I can now send English instructions for navigating the internet, and the code can “see” the page like I do. Really powerful for scripting (Here is a demo).
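Putting the two together, the core loop is something like this - askVisionForNextAction and runPlaywrightSnippet are hypothetical helpers in the spirit of the snippets above, not real functions from BrowserGPT:

```typescript
import type { Page } from "playwright";

// Hypothetical helpers: one wraps the vision call, the other executes a generated Playwright snippet.
declare function askVisionForNextAction(
  goal: string,
  screenshot: Buffer
): Promise<{ done: boolean; answer?: string; code: string }>;
declare function runPlaywrightSnippet(page: Page, code: string): Promise<void>;

// Screenshot -> vision model picks the next step -> Playwright runs it -> repeat.
async function browseWithInstructions(page: Page, goal: string): Promise<string | undefined> {
  for (let step = 0; step < 10; step++) {
    const screenshot = await page.screenshot(); // what the model "sees"
    const next = await askVisionForNextAction(goal, screenshot);
    if (next.done) return next.answer;
    await runPlaywrightSnippet(page, next.code);
  }
  throw new Error("Gave up after 10 steps");
}
```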

Fine Tuning

I write a lot - mostly privately, but I write like I speak, and therefore I can write quickly. I think LLMs are impressive for general knowledge, but the idea of fine-tuning on a specific dataset really excites me. If I write a short, well-put-together tutorial on how to do something, it may help unlock a lot of value for another dev. I love this idea of leverage. LLMs help bridge the communication gap: the other dev may not want to consume it in the same way, or with the full explanation, and they may want follow-ups. There’s a constant feedback loop on the content, which is exciting.
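Kicking off a fine-tune on your own writing is a surprisingly small amount of code. A hedged sketch with the OpenAI Node SDK, assuming you’ve already formatted your posts into a chat-style JSONL file (the file name and base model here are assumptions):

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

// my-writing.jsonl: one {"messages": [...]} chat example per line, built from my own posts.
async function fineTuneOnMyWriting() {
  const file = await openai.files.create({
    file: fs.createReadStream("my-writing.jsonl"),
    purpose: "fine-tune",
  });

  const job = await openai.fineTuning.jobs.create({
    training_file: file.id,
    model: "gpt-3.5-turbo", // assumed base model
  });

  console.log("fine-tune job started:", job.id);
}
```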

Second, I think one of “my superpowers” is breaking down problems into manageable chunks. This is shared with a lot of senior engineers. What I foresee is another way to automate “tasks” in a bigger project, where a “task” tends to map to a script or a block of code.

Conclusion

End of the rant - just seeing even more leverage coming to my world. Watch this talk: Making AI accessible with Andrej Karpathy and Stephanie Zhan. Then follow up with this one: [1hr Talk] Intro to Large Language Models.