Show HN: Next-token prediction in JavaScript – build fast LLMs from scratch

What inspired this project today was watching this amazing video by 3Blue1Brown called "But what is a GPT?" on YouTube (https://www.youtube.com/watch?v=wjZofJX0v4M - I highly recommend watching it). I added it to the repo for reference. When it clicked in my head that "knowing a fact" is nearly synonymous with predicting a word (or series of words), I wanted to put it to the test, because it seemed so simple.

I chose JavaScript because I can exploit the way it structures objects to aid in the modeling of language. For example, "I want to be at the beach", "I will do it later", "I want to know the answer", ... becomes:

{ I: { want: { to: { be: { ... }, know: { ... } } }, will: { ... } }, ... }

in JavaScript. You can exploit the language's fast object lookup to find known sentences this way, rather than recursively searching text, which is the convention and would take forever or not work at all, considering there are several full books loaded in by default (and it could support many more).

Along the way I learned what "tokens" and "embeddings" are, what is meant by "training", and most of the rest, though I'm still learning the jargon. I wrote a script to iterate over every single word of every single book and rank how likely each word is to appear next after a given cursor position, then extended that to rank entire phrases.

The base decoder started out as what I'll call "token-agnostic": it didn't care whether you were looking for the next letter, word, or pixel; it's the same logic. But actually it's not, and it soon evolved into a text (language) model. I do have plans to get into image generation next (next-pixel prediction) using this. Overall the concepts are similar, but there are differences, primarily around extraction and formatting.
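To make the nested-object idea above concrete, here is a minimal sketch in JavaScript. It is not the project's actual code: the function names (tokenize, buildTrie, rankNext) and the per-node count field are assumptions added for illustration, and it only follows prefixes from the start of a sentence rather than from an arbitrary cursor.

```javascript
// Hypothetical helpers for illustration; names and node shape are assumptions,
// not taken from the project's repository.

// Split a sentence into word tokens on whitespace.
function tokenize(sentence) {
  return sentence.split(/\s+/).filter(Boolean);
}

// Build a nested object keyed by word. Each node keeps a count of how many
// times that word was reached along this path, plus its child words.
// Object.create(null) avoids collisions with inherited keys like "constructor".
function buildTrie(sentences) {
  const root = { count: 0, next: Object.create(null) };
  for (const sentence of sentences) {
    let node = root;
    for (const token of tokenize(sentence)) {
      if (!node.next[token]) {
        node.next[token] = { count: 0, next: Object.create(null) };
      }
      node = node.next[token];
      node.count += 1;
    }
  }
  return root;
}

// Walk the prefix with plain property lookups, then rank the words that
// followed it, most frequent first. Returns [] for an unseen prefix.
function rankNext(root, prefix) {
  let node = root;
  for (const token of tokenize(prefix)) {
    node = node.next[token];
    if (!node) return [];
  }
  return Object.entries(node.next)
    .map(([token, child]) => ({ token, count: child.count }))
    .sort((a, b) => b.count - a.count);
}

const trie = buildTrie([
  "I want to be at the beach",
  "I will do it later",
  "I want to know the answer",
]);

console.log(rankNext(trie, "I"));
console.log(rankNext(trie, "I want to"));
```

With the three example sentences, "want" ranks above "will" after "I" because it was seen twice. Storing counts on each node is one simple way to turn lookups into a ranking; a fuller version would likely normalize counts into probabilities and handle punctuation and casing.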
Goals of the project:
- Demystify LLMs for people, show that it's just regular code that does normal stuff
- Actually make a pretty good LLM in JavaScript, with a version at least capable of running in a browser tab

https://ift.tt/9e1LUdk
April 11, 2024 at 02:57AM
Show HN: Next-token prediction in JavaScript – build fast LLMs from scratch https://ift.tt/0GW13EA