<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Ian's blog | entropy.observer]]></title><description><![CDATA[brainmush - Ian Rumac]]></description><link>https://blog.entropy.observer/</link><image><url>https://blog.entropy.observer/favicon.png</url><title>Ian&apos;s blog | entropy.observer</title><link>https://blog.entropy.observer/</link></image><generator>Ghost 5.71</generator><lastBuildDate>Thu, 09 Apr 2026 22:14:56 GMT</lastBuildDate><atom:link href="https://blog.entropy.observer/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[AI SuperAgents: (ab)using WASM for LLM function calls - Part 1]]></title><description><![CDATA[<blockquote>&quot;why are you even doing this, it doesnt make sense, half of this stuff isn&apos;t even stable, you&apos;re just torturing yourself&quot; - Every sane person</blockquote><p>Bleeding edge technology is fun.<br>You know what&apos;s also fun? 
Smacking two bleeding edge technologies together.<br><br>One</p>]]></description><link>https://blog.entropy.observer/get-witty-wit-it-using-wasm-as-platform-for-llm-function-calls/</link><guid isPermaLink="false">65fec6f6da89d344959db561</guid><dc:creator><![CDATA[Ian Rumac]]></dc:creator><pubDate>Sat, 07 Sep 2024 10:23:40 GMT</pubDate><media:content url="https://blog.entropy.observer/content/images/2024/04/685387_surrealist-painting-of-tiny-machines-building-a-gi_xl-1024-v1-0.png" medium="image"/><content:encoded><![CDATA[<blockquote>&quot;why are you even doing this, it doesnt make sense, half of this stuff isn&apos;t even stable, you&apos;re just torturing yourself&quot; - Every sane person</blockquote><img src="https://blog.entropy.observer/content/images/2024/04/685387_surrealist-painting-of-tiny-machines-building-a-gi_xl-1024-v1-0.png" alt="AI SuperAgents: (ab)using WASM for LLM function calls - Part 1"><p>Bleeding edge technology is fun.<br>You know what&apos;s also fun? Smacking two bleeding edge technologies together.<br><br>One can never know what&apos;s gonna happen - in your imagination, they will work magically in unison, bringing your vision to life with sounds of an angelic choir as it finishes compiling - in reality, you&apos;ll probably spend your nights lying in bed, thinking about the 136 tabs of github issues, half-finished documentation and unexplained source code you have open in your browser.<br><br>But if you&apos;re a novelty chasing ADHD squirrel like me and now you&apos;re remembering all those fun nights spent in these rabbit holes - oooh, do I have a treat for you. So grab your coffee, follow me into this rabbit hole and let&apos;s smash some shiny rocks together.</p><h3 id="so-how-bout-those-llms">So, how bout those LLM&apos;s?<br></h3><p>In the last few years, LLM&apos;s have penetrated every aspect of our world. 
From your Google recommendations and the contents of your spam folder, to the eternal AI hypestorms on your Twitter feed, everything seems to be succumbing to the machines. <br><br>But to give machines the agency they need to interact with our world, we need to provide them with a way to do it. And how we&apos;re doing it has mostly been based on the concept of LLM <em>function calling</em> - providing LLMs with a set of possible functions to execute, together with argument descriptions. And it works <em>just fine.</em> We write some code, give the function description to an LLM, it returns us arguments, we call the function, say <em>abracadabra</em> and poof - we got agency.</p><p>But what if we could make it <em>even better</em>?<br><br>What if we could let LLMs write their own functions? Better yet, what if those functions could be reused across conversations, constantly expanding the capabilities of our machines, providing us with a giant library of possible interactions they can have with the real world?<br><br>Oh man, wouldn&apos;t that just be great (and completely, utterly, chaotic)?<br><br>Well, what if I told you there is a perfect way to do that by using another <em>bleeding-edge edgy</em> technology, one that will surely raise some eyebrows at your next local meetup?</p><h3 id="enter-wasm-the-asm-of-the-future">Enter WASM: the ASM of the future</h3><p><br>If you&apos;ve been paying attention to the web world the last few years, you&apos;ve probably heard of WebAssembly. If not, I&apos;ll keep it quick - it&apos;s a new binary format &amp; compile target, made so you can use your good ol&apos; compiled languages such as C, Rust or Java in the browser instead of being constrained to just JavaScript.<br>Basically, like your CPU runs assembly instructions, the browser has a virtual machine that runs WebAssembly instructions. 
Mind blowing, I know.<br><br>But unlike other tries at this (I&apos;m looking at you, Java applets), this one actually seems properly orchestrated and executed, supported by browsers and developed in the open, standardised under W3C&apos;s supervision.<br><br>And while it was announced nearly 9 years ago, it recently started picking up quite a lot more steam, with the development of <strong>WASI</strong> and the <strong>Component model.</strong><br>These allow developers to write code that communicates with the outside world (in an easier way), providing common interfaces (WASI) and allowing developers to define their own (component model). As the preview of this has been maturing lately and is quite usable now, it might just be the perfect time to start exploring all the potential the technology offers.</p><h2 id="wasma-language-for-the-machines">WASM - a language for the machines.</h2><p><br>With the rise of the Agent and function calling paradigms, a lot of useless frameworks have popped up to provide agents with different capabilities - enabling one to chain multiple agents, provide them with function calls and feed them knowledge. Be it searching the web, writing to files or looking up wikipedia, they expand the capabilities of the machines, enabling complex, emergent behaviour to evolve.<br><br>Still, they are quite limited: they are bound to a specific language or framework, they have to be written by the developers and they are rarely shared between projects - meaning we all have to repeat the same boring API calls thousands of others developers already wrote.<br><br>But what if it didn&apos;t have to be so?<br><br>What if instead of having to write these functions ourselves, we could have the machines write them for us, on a per-need basis? 
What if we enabled complete automation of agents and their behaviour, providing them with <em>whatever</em> they need, <em>whenever</em> they need it, by letting them write their own code?<br><br>What if we could build a <em>self-building</em> machine?<br><br>With WASM, we can build just that. Let&apos;s take a look how:</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.entropy.observer/content/images/2024/03/excalidraw-2020323144042--13-.png" class="kg-image" alt="AI SuperAgents: (ab)using WASM for LLM function calls - Part 1" loading="lazy" width="1538" height="583" srcset="https://blog.entropy.observer/content/images/size/w600/2024/03/excalidraw-2020323144042--13-.png 600w, https://blog.entropy.observer/content/images/size/w1000/2024/03/excalidraw-2020323144042--13-.png 1000w, https://blog.entropy.observer/content/images/2024/03/excalidraw-2020323144042--13-.png 1538w" sizes="(min-width: 1200px) 1200px"></figure><p><br>The basic idea is just like your average LLM function call - take the message, have LLM decide if it requires an external function, call the function with the provided arguments your LLM extracted from the message. But, it comes with a twist (or two): </p><ul><li>We can write functions in any language we want, as long as it compiles to WASM and obeys our function contract</li></ul><p>Which enables another, way more entertaining twist:</p><ul><li>If the function doesn&apos;t exist, we can have another LLM agent write it, compile it, execute it and return the calls to the original agent.</li></ul><p>Why is this so entertaining?<br><br>Because in theory, it provides us with the ultimate &quot;self-building&quot; machine, allowing our LLM to expand it&apos;s own capabilities as it goes - want it to scrape a website? Sure, it can write a function for that. Order an Uber? Give it an API key and watch it go. Create another LLM for you? Why not.<br><br><em>Note: &quot;Technically&quot;, I should add. 
With the current state of LLMs, it will probably order an Uber to the middle of the Amazon and spend the budget of an average Balkan country while trying to scrape the same website 26 billion times. Still, quite entertaining.</em> </p><p>But the best part - we get to play with <em>shiny, new technology. </em>And how can one say no to that?<br><br>So take out your editors, sharpen your Rust skills and let&apos;s build the self-building machine!<br></p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://blog.entropy.observer/content/images/2024/04/72141_recursion--4k--gallery--masterpiece--dali--de-chir_xl-1024-v1-0.png" class="kg-image" alt="AI SuperAgents: (ab)using WASM for LLM function calls - Part 1" loading="lazy" width="1536" height="640" srcset="https://blog.entropy.observer/content/images/size/w600/2024/04/72141_recursion--4k--gallery--masterpiece--dali--de-chir_xl-1024-v1-0.png 600w, https://blog.entropy.observer/content/images/size/w1000/2024/04/72141_recursion--4k--gallery--masterpiece--dali--de-chir_xl-1024-v1-0.png 1000w, https://blog.entropy.observer/content/images/2024/04/72141_recursion--4k--gallery--masterpiece--dali--de-chir_xl-1024-v1-0.png 1536w" sizes="(min-width: 1200px) 1200px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">&quot;Recursion, 4k, gallery, masterpiece, dali, de chirico, surrealism, detailed, light sky&quot; - 2024, the artist is a machine</em></i></figcaption></figure><h2 id="building-the-building-blocks-wasm-components">Building the building blocks: WASM Components</h2><p><br>Let&apos;s start with the most important thing - building out an interface for our WASM component. We&apos;ll keep it quite simple, providing us with a few fields we need to discover and use the function. 
And we&apos;ll be writing it in <strong>WIT.</strong></p><p><strong>WIT</strong> (WebAssembly Interface Types) is WebAssembly&apos;s interface format, used to define interfaces for <strong>Components</strong> and <strong>Worlds</strong>. A <strong>component</strong> is just a modular piece of code obeying a contract, while a <strong>world</strong> is a contract for the &quot;world&quot; the component executes in, defining the interfaces the component <em>exports (exposes to the outside world)</em>, and the interfaces that the component <em>imports (uses from the outside world).</em></p><p>As a format itself, <strong>WIT</strong> is quite simple and understandable - if you&apos;ve used any higher-level language, the syntax should be intuitive immediately. We got your good ol&apos; <code>bools</code>, <code>floats</code>, <code>ints</code>, <code>chars</code> and the fan favorite <code>strings</code>. For more advanced needs, it supports not only <code>list</code>, <code>tuple</code>, <code>option</code> and <code>result</code>, but also <code>record</code>, <code>variant</code>, <code>resource</code> and anything else you might need. For now, we&apos;ll keep it simple and just push JSON around, since it&apos;s the end-all-be-all internet format that LLMs can easily recognise<em> (if you want to leave a comment like &quot;oh no JSON, muh performance&quot;, here is your chance).</em><br><br>So let&apos;s define our interfaces - spin up a project folder and create a file at <code>wit/function.wit</code> in which we will create the basic <code>interface</code>.</p><pre><code class="language-WIT">package superagent:functions;

interface function {
  
}</code></pre><p>First, we&apos;ll need to know which function to invoke and what it&apos;s arguments are, so we&apos;ll create a <strong>record</strong> called <code>metadata</code> that exposes us some function information. Besides name, it will also include a <code>description</code> so that we can search among the functions for it, and we&apos;ll list the <code>arguments</code>  too so we know how to invoke it.<br>To be able to retrieve that data from a component, we&apos;ll also create a function called <code>meta</code> which will return our <strong>record</strong>.</p><pre><code class="language-WIT">package superagent:functions;

interface function {

  record metadata {
    name: string,
    description: string,
    arguments: string
  }

  // Retrieves metadata of the function
  meta: func() -&gt; metadata;
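
  // For illustration, a component&apos;s meta() might return values like
  // (these match the sample component we&apos;ll implement below):
  //   name:        &quot;web_pinger&quot;
  //   description: &quot;Checks if a website is up by pinging it&quot;
  //   arguments:   &quot;{ \&quot;endpoint\&quot;: String }&quot;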
  
}</code></pre><p>Then, to call the function itself, we&apos;ll need an <code>invoke</code> method.<br>This method will take in the arguments formatted as a JSON string and return a result which can either be a JSON string containing results or an <code>execution-error</code> which will contain an error message.</p><pre><code class="language-WIT">package superagent:functions;

interface function {

  record metadata {
    name: string,
    description: string,
    arguments: string
  }

  record execution-error {
      reason: string
  }

  // Used to invoke our function
  invoke: func(input: string) -&gt; result&lt;string, execution-error&gt;;

  meta: func() -&gt; metadata;

}</code></pre><p>And to wrap it all up together, let&apos;s define a <strong>world</strong> called <code>host</code> that exports our function interface:</p><pre><code class="language-WIT">package superagent:functions;

interface function {
   // .. all the code  
}

world host {
  export function;
}</code></pre><p>And voila, just like that our WebAssembly component is defined and ready to be implemented. To implement it, we&apos;ll be using the most <s>annoying</s> beloved language in the world, Rust.<br><br>First, we&apos;ll be creating a sample component that will implement the interface and that we will use to test our code. While we can do that with <code>bindgen</code> and <code>wasm-tools</code>, we&apos;ll be doing it &quot;the right way&quot; - by using <code>cargo component</code> to take out the busywork, so go ahead and do a: </p><pre><code class="language-bash">cargo install cargo-component
cargo component new --lib sample</code></pre><p>This will create a new component project in the <code>sample</code> directory.<br>Now go ahead and create a <code>wit</code> folder, then <a href="https://man7.org/linux/man-pages/man1/ln.1.html?ref=blog.entropy.observer"><em>symlink</em></a> the wit file inside (so we don&apos;t need two of them):</p><pre><code class="language-bash">cd sample &amp;&amp; mkdir wit &amp;&amp; cd wit
ln -s ../../wit/function.wit function.wit</code></pre><p>Now, a small diversion:</p><p>Since component model is written for <em>WASI preview 2</em>, and the Rust compiler only supports <em>WASI preview 1</em>, we need to adapt the compiled code by using an adapter - we can do this automatically with <code>cargo component</code>, but first we need to download <code>wasi_snapshot_preview1.reactor.wasm</code> adapter <a href="https://github.com/bytecodealliance/wasmtime/releases?ref=blog.entropy.observer">from the release page</a>. <br>Then, we can open up <code>Cargo.toml</code> and add this:</p><pre><code class="language-toml">[package.metadata.component]
adapter = &quot;wasi_snapshot_preview1.reactor.wasm&quot;</code></pre><p>This way, code can be written to target WASI preview 2 but it&apos;s adapted so it can be used in places that support WASI preview 1 only, giving us backwards compatibility with a lot of real-world WASI implementations. </p><p>Now that we got the basics set-up, we can run <code>cargo component build</code> inside our sample folder and we&apos;ll see it generate a <code>bindings.rs</code> file with all the WIT records and interfaces code created for us. Only thing left to do is implement it, and for this sample, we&apos;ll implement a simple one called <code>web_pinger</code> which will do a mock check if an endpoint is running:</p><pre><code class="language-rust">#[allow(warnings)]
mod bindings;

// Generated contracts
use bindings::{
    exports::superagent::functions::function,
    exports::superagent::functions::function::Guest,
};

//Generated records
use crate::bindings::exports::superagent::functions::function::{ExecutionError, Metadata};

//Create the component which will implement our WIT interfaces
struct Component;

//Implement the &quot;guest&quot; code, aka your webassembly component

impl Guest for Component {
    //Mock function invocation
    fn invoke(_input: String) -&gt; Result&lt;String, ExecutionError&gt; {
        Ok(&quot;{ up: true}&quot;.to_string())
    }

    //Get function metadata - the name must match the &quot;meta&quot; function from our WIT file
    fn meta() -&gt; Metadata {
        Metadata {
            name: &quot;web_pinger&quot;.to_string(),
            description: &quot;Checks if a website is up by pinging it&quot;
                          .to_string(),
            arguments: &quot;{ \&quot;endpoint\&quot;: String }&quot;.to_string(),
        }
    }
}
</code></pre><p>Now, if we run <code>cargo component build</code>, our component should be built successfully, and we&apos;ll end up with a WASM module file at <code>./target/wasm32-wasi/debug/sample.wasm</code>.</p><p>Congratulations, your first WASM component has been built &#x1F389;</p><h2 id="assembling-the-rube-goldberg-von-neumann-machine">Assembling the Rube Goldberg-von Neumann machine</h2><p><br>Now that we have set up the <em>guest </em>code, it&apos;s time for the <em>host </em>code. Go through your favorite ritual of setting up a Cargo project in your project folder and let&apos;s add some basics - first off, you&apos;ll need your WASM runtime of choice.</p><p>Now choosing one is a daunting task in itself - I&apos;d recommend <a href="https://github.com/bytecodealliance/wasmtime?ref=blog.entropy.observer">wasmtime</a> by the Bytecode Alliance, and it&apos;s what I&apos;ll be using here, so let&apos;s add that to our cargo file, together with WASI support.</p><pre><code class="language-toml">[dependencies]
wasmtime = { version = &quot;18.0.1&quot;, features = [&quot;component-model&quot;] }
wasmtime-wasi = { version = &quot;18.0.1&quot;, default-features = true }</code></pre><p>Next, we&apos;ll also add two more libraries - the <code>wit-component</code> library for handling Components and a <code>wit-bindgen-rust</code> that will take our WIT interface and generate the glue code in the background.</p><pre><code class="language-toml">wit-component = &quot;0.201.0&quot;
wit-bindgen-rust = &quot;0.20.0&quot;
lazy_static = &quot;1.4&quot;</code></pre><p>For now, we&apos;ll set the LLM part aside and focus on WASM, since the LLM part is quite simple - create an <code>Agent</code> contract, implement it for your model/provider of choice, prompt tune a bit. But don&apos;t worry - we&apos;ll get to that later.<br><br>Let&apos;s figure out the bare minimum to run a WASM component:</p><ul><li>Take a compiled <code>wasm</code> binary</li><li>Instantiate a <code>Component</code> out of it</li><li>Start a WASM engine</li><li>Run the component <em>inside the engine</em> with provided arguments</li><li>Return the result</li></ul><p>So open up your <code>main.rs</code> and start writing. First, we&apos;ll be lazy and just create a lazy static instance of a WASM engine (via the <code>lazy_static</code> crate we just added), then define our necessary functions:</p><pre><code class="language-rust">use wasmtime::{Config, Engine, WasmBacktraceDetails};

lazy_static::lazy_static! {
    static ref ENGINE: Engine = {
        let mut config = Config::new();
        
        // For easier debugging
        config.wasm_backtrace_details(WasmBacktraceDetails::Enable);
        
        // Enables component model
        config.wasm_component_model(true);

        let engine = Engine::new(&amp;config).unwrap();
        engine
    };
}

fn run_function(arguments: String,
                component_binary: &amp;[u8]) -&gt; String {
    let component = build_wasm_component(component_binary);
    let mut store = Store::new(&amp;ENGINE, WasmState::new());
    let mut instance = create_instance(&amp;mut store, component);
    execute_function(store, &amp;mut instance, &quot;invoke&quot;, &amp;arguments)
        .expect(&quot;Function execution failed&quot;)
}</code></pre><p>To build a WASM component from binary, we&apos;ll load the <code>.wasm</code> file into a byte array, <em>adapt it to preview2,</em> then use <code>Component::from_binary</code> to create it. Don&apos;t worry, it&apos;s quite simple and the <code>wit-component</code> crate is here to support it - you just need to download the adapter files and load them in, so let&apos;s do that:</p><pre><code class="language-rust">//Include the WASI Preview 1 Adapter
const ADAPTER: &amp;[u8] = include_bytes!(concat!(
    env!(&quot;CARGO_MANIFEST_DIR&quot;),
    &quot;/wasm-sample/wasi_snapshot_preview1.reactor.wasm&quot;
));


fn adapt_wasm_output(wasm_bytes: &amp;[u8],
                    adapter_bytes: &amp;[u8]) -&gt; Result&lt;Vec&lt;u8&gt;, Error&gt; {
    let component = ComponentEncoder::default()
        .module(&amp;wasm_bytes)
        .expect(&quot;Cannot encode module&quot;)
        .validate(true)
        .adapter(&quot;wasi_snapshot_preview1&quot;, &amp;adapter_bytes)
        .expect(&quot;Cannot encode adapter&quot;)
        .encode()
        .expect(&quot;Cannot encode components&quot;);

    Ok(component.to_vec())
}
</code></pre><p>To actually be able to use the classes defined in your <em>WIT </em>file from the <em>host</em> code,<br>we need to use a bindgen macro to generate them from WIT file. This will help us generate glue code that binds our WASM functions with our Rust ones using the defined contract. So add this to the beginning of your file and point it at the wit:</p><pre><code class="language-rust">bindgen!({
    path : &quot;wit/function.wit&quot;,
    world: &quot;host&quot;,
});</code></pre><p>Now, we can create our component using <code>wasmtime</code> , so let&apos;s implement that <code>build_wasm_component</code> method:</p><pre><code class="language-rust">fn build_wasm_component(bytes: &amp;[u8]) -&gt; Component {
 let component = adapt_wasm_output(bytes, ADAPTER).unwrap();
 
 Component::from_binary(&amp;ENGINE, &amp;component)
     .expect(&quot;Cannot create component&quot;)
}
</code></pre><p>After creating a component, we&apos;re ready to move on to the next step - creating an actual instance of it. To create an instance, we need a place for it to actually live, something where it can store it&apos;s variables, functions, memory et al. That <em>something </em>is called a <code>Store</code>. So let&apos;s create a <code>Store</code> and a <code>WasmState</code> which we will store inside. To create a <code>WasmState</code> , let&apos;s open <code>wasm_state.rs</code> .</p><p>Inside, we&apos;ll create a basic struct containing two things:</p><ul><li><code>WasiCtx</code> to provide a basic WASI implementation</li><li>a <code>ResourceTable</code> to access resources by reference</li></ul><pre><code class="language-rust">extern crate wasmtime;

use wasmtime::component::{ResourceTable};
use wasmtime_wasi::preview2::{WasiCtx, WasiCtxBuilder, WasiView};

pub(crate) struct WasmState {
    ctx: WasiCtx,
    table: ResourceTable,
}

impl WasmState {
    pub(crate) fn new() -&gt; Self {
        let ctx = WasiCtxBuilder::new().build();
        let table = ResourceTable::new();
        Self { ctx, table }
    }
}
</code></pre><p>To access the table and the context, we&apos;ll also need to implement the <code>WasiView</code> trait:</p><pre><code class="language-rust">impl WasiView for WasmState {
    fn table(&amp;mut self) -&gt; &amp;mut ResourceTable {
        &amp;mut self.table
    }

    fn ctx(&amp;mut self) -&gt; &amp;mut WasiCtx {
        &amp;mut self.ctx
    }
}
</code></pre><p>Phew, that was a lot of stuff - but finally we&apos;re ready to create the component, so let&apos;s go into the <code>create_instance</code> function. We&apos;ll first create a <code>Linker</code>, which links together <em>host</em> functions and instances. Then we&apos;ll link our <code>WasmState</code> into it and create the instance by passing in the <code>Store</code> it will be using and the <code>Component</code> we are instantiating to the linker, which should bind it all together and give us the living instance of our<code>Component</code>.</p><pre><code class="language-rust">fn create_instance(store: &amp;mut Store&lt;WasmState&gt;, 
                  component: Component) -&gt; Instance {
    let mut linker = Linker::new(&amp;ENGINE);
    preview2::command::sync::add_to_linker::&lt;WasmState&gt;(&amp;mut linker)
        .expect(&quot;Cannot add to linker&quot;);
    linker.instantiate(store, &amp;component)
        .expect(&quot;Cannot instantiate component&quot;)
}
</code></pre><p>Having the instance of our WASM program, only thing left to do is run the function itself, so let&apos;s create that <code>execute_function</code> method. To do that, we need to get the exported interface from our component instance and find the function we need. Then, we can invoke it with the provided arguments and receive a classic Rust <code>Result</code>.</p><pre><code class="language-rust">fn execute_function(mut store: Store&lt;WasmState&gt;, instance: &amp;mut Instance,
                    name: &amp;str, args: &amp;str) -&gt; Result&lt;String, ExecutionError&gt; {
                    
    let mut exports = instance.exports(&amp;mut store);
    let mut interface = exports
        .instance(&quot;superagent:functions/function&quot;)
        .expect(&quot;Cannot find interface&quot;);
        
    //Get the function by name
    let func = interface
        .typed_func::&lt;(String,),(Result&lt;String, ExecutionError&gt;,)&gt;(name)
        .expect(&quot;Cannot find action&quot;);
    drop(exports);
    
    //Call the function
    let res = func.call(&amp;mut store, (args.to_string(), ))
        .expect(&quot;Function execution failed&quot;).0;
        
    //Remove the return from WASM memory
    func.post_return(&amp;mut store)
        .expect(&quot;Cannot post return to store&quot;);
    res
}
</code></pre><p>And that&apos;s it - our WASM runner is ready. To test it, we can use the module we&apos;ve built before - add it/symlink it to your project root and load it in:</p><pre><code class="language-rust">fn main() -&gt; Result&lt;(), Error&gt; {
    // The sample component we built earlier - here assumed to be
    // symlinked into the project root as sample.wasm
    const GUEST_RS_WASM_MODULE: &amp;[u8] = include_bytes!(&quot;../sample.wasm&quot;);
    let component = build_wasm_component(GUEST_RS_WASM_MODULE);
    let mut store = Store::new(
        &amp;ENGINE,
        WasmState::new(),
    );
    let mut instance = create_instance(&amp;mut store, component);
    let res = execute_function(store, &amp;mut instance, &quot;invoke&quot;, 
    &quot;{\&quot;endpoint\&quot;:\&quot;google.com\&quot;}&quot;);
    match res {
        Ok(result) =&gt; {
            println!(&quot;Result: {}&quot;, result);
            Ok(())
        }
        Err(e) =&gt; {
            panic!(&quot;{}&quot;, e.reason)
        }
    }
}</code></pre><p>Now, if you hit <code>cargo run</code>, you should see the result being output:<br><code>Result: { up: true}</code></p><p>Congratulations - you have created your first WASM function and runner!<br><br>Now that the baseline is done, we can continue with the juicy bit - making the AI build its own functions - but let&apos;s leave that for the next part of this blog post, it&apos;s getting too long and my coffee is getting too cold.<br></p><p><em>(Note: This post has been sitting on my shelf for the last few months, so some stuff might be out of date - don&apos;t worry, the declared versions here still work and the standards are still the same)<br><br></em></p>]]></content:encoded></item><item><title><![CDATA[Sorting 400+ tabs in 60 seconds with JS, Rust & GPT3: Part 2 - Macros & Recursion]]></title><description><![CDATA[<p></p><p>So, considering the response on the last part, I have a feeling I should add a preface to explain a few things about this post/series:<br><br>- It&apos;s not about &quot;sorting&quot; as in the algorithmic function of sorting - fun idea tho.<br>- It&apos;s</p>]]></description><link>https://blog.entropy.observer/sorting-400-tabs-in-under-60-seconds-with-js-rust-gpt3-part-2/</link><guid isPermaLink="false">65422724da89d344959db4f7</guid><dc:creator><![CDATA[Ian Rumac]]></dc:creator><pubDate>Tue, 07 Mar 2023 16:59:15 GMT</pubDate><media:content url="https://blog.entropy.observer/content/images/2023/03/000068.9a4f1d3a.2222094085--1-.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.entropy.observer/content/images/2023/03/000068.9a4f1d3a.2222094085--1-.png" alt="Sorting 400+ tabs in 60 seconds with JS, Rust &amp; GPT3: Part 2 - Macros &amp; Recursion"><p></p><p>So, considering the response on the last part, I have a feeling I should add a preface to explain a few things about this post/series:<br><br>- It&apos;s not about &quot;sorting&quot; as in the algorithmic function of sorting - fun idea tho.<br>- It&apos;s
not about GPT writing/optimising sorting functions - also a fun idea.<br>- It&apos;s not really in 60 seconds, the title is a jab at the common <a href="https://highlowblog.com/was-gone-in-60-seconds-the-start-of-the-2000s/?ref=blog.entropy.observer">&quot;in 60 seconds&quot;</a> trope<br>- It&apos;s mostly just a tagalong journal of an adventure where I try to solve my problem via over-engineering, with all the fun quirks, complexities, deep dives and scope creeps that come with building software.<br><br>So now that we got that out of the way, let&apos;s focus on what really matters:<br>Abusing GPT3 for fun and no profit.<br><br>In the last post, we built the user-facing part of our chrome extension that will help us fix our tab hoarding habit. We wrote some HTML, scribbled some JS and researched mysteries of the Chrome API. This time, we&apos;ll dive into the hottest language on the block right now - Rust. <br><br>I don&apos;t think I need to explain what Rust is. Even if you&apos;ve lived under a rock you&apos;ve probably heard of Rust - the development community is praising it into high heavens - it has the speed of C, the safety of Java and the borrowing system with the helicopter parenting skills of an AH-64 Apache attack helicopter.<br><br>But - the syntax is neat, the performance is awesome, macros are cool and even tho it&apos;s mostly strict about memory, it still gives you access to raw pointers and let&apos;s you go !unsafe .<br><br>So, to get a feel of the language, let&apos;s try and have some fun with it.<br><br>We&apos;ll build a simple service that will take our tab collection, simplify it a bit, talk to the OpenAI&apos;s API and - hopefully without hallucinations - parse the response into something our extension can use. 
On the way we&apos;ll encounter some obstacles, from having too many tabs and wasting money to Silicon Valley waking up and hammering the OpenAI API into oblivion.</p><!--kg-card-begin: markdown--><p>Our service will be quite simple - we&apos;ll expose one endpoint, <code>/sort</code>, to which we will <code>POST</code> our tabs and existing categories. To build it, we&apos;ll be using the <a href="https://github.com/tokio-rs/axum?ref=blog.entropy.observer">Axum framework</a>, allowing us to easily start up a server with a <code>/sort</code> endpoint. And to deploy it we&apos;ll use <a href="https://shuttle.rs/?ref=blog.entropy.observer">shuttle</a> so we can easily spin up a Rust server without swimming through the sea of AWS configs, writing Procfiles or building docker images.</p>
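<p>Since every tab we send costs tokens (and therefore money), it pays to slim the payload down before it ever reaches the API. Here is a minimal sketch of that kind of simplification step in Rust - note that the <code>Tab</code> struct and the <code>simplify</code> function are hypothetical illustrations of the idea, not the extension&apos;s actual payload format:</p><pre><code class="language-rust">// Hypothetical shape of a tab, as collected by the extension
struct Tab {
    id: u32,
    title: String,
    url: String,
}

// Reduce each tab to an &quot;id | title | host&quot; line - fewer tokens per tab
// means cheaper and faster calls to the API
fn simplify(tabs: &amp;[Tab]) -&gt; Vec&lt;String&gt; {
    tabs.iter()
        .map(|tab| {
            // keep only the host part of the URL
            let host = tab.url
                .trim_start_matches(&quot;https://&quot;)
                .trim_start_matches(&quot;http://&quot;)
                .split(&apos;/&apos;)
                .next()
                .unwrap_or(&quot;&quot;);
            format!(&quot;{} | {} | {}&quot;, tab.id, tab.title, host)
        })
        .collect()
}</code></pre>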
<p>We&apos;ll even use it to scaffold our project, so let&apos;s start by installing it.<br>
First, we&apos;ll need cargo, the rust package manager - if you don&apos;t have it installed, follow the steps <a href="https://doc.rust-lang.org/cargo/getting-started/installation.html?ref=blog.entropy.observer">here</a>. Second, we&apos;ll need a <a href="https://shuttle.rs/?ref=blog.entropy.observer">shuttle</a> account - don&apos;t worry, you can just 1-click signup with Github there - no need to fill out forms.</p>
<p>Now, open up the ol&apos; terminal and hit <code>cargo install cargo-shuttle &amp;&amp; cargo shuttle login</code> followed up with <code>cargo shuttle init</code> after you&apos;ve authenticated.</p>
<p>Follow the instructions to set the project name and location, and in the menu choose <code>axum</code> as your framework. This will scaffold a new axum project as a library, with shuttle as a dependency.</p>
<p>Our folder should now look like this.</p>
<!--kg-card-end: markdown--><pre><code>&#x251C;&#x2500;&#x2500; Cargo.lock
&#x251C;&#x2500;&#x2500; Cargo.toml
&#x2514;&#x2500;&#x2500; src
    &#x2514;&#x2500;&#x2500; lib.rs</code></pre><!--kg-card-begin: markdown--><p>It&apos;s quite a simple structure - we have <code>Cargo.toml</code>, which is the Rust version of <code>manifest.json</code> or <code>package.json</code>. It contains metadata about your package, its dependencies, compilation features and more. The <code>Cargo.lock</code> is just a pinned list of the exact dependency versions resolved, ensuring consistent builds across environments.</p>
<p>Our main server code will reside inside <code>src/lib.rs</code>. Let&apos;s look at it while it&apos;s still fresh and beautiful:</p>
<!--kg-card-end: markdown--><pre><code class="language-rust">use axum::{routing::get, Router};
use sync_wrapper::SyncWrapper;

async fn hello_world() -&gt; &amp;&apos;static str {
    &quot;Hello, world!&quot;
}

#[shuttle_service::main]
async fn axum() -&gt; shuttle_service::ShuttleAxum {
    let router = Router::new().route(&quot;/hello&quot;, get(hello_world));
    let sync_wrapper = SyncWrapper::new(router);

    Ok(sync_wrapper)
}</code></pre><!--kg-card-begin: markdown--><p>A few things of note here:</p>
<ul>
<li>
<p>No <code>main</code> method - since this project is marked as a [lib]rary, there is no predefined entry point needed.</p>
</li>
<li>
<p>The <code>router</code> - the &quot;entry point&quot; for your Axum service. Requests are routed through here and the code is pretty self-explanatory - you pair up the route to the function handling it, i.e., our <code>supercoolservice.com/hello</code> would return a simple &quot;Hello, world!&quot; text.</p>
</li>
<li>
<p>The <code>SyncWrapper</code> - wraps our router object, ensuring it&apos;s safe to access across different threads.</p>
</li>
<li>
<p><code>#[shuttle_service::main]</code> - this is a Rust macro - think of it as a more powerful version of annotations if you know what those are. It lets you write code that writes code - but that&apos;s the lazy explanation. Uuh.. I think we need a quick diversion here.</p>
</li>
</ul>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h3 id="a-quick-diversion-into-the-magical-realm-of-macros">A quick diversion into the magical realm of macros</h3>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.entropy.observer/content/images/2023/03/000081.d7634f13.4272243829.png" class="kg-image" alt="Sorting 400+ tabs in 60 seconds with JS, Rust &amp; GPT3: Part 2 - Macros &amp; Recursion" loading="lazy" width="960" height="256" srcset="https://blog.entropy.observer/content/images/size/w600/2023/03/000081.d7634f13.4272243829.png 600w, https://blog.entropy.observer/content/images/2023/03/000081.d7634f13.4272243829.png 960w" sizes="(min-width: 720px) 720px"><figcaption>Magical realm of macros, Hieronymus Bosch - 2023, the artist is a machine. First one is a conference talk, the second is pair programming, the third is the rabbit hole you find yourself in after getting into macros.</figcaption></figure><p>Now, before we get into the macros, I gotta preface this with a warning: this is not a 100% explanation of macros and how they work in [insert your favorite language here]. For that, there exist hundreds of books, guides and articles.<br><br>But for those random readers who stumbled here and don&apos;t want to read <em>&quot;monad is a monoid in the category of endofunctors&quot;</em> style articles explaining macros, we&apos;ll have a quick dip into the beautiful rabbit-hole of macros.</p><!--kg-card-begin: markdown--><p>So let&apos;s imagine we&apos;re working in an imaginary language called Bust.</p>
<p>Bust is this new cool language the whole of Twitter is buzzing about and they say it&apos;s going to be the language of the metaverse AI web4 apps. But, since it&apos;s a new language, it&apos;s still early and there aren&apos;t many libraries - for example, there are no JSON serialisation libraries yet, so you gotta write all of the code for that manually. So every time you create a struct you have to also write a bunch of serialisation code for it too. Like:</p>
<!--kg-card-end: markdown--><pre><code>struct ReallyBigModel {
   id: String,
   name: String,
   isReal: Bool,
   ...
   stuff: AnotherBigModel
}


impl ToJson for ReallyBigModel {
    fn toJson() -&gt; String {
        return mapOf { &quot;id&quot; to id,
              &quot;name&quot; to name,
              &quot;isReal&quot; to isReal,
              ...,
              &quot;stuff&quot; to stuff.toJson()
            }.toJson()
    }
}</code></pre><!--kg-card-begin: markdown--><p>Annoying, isn&apos;t it?<br>
Nobody wants to write this much boilerplate every day.</p>
<p>But one day, you read in the latest changelog it now supports this new thing called macros. There are many types of macros, but in Bust macros are these special methods you can define that consist of two things:</p>
<ul>
<li>The macro attribute</li>
<li>The macro function</li>
</ul>
<p>The <code>attribute</code> is like a mark you can put on other code.<br>
Imagine it being a big red X over classes or methods. So when your compiler is doing its compiling, if it stumbles upon a function with a big red X over its head, it knows it should call your macro function.</p>
<p>The <code>macro function</code> receives the code that is marked with the <code>attribute</code>, decides what to do with it and then returns new code to the compiler which it then integrates back where the marked function was.</p>
<p>So if in our example, we made a <code>toJson</code> macro, we could add the <code>toJson</code> attribute above any struct and it would write that code for us, so the above code would turn into:</p>
<!--kg-card-end: markdown--><pre><code>#[toJson]
struct ReallyBigModel {
   id: String,
   name: String,
   isReal: Bool,
   ...
   stuff: AnotherBigModel
}</code></pre><p>And what would our macro look like?<br><br>It would be a function that takes in the code marked with it (represented as tokens) and returns a new code that will replace it.</p><figure class="kg-card kg-code-card"><pre><code class="language-Bust">#[toJson]
fn addToJsonTrait(input: TokenStream) -&gt; TokenStream {

    let tree = parseIntoAST(input)
    let nodes = tree.data.asStruct()
    let name = tree.identity
    
    // Get all the children that are properties
    // Map them into format: $name to name
    let properties = nodes
    			.filter((child)=&gt;child.isProperty)
    			.map((property) =&gt;
     			&quot;\&quot;${property.name}\&quot; to ${property.name}&quot;)
        		.joinToString(&quot;,\n&quot;)
                       

    // Write the toJson trait body
    let body  =  quote! { //this is also a kind of macro!
                    impl ToJson for #name {
                        fn toJson() -&gt; String {
                                return mapOf {
                                    #properties
                                     }.toJson();
                                   }
                               }
                 	}
					
    return body.intoTree().intoStream()

}</code></pre><figcaption>Note: This is Bust, an imaginary language. Every language has its own macro implementation and this is just a simplified representation of one so the article doesn&apos;t get excessively long.</figcaption></figure><!--kg-card-begin: markdown--><p>So now, when our compiler arrives at a struct marked with <code>#[toJson]</code>, it will call the <code>addToJsonTrait</code> method, pass it the code for the struct and wait until it returns the new code before it continues compiling.</p>
<p>And just like that, we saved a ton of time by using a macro function and can now be the productive Bust developer we always wanted to be!</p>
<p>Now, don&apos;t get too excited - this is just an imaginary implementation.<br>
There is a lot to know about macros and I&apos;d suggest you get deep into the rabbit hole - <a href="https://doc.rust-lang.org/book/ch19-06-macros.html?ref=blog.entropy.observer">Rust itself</a> has a few different types of <a href="https://veykril.github.io/tlborm/?ref=blog.entropy.observer">macros</a>, it&apos;s one of the <a href="http://www.paulgraham.com/avg.html?ref=blog.entropy.observer">reasons</a> people love <a href="https://lispcookbook.github.io/cl-cookbook/macros.html?ref=blog.entropy.observer">Lisp</a> so much, there are <a href="http://www.phyast.pitt.edu/~micheles/scheme/scheme29.html?ref=blog.entropy.observer">hygienic and non-hygienic macros</a>, different types of expansions, and a lot more magic hiding away in the deep.</p>
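<p>For a taste of the real thing, here&apos;s a minimal <em>declarative</em> Rust macro (the <code>macro_rules!</code> flavor) doing a toy version of the same &quot;write the JSON pairs for me&quot; idea - note this is my own illustrative sketch, the macro name isn&apos;t from any crate:</p>

```rust
// A declarative macro: pattern on the left, generated code on the right.
macro_rules! to_json_pairs {
    // match one or more `name: value` pairs, with an optional trailing comma
    ($($field:ident : $value:expr),+ $(,)?) => {
        {
            let pairs: Vec<String> = vec![
                $(format!("\"{}\": {:?}", stringify!($field), $value)),+
            ];
            format!("{{ {} }}", pairs.join(", "))
        }
    };
}

fn main() {
    // the macro expands at compile time into the vec![..] + format!(..) above
    let json = to_json_pairs!(id: 1, name: "tabs");
    println!("{}", json); // { "id": 1, "name": "tabs" }
}
```

<p>It&apos;s not a procedural macro - it can&apos;t inspect a whole struct like our Bust example did - but it shows the same core trick: code that generates code at compile time.</p>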
<p>So now that we got that out of the way, let&apos;s get back into building our API.</p>
<!--kg-card-end: markdown--><h2 id="the-post-office">The POST office</h2><!--kg-card-begin: markdown--><p>We&apos;ll hide the simple magic of our service behind the <code>/sort</code> POST method, so delete that hello world and replace the router with one handling the <code>/sort</code> request - <code>Router::new().route(&quot;/sort&quot;, post(sort_items))</code> - and add a <code>sort_items</code> method to handle it:</p>
<pre><code class="language-rust">async fn sort_items(Json(payload): Json&lt;SortRequestPayload&gt;)
                                       -&gt; impl IntoResponse {
 (StatusCode::OK, Json(&quot;ok&quot;)).into_response()

}</code></pre>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>The method will receive a Json wrapper of our request structure and will return an implementation of the <code>IntoResponse</code> trait. Specifically, we&apos;ll be returning the tuple <code>(StatusCode, T)</code>, which the server knows how to transform into an appropriate response.</p>
<p>One more thing we need to implement is our request data structure. Instead of having the structs lay around in the same file, let&apos;s pop open a new file called <code>models.rs</code> in the <code>src</code> folder and create some basic definitions.</p>
<p>We&apos;ll need the <code>SortRequestPayload</code> wrapper we will receive. It should contain a list of categories and items, so we&apos;ll need structures for those too - <code>Category</code> and <code>Item</code>. For the response, we&apos;ll need categories <em>with</em> their belonging items - <code>CategoryWithItems</code> - plus a <code>Categories</code> wrapper around a list of them. Also, we&apos;ll add an <code>ErrorResponse</code> so we know where the problem is.</p>
<!--kg-card-end: markdown--><pre><code class="language-rust">//in models.rs

pub(crate) struct SortRequestPayload {
    pub(crate) categories: Vec&lt;Category&gt;,
    pub(crate) items: Vec&lt;Item&gt;,
}

pub(crate) struct Category {
    pub(crate) id: usize,
    pub(crate) title: String,
}

pub(crate) struct Item {
    pub(crate) id: usize,
    pub(crate) title: String,
}

pub(crate) struct CategoryWithItems {
    pub category_id: usize,
    pub category_name: String,
    pub items: Vec&lt;usize&gt;
}

pub(crate) struct Categories {
    pub categories: Vec&lt;CategoryWithItems&gt;
}

pub(crate) struct ErrorResponse {
    pub message: String,
}

</code></pre><!--kg-card-begin: markdown--><p>But, we got one problem - we need our structures to be easily (de)serialisable from/into JSON - for that, we will use a library called <a href="https://serde.rs/?ref=blog.entropy.observer">Serde</a> and its macros (similar to the macro we constructed before), so open up your <code>Cargo.toml</code> file and add <code>serde</code> and <code>serde_json</code> as dependencies:</p>
<pre><code class="language-toml">serde = { version = &quot;1.0&quot;, features = [&quot;derive&quot;] }
serde_json = &quot;1.0&quot;
</code></pre>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>Now, we can mark our structs with serde&apos;s <code>#[derive(Deserialize)]</code> and <code>#[derive(Serialize)]</code> macros so the framework knows how to (de)serialize the JSON into and out of our structs - and we&apos;ll derive <code>Clone</code> on the ones we&apos;ll need to copy around later.</p>
<!--kg-card-end: markdown--><pre><code class="language-rust">//in models.rs

#[derive(Deserialize)]
pub(crate) struct SortRequestPayload {
    pub(crate) categories: Vec&lt;Category&gt;,
    pub(crate) items: Vec&lt;Item&gt;,
}

#[derive(Deserialize)]
pub(crate) struct Category {
    pub(crate) id: usize,
    pub(crate) title: String,
}

#[derive(Deserialize, Serialize, Clone)]
pub(crate) struct Item {
    pub(crate) id: usize,
    pub(crate) title: String,
}

#[derive(Deserialize, Serialize, Clone)]
pub(crate) struct CategoryWithItems {
    pub category_id: usize,
    pub category_name: String,
    pub items: Vec&lt;usize&gt;
}

#[derive(Deserialize, Serialize)]
pub(crate) struct Categories {
    pub categories: Vec&lt;CategoryWithItems&gt;
}

#[derive(Serialize)]
pub(crate) struct ErrorResponse {
    pub message: String,
}</code></pre><!--kg-card-begin: markdown--><p>With this done, we can dive back into our code.<br>
Let&apos;s examine our plan:</p>
<pre><code>1. Get the items
2. Assign items to categories
3. Slice the prompt into chunks
4. A recursive sort:
    4.1. Take existing categories and a chunk, turn them into a prompt
    4.2. Ask OpenAI to sort it
    4.3. Deserialize the response
    4.4. Add to existing categories
    4.5. While chunks remain, back to 4.1
5. Return the result
</code></pre>
<p>And structure it into methods:</p>
<!--kg-card-end: markdown--><pre><code class="language-rust">//in lib.rs

...

fn create_chunks_for_prompting(items: Vec&lt;Item&gt;) -&gt; Vec&lt;Vec&lt;Item&gt;&gt;

fn sort_recursively(
	sorted_categories: Vec&lt;CategoryWithItems&gt;,
        remaining: Vec&lt;Vec&lt;Item&gt;&gt;) -&gt; Result&lt;Categories, Error&gt;

fn build_prompt(items: Vec&lt;Item&gt;,
		categories: Vec&lt;CategoryWithItems&gt;) -&gt; String

fn prompt_open_ai(prompt: String) -&gt; Result&lt;String, String&gt;</code></pre><!--kg-card-begin: markdown--><p>Also, we&apos;ll need our prompt, so let&apos;s try out something like this - we tell GPT3 it will receive a list of items, give it the format and embed the list. Then, we describe the valid JSON format to return and pass in the existing categories. In the end, we tell it to return them to us in the valid JSON format. Hopefully, it will adhere to the JSON format and not hallucinate it, but we&apos;ll fine-tune that in later posts. For now, it seems like specifying the valid JSON format close to the end of the prompt and mentioning a &quot;valid JSON format&quot; in the end keeps it grounded quite nicely.</p>
<pre><code>You will receive list of items with titles and id&apos;s in form of [title,id].
Based on titles and urls, classify them into categories, by using existing categories or making new ones.

Tabs are:
[$tabName, $tabId].

Valid JSON format to return is:
{ &quot;categories&quot;: [ { 
    &quot;category_id&quot;:&quot;id here&quot;,
    &quot;category_name&quot;: &quot;name here&quot;, 
    &quot;items&quot;:[tab_id here] } 
]}.

Existing categories are: 
$categories

A new more detailed list of categories (existing and new) with items, in valid JSON format is:
</code></pre>
<p>Sounds good to me!<br>
Let&apos;s divide it up into constants we can use inside our code.</p>
<pre><code class="language-rust">
const PROMPT_TEXT_START: &amp;str = &quot;You will receive list of items with titles and id&apos;s in form of [title,id].
Based on titles and urls, classify them into categories, by using existing categories or making new ones.&quot;;

const PROMPT_TEXT_MIDDLE: &amp;str = &quot;\nValid JSON format to return is:
{ \&quot;categories\&quot;: [ { \&quot;category_id\&quot;:\&quot;id here\&quot;, \&quot;category_name\&quot;: \&quot;name here\&quot;, \&quot;items\&quot;:[tab_id here] } ]}.
Existing categories are:&quot;;

const PROMPT_TEXT_ENDING: &amp;str = &quot;A new more detailed list of categories (existing and new) with items, in valid JSON format is:&quot;;

</code></pre>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>Finally, we can get into our <code>sort_items</code> method and start filling it all out. First, we&apos;ll take ownership of our data and split it into chunks:</p>
<pre><code class="language-rust">let items = payload.items;
let categories: Vec&lt;CategoryWithItems&gt; = payload.categories.iter().map(|it| {
    CategoryWithItems {
        category_id: it.id,
        category_name: it.title.to_owned(),
        items: Vec::new(),
    }
}).collect();

let prompt_slices = create_chunks_for_prompting(items);

</code></pre>
<p>Why chunks? Because if we just add all of the items to the prompt, our prompt size could be well over 4096 tokens - which is what the model we&apos;ll be using supports as the maximum length for prompt and completion. So we need to find a way to split it into a suitable size and have some buffer for the completion too - we&apos;ll leave a 50% buffer for it, leaving our prompt size at 2048.</p>
<p>To achieve that, our <code>create_chunks_for_prompting</code> function will need to do three things:</p>
<ul>
<li>Count the number of tokens in our base prompt</li>
<li>Count the number of tokens in the data we send to the API</li>
<li>Calculate the number of chunks we need by dividing the total token count by 2048 minus our hardcoded prompt size.</li>
</ul>
<p>According to the OpenAI documentation, we can guesstimate that a token is the size of about 4 characters. Now, there are a lot of different ways to count tokens, and to do it properly, we would have to do a bit more than just split the length by 4 - the best way would be to use the <a href="https://docs.rs/rust_tokenizers/latest/rust_tokenizers/?ref=blog.entropy.observer">Rust tokenizers</a> crate and their GPT2 tokenizer. But, since that leads us down another rabbit hole, we&apos;re gonna skip it <em>for now</em> and just gonna do a simple trick - the <code>split_whitespace</code> method which will give us an approximation of token length.</p>
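<p>To make the guesstimate concrete, here&apos;s a tiny std-only sketch comparing the two rough approaches (the function names are mine, purely illustrative - neither is a real tokenizer):</p>

```rust
// OpenAI's rule of thumb: ~4 characters per token (rounded up)
fn estimate_tokens_by_chars(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

// the even lazier approximation: count whitespace-separated words
fn estimate_tokens_by_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let prompt = "Based on titles and urls, classify them into categories";
    println!("chars/4: {}", estimate_tokens_by_chars(prompt));
    println!("words: {}", estimate_tokens_by_words(prompt));
}
```

<p>Both will undercount or overcount on real prompts, but they&apos;re close enough for sizing chunks.</p>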
<pre><code class="language-rust">fn create_chunks_for_prompting(items: Vec&lt;Item&gt;) -&gt; Vec&lt;Vec&lt;Item&gt;&gt; {

    //approximate the tokens in our data
    let json_size = serde_json::to_string(&amp;items).unwrap()
        .split_whitespace()
        .count();

    //get the size of our hardcoded prompt
    let hardcoded_prompt = format!(&quot;{a}{b}{c}&quot;,
                                   a = String::from(PROMPT_TEXT_START),
                                   b = String::from(PROMPT_TEXT_MIDDLE),
                                   c = String::from(PROMPT_TEXT_ENDING));

    let hardcoded_prompt_size = hardcoded_prompt
        .split_whitespace()
        .count();

    //find the number of chunks we should split the items into
    let chunks_to_make = json_size / (2048 - hardcoded_prompt_size);

    //split the vector up into N roughly equal chunks
    let chunks = items.chunks(items.len() /
                                  (if chunks_to_make &gt; 0 {
                                      chunks_to_make
                                  } else { 1 }));

    //return the list of chunks
    return chunks.map(|s| s.into()).collect();
}
</code></pre>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>Now, let&apos;s get down to our <code>build_prompt</code> function.<br>
To build our prompt, we need the list of items to be sorted and the existing categories. We&apos;ll take the list of items and <code>format!</code> it to a string in the form of <code>[title,id]</code>. Then, we&apos;ll turn the categories into JSON and use the <code>format!</code> macro to combine it all into a single prompt.</p>
<pre><code class="language-rust">fn build_prompt(items: Vec&lt;Item&gt;,
                categories: Vec&lt;CategoryWithItems&gt;) -&gt; String {

    //map items into [title,id] then join them all into a string
    let items_joined = items.iter().map(|item| format!(
                                        &quot;[{title},{id}]&quot;,
                                        title = item.title,
                                        id = item.id))
                                .collect::&lt;Vec&lt;String&gt;&gt;()
                                .join(&quot;,&quot;);

    let categories_json = serde_json::to_string(&amp;categories).unwrap();
    
    format!(&quot;{prompt}\n{tabs}{middle}{categories}\n{ending}&quot;,
            prompt = String::from(PROMPT_TEXT_START),
            tabs = items_joined,
            middle = String::from(PROMPT_TEXT_MIDDLE),
            categories = categories_json,
            ending = String::from(PROMPT_TEXT_ENDING))
}
</code></pre>
<p>Now, to actually send that prompt to OpenAI, we&apos;ll need an HTTP client.<br>
For that, we&apos;ll be using the <a href="https://docs.rs/reqwest/?ref=blog.entropy.observer">reqwest</a> crate - it provides us with a high-level HTTP client with simple async functions we can use to talk to the OpenAI API, and has a <code>json</code> feature which enables easy serialization/deserialization. So let&apos;s add it to our <code>Cargo.toml</code> file:</p>
<pre><code class="language-toml">[dependencies]
...
reqwest = { version = &quot;0.11&quot;, features = [&quot;json&quot;] }
</code></pre>
<p>Using this, we can build our HTTP client via the good ol&apos; builder pattern.</p>
<pre><code class="language-rust">let client = Client::builder()
    .http2_keep_alive_timeout(Duration::from_secs(120))
    .timeout(Duration::from_secs(120))
    .build()
    .unwrap();
</code></pre>
<p>But, if we built the client inside our <code>prompt_open_ai</code> function, we would be creating a Client instance for each request we make, so let&apos;s instead add the client code into our <code>sort_items</code> function and pass it down as an argument into the <code>sort_recursively</code> and <code>prompt_open_ai</code> functions. This way, we&apos;ll only use one instance of the HTTP client per <code>/sort</code> call, and our <code>prompt_open_ai</code> function can focus only on actually calling the API and giving us the result back.</p>
<p>So let&apos;s build a simple POST call and see how we can receive its <code>Result</code>.<br>
To keep things clean, we&apos;ll create a separate module inside our structure - modules are containers for your code (akin to packages), enabling you to create some separation between different areas of your code. Create a new folder called <code>openai</code> and two new files in it:</p>
<ul>
<li>a <code>mod.rs</code> for our code</li>
<li>a <code>models.rs</code> for our models</li>
</ul>
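<p>One bit of wiring to remember: Rust only compiles modules that are declared, so we&apos;ll also need a <code>mod openai;</code> line in <code>lib.rs</code> (and a <code>mod models;</code> inside <code>openai/mod.rs</code>). The same nesting, sketched inline in a single file with a stand-in struct:</p>

```rust
// inline version of the src/openai/models.rs layout -
// `mod openai;` in lib.rs plays the role of this outer block
mod openai {
    pub mod models {
        // stand-in struct, just to show the path resolution
        pub struct AskGPT {
            pub prompt: String,
        }
    }
}

fn main() {
    // the full path mirrors the file layout
    let req = openai::models::AskGPT { prompt: String::from("hi") };
    println!("{}", req.prompt);
}
```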
<p>Open up the <code>models.rs</code> and add the structs we need to communicate with our OpenAI Completion API:</p>
<pre><code class="language-rust">
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
pub(crate) struct AskGPT {
    pub prompt: String,
    pub model: String,
    pub max_tokens: usize,
    pub stream: bool,
    pub temperature: usize,
    pub top_p: usize,
    pub n: usize,
}

#[derive(Deserialize)]
pub(crate) struct Completion {
    pub model: String,
    pub choices: Vec&lt;Choices&gt;,
}

#[derive(Deserialize)]
pub(crate) struct Choices {
    pub text: String,
    pub index: usize,
}
</code></pre>
<p>And in the <code>mod.rs</code> we can build our <code>prompt_open_ai</code> method, with the POST request which will send our newly created <code>AskGPT</code> model to their <code>/completions</code> endpoint.</p>
<p>Now, there are a few important fields here - the self-explanatory <code>prompt</code> field, the <code>model</code> which lets us choose which model will do the completion (at the time of writing, <code>text-davinci-003</code> is the best performing one for this task), the <code>max_tokens</code> which caps the length of the completion - since prompt and completion together can&apos;t exceed the model&apos;s 4096 token limit, we&apos;ll set it to 2048, the half we budgeted for the completion earlier - the <code>n</code> which controls the number of responses, <code>temperature</code> which is a way to tell it which probabilities to consider - the higher it is, the more random the completion might seem - we&apos;ll use 0, so our output is less random - and <code>top_p</code>, which we&apos;ll leave at 1.</p>
<p>Note: For this part, you&apos;ll need your <a href="https://platform.openai.com/account/api-keys?ref=blog.entropy.observer">OpenAI API key, which you can find here</a>.</p>
<pre><code class="language-rust">async fn prompt_open_ai(prompt_txt: String,
                        client: &amp;Client) -&gt; Result&lt;String, String&gt; {
    let token = String::from(&quot;YOUR_API_KEY_HERE&quot;);
    let auth_header = format!(&quot;Bearer {}&quot;, token);


    let req = client.post(&quot;https://api.openai.com/v1/completions&quot;)
        .header(&quot;Authorization&quot;, auth_header)
        .json(&amp;AskGPT {
            prompt: prompt_txt,
            model: String::from(&quot;text-davinci-003&quot;),
            max_tokens: 2048,
            n: 1,
            stream: false,
            temperature: 0,
            top_p: 1,
        }).send().await;

}
</code></pre>
<p>Finally, a <code>Result</code>!<br>
But what do we do with it?</p>
<p>Well, we can just add <code>?</code> to the end of the await, which would immediately give us the <code>Response</code>, but that&apos;s no fun, so we&apos;ll use one of my favorite Rust features - the famous <code>match</code>.<br>
<code>match</code> statements are at the core of the Rust developer experience, providing you with powerful pattern matching abilities that ensure all the paths your code takes are covered.</p>
<p>But Ian, what is so special about it?<br>
Isn&apos;t it just if/else on steroids?<br>
Oh no, it&apos;s way more than that. Unlike a set of <code>if/else</code> or <code>switch</code> statements, <code>match</code> forces you to cover every possibility, both the happy and the sad paths your code can take. Why is this so superpowered? Because it removes a whole class of bugs caused by unhandled cases. It&apos;s one of those rare tools that improves readability, prevents bugs and increases maintainability in a single swoop.</p>
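<p>Here&apos;s the shape of it on a tiny, self-contained example first (the <code>parse_number</code> helper is hypothetical, purely illustrative):</p>

```rust
fn parse_number(input: &str) -> String {
    // match forces us to handle both sides of the Result -
    // forgetting the Err arm is a compile error, not a runtime surprise
    match input.parse::<i32>() {
        Ok(n) => format!("got {}", n),
        Err(_) => String::from("not a number"),
    }
}

fn main() {
    println!("{}", parse_number("42"));   // got 42
    println!("{}", parse_number("oops")); // not a number
}
```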
<p>So let&apos;s try and use it - the syntax is simple, on the left hand side is the pattern you are matching against and on the right hand side is the codeblock to execute. First we&apos;ll check if the request actually happened by checking the <code>Result</code> we got.</p>
<pre><code class="language-rust">    match req {
        Ok(response) =&gt; {
          //request actually happened, we can access response safely
        }
        Err(error) =&gt; {
            //TODO handle error
        }
    }

</code></pre>
<p>Now in our Ok branch, we can access our response object safely, knowing we got the error case covered too and it isn&apos;t gonna cause a runtime crash.<br>
We can move on to check if the request has actually been successful by simply checking if the status code is 200 OK.</p>
<pre><code class="language-rust">match response.status() {
    StatusCode::OK =&gt; {
      // smashing success 
    }
    other =&gt; {
      // TODO handle error
    }
}
</code></pre>
<p>And finally, for the main step - if the request was a success, we should try and deserialize the body into our <code>Completion</code> struct. But since that can fail too, we should do a quick <code>match</code> here too and extract the response from our completion object:</p>
<pre><code class="language-rust">match response.json::&lt;Completion&gt;().await {
    Ok(parsed) =&gt; {
        //We know there is always at least 1 item in choices 
        //due to our request param n==1 so we&apos;ll just live wild and unwrap
        let choices = parsed.choices.first().unwrap();
        let json: &amp;str = choices.text.borrow();
        Ok(String::from(json))
    }
    Err(_) =&gt; Err(Parsing)
}

</code></pre>
<p>Now, to handle the errors - let&apos;s add an enum that will denote the different types of errors we can have (yes, I&apos;ll condense all possible errors to these three types. What could go wrong..) - the connection error, the server response error and the parsing error. Hop up to the <code>models.rs</code> and add it:</p>
<pre><code class="language-rust">#[derive(Debug)]
pub(crate) enum OpenAiError {
    Connection,
    Parsing,
    Server,
}
</code></pre>
<pre><code class="language-rust">match req {
    Ok(response) =&gt; {
        match response.status() {
            StatusCode::OK =&gt; {
                match response.json::&lt;Completion&gt;().await {
                    Ok(parsed) =&gt; {
                        //there is always at least 1 due to our request
                        let choices = parsed.choices.first().unwrap();
                        let json: &amp;str = choices.text.borrow();
                        Ok(String::from(json))
                    }
                    Err(_) =&gt; Err(Parsing)
                }
            }
            other =&gt; Err(Server)          
        }
    }
    Err(_) =&gt; Err(Connection)
}
</code></pre>
<p>Congratulations! We&apos;ve successfully made our request in a safe manner and covered all the sad and happy paths on the way.</p>
<p>So with our requests poppin&apos;, we can <em>finally</em> start working on our <code>sort_recursively</code> function. Why recursion here? Because we&apos;re basically reducing a list onto itself with GPT3 acting as our reducer function. While we could do a loop here and call this method n times, it would mean we would have to also mutate a variable outside of the loop (containing our categories). As that feels dirty, we&apos;ll do it the clean, functional way by using our good ol&apos; friend, the recursion.</p>
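<p>That reduce-with-recursion shape, as a std-only sketch - a plain merge stands in for the GPT3 round-trip and the names are mine (the real version is <code>async</code>, which complicates things a bit, but the skeleton is the same):</p>

```rust
// reduce a list of chunks onto an accumulator, recursively -
// in the real service, "fold the chunk in" is the GPT3 call
fn reduce_recursively(mut acc: Vec<String>, mut remaining: Vec<Vec<String>>) -> Vec<String> {
    if remaining.is_empty() {
        return acc;
    }
    // take the next chunk, fold it into the accumulator, recurse on the rest
    let chunk = remaining.remove(0);
    acc.extend(chunk);
    reduce_recursively(acc, remaining)
}

fn main() {
    let sorted = reduce_recursively(
        vec![String::from("news")],
        vec![vec![String::from("rust")], vec![String::from("ai")]],
    );
    println!("{:?}", sorted); // ["news", "rust", "ai"]
}
```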
<p>So let&apos;s open up our <code>lib.rs</code> and get into the <code>sort_recursively</code> function.</p>
<p>First, we&apos;ll build our prompt, then send it to <code>prompt_open_ai</code> and try to deserialize the response. If it succeeds, we join it with the existing categories and pass it again into <code>sort_recursively</code> with the remaining chunks, until we&apos;re out of chunks.</p>
<pre><code class="language-rust">async fn sort_recursively(
                        sorted_categories: Vec&lt;CategoryWithItems&gt;,
                        remaining: Vec&lt;Vec&lt;Item&gt;&gt;,
                        client: Client) -&gt; Result&lt;Categories, String&gt; {

    let mut next_categories = sorted_categories.to_owned();
    let prompt = build_prompt(remaining.first().unwrap().to_vec(),
                             sorted_categories);

    //ask OpenAI to sort the chunk
    let ai_response_result = prompt_open_ai(prompt, &amp;client).await;

    match ai_response_result {
        Ok(ai_response) =&gt; {
           //try to deserialize the response
           let parsed = serde_json::
                           from_str::&lt;Categories&gt;(ai_response.as_str());
           match parsed {
               Ok(wrapper) =&gt; {
                   let mut new_categories = wrapper
                               .categories.to_owned();
                   //remove the processed chunk
                   let mut next_slice = remaining.to_owned();
                   next_slice.remove(0);
                   //join the categories
                   next_categories.append(&amp;mut new_categories);
                   //if we&apos;re not done yet, recurse
                   if next_slice.len() != 0 {
                    let next = sort_recursively(next_categories,
                                                next_slice,
                                                client).await;
                    match next {
                        Ok(cats) =&gt; Ok(cats),
                        Err(_) =&gt; Err(String::from(&quot;Sort failed&quot;))
                    }
                   } else {
                       Ok(Categories { categories: next_categories })
                   }
               }
               Err(_) =&gt; Err(&quot;Parsing response error&quot;.to_string())
           }
        }
        Err(err) =&gt; Err(err)
    }
}

</code></pre>
<p>With all these matches, our code is starting to look pretty ugly. One way to avoid nested match hell is to use the <code>map</code>, <code>map_err</code> and <code>and_then</code> combinators - <code>map</code> and <code>and_then</code> operate on the <code>Ok</code> value of a <code>Result</code>, while <code>map_err</code> operates on the <code>Err</code> value, enabling us to avoid nesting hell by simply chaining them into a more readable, concise version. The data will pass only through the corresponding combinators, so we can safely map our data and errors to the proper format.</p>
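<p>As a standalone illustration (the function and names here are made up, not part of our service), here&apos;s how these combinators chain on a <code>Result</code>:</p><pre><code class="language-rust">// Illustrative only: chaining instead of nesting matches.
fn parse_and_double(input: Result&lt;&amp;str, String&gt;) -&gt; Result&lt;i32, String&gt; {
    input
        //runs only if input is an Err
        .map_err(|e| format!(&quot;transport error: {}&quot;, e))
        //runs only if input is an Ok, may itself produce an Err
        .and_then(|s| s.parse::&lt;i32&gt;()
            .map_err(|_| &quot;parse error&quot;.to_string()))
        //runs only if everything above succeeded
        .map(|n| n * 2)
}
</code></pre><p>An <code>Ok(&quot;21&quot;)</code> flows through <code>and_then</code> and <code>map</code> to become <code>Ok(42)</code>, while any error short-circuits the rest of the chain.</p>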
<p>We&apos;ll use them to reduce the first set of nested matches, and we&apos;ll leave the last one as a match. Why? Because <a href="https://github.com/rust-lang/rust/issues/62290?ref=blog.entropy.observer">async closures still aren&apos;t stable</a> in Rust, it seems. We&apos;ll map all the errors into an <code>Err(String)</code> format so we can return it properly:</p>
<pre><code class="language-rust">async fn sort_recursively(
                        sorted_categories: Vec&lt;CategoryWithItems&gt;,
                        remaining: Vec&lt;Vec&lt;Item&gt;&gt;,
                        client: Client) -&gt; Result&lt;Categories, String&gt; {

    let mut next_categories = sorted_categories.to_vec();
    let prompt = build_prompt(remaining.first().unwrap().to_vec(),
                              sorted_categories);

    let ai_response_result = prompt_open_ai(prompt, &amp;client).await;

    let res = ai_response_result
        .map_err(|e|
                format!(&quot;Error communicating with OpenAI - {:?}&quot;, e))
        .and_then(|ai_response|
            serde_json::from_str::&lt;Categories&gt;(ai_response.as_str())
                .map_err(|_| &quot;Parsing response error&quot;.to_string()));

    match res {
        Ok(wrapper) =&gt; {
            let mut new_categories = wrapper.categories.to_owned();
            //remove the processed chunk
            let mut next_slice = remaining.to_owned();
            next_slice.remove(0);
            //join the categories
            next_categories.append(&amp;mut new_categories);
            //if we&apos;re not done yet recurse
            if next_slice.len() != 0 {
                sort_recursively(next_categories, 
                                next_slice,
                                client).await
                    .map_err(|e| 
                        format!(&quot;Sorting failed, reason: {}&quot;, e))
            } else {
                Ok(Categories { categories: next_categories })
            }
        }
        Err(msg) =&gt; Err(msg)
    }
}

</code></pre>
<p>There it is - we called the API in a safe, error free-oh-wait.... it&apos;s not compiling.</p>
<p>Well, one thing we didn&apos;t think about is async recursion.<br>
Why is this such a problem?</p>
<p>Well, due to how async/await is implemented in Rust (and a lot of other languages), under the hood it generates a state machine type containing all the futures in the method. But once we add recursion, the generated type starts referencing itself - it blows up into a potentially infinitely recursive type, and the compiler cannot determine the size of the type. To stop it from blowing up, we&apos;ll need to fix the recursion to return a Box&apos;d Future, which gives us a pointer to the heap instead of the whole object, preventing infinite self-referencing under the hood.</p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>I&apos;d recommend reading more about this problem <a href="https://rust-lang.github.io/async-book/07_workarounds/04_recursion.html?ref=blog.entropy.observer">here</a> and following this rabbit hole deeper and deeper - it covers a lot of language design questions and concepts which appear through many languages. But, for now, all we are going to do is use the <code>async_recursion</code> crate, so head on to your <code>Cargo.toml</code> and add it there:</p>
<pre><code class="language-toml">[dependencies]
..
async-recursion = &quot;1.0.2&quot;
</code></pre>
<p>And mark your function with the <code>#[async_recursion]</code> macro so the crate can Box the future for you.</p>
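<p>Roughly, here&apos;s what the macro does for us under the hood (a simplified sketch - the real expansion also deals with lifetimes and trait bounds):</p><pre><code class="language-rust">use std::future::Future;
use std::pin::Pin;

// Simplified sketch of the expansion - not the exact generated code
fn sort_recursively(
    sorted_categories: Vec&lt;CategoryWithItems&gt;,
    remaining: Vec&lt;Vec&lt;Item&gt;&gt;,
    client: Client,
) -&gt; Pin&lt;Box&lt;dyn Future&lt;Output = Result&lt;Categories, String&gt;&gt; + Send&gt;&gt; {
    Box::pin(async move {
        // ...the original function body...
    })
}
</code></pre><p>The boxed future has a known size (it&apos;s just a pointer), so the compiler stops complaining about the self-referencing type.</p>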
<p>With that out of the way, we can come back to our original <code>sort_items</code> method and finally respond to that API request. Last time we were there, we added the <code>Client</code> instance, so just head down below it and call the <code>sort_recursively</code> method. We&apos;ll use <code>map_err</code> to map the error into our <code>ErrorResponse</code> structure, wrap it in JSON and return it as a response, and <code>map</code> to turn our <code>Ok</code> result into a proper response:</p>
<pre><code class="language-rust">    sort_recursively(categories, prompt_slices, client).await
        .map_err(|e| 
            (StatusCode::INTERNAL_SERVER_ERROR, 
            Json(ErrorResponse { message: e })).into_response())
        .map(|wrapper| {
            let new_categories = wrapper.categories.iter().map(|item| {
                CategoryWithItems {
                    category_id: item.category_id.to_owned(),
                    category_name: item.category_name.to_owned(),
                    items: item.items.to_owned(),
                }
            }).collect::&lt;Vec&lt;CategoryWithItems&gt;&gt;();
            (StatusCode::OK, Json(Categories {
                categories: new_categories
            })).into_response()
        })

</code></pre>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>And with this done, our service is now finished!</p>
<p>We take the response, format it, prompt it and give it back to the user. Our plan is safe and sound. All that&apos;s left to do is deploy it - but we don&apos;t have to think about provisioning instances, setting up security groups or writing dockerfiles. Since we scaffolded our service via Shuttle, we can easily deploy it with a simple touch of the terminal. Open up your project&apos;s folder in your shell of choice and type:</p>
<p><code>cargo shuttle deploy</code></p>
<p>Now stand up, take a few breaths, grab a sip of the coffee and before you even know it, your server is up and running at: <code>https://projectname.shuttleapp.rs/</code></p>
<p>Now, uh... why were we even doing this?</p>
<p>Oh yeah, we were writing a JS extension. With our server up, it&apos;s nearly finished - just pop over to the extension and replace the localhost endpoint with the real endpoint you just got from shuttle.</p>
<p>Now, load the extension into a small window just to test it. Hit the sort button, wait for a bit and - BAM! Your tabs should be magically sorted into proper groups! Finally!</p>
<p>Let&apos;s try it in our real window - the one with ..uhh its nearing 600 tabs now. So we&apos;ll just hit the sort button and - wait...</p>
<p>...wait..</p>
<p>.....wait a bit more....</p>
<p>...... waaaaait it&apos;s coming...</p>
<p>.... this is taking way longer than 60 seconds...</p>
<p>... oh wait...</p>
<p>.. error?</p>
<p>Ooops - we hit the token limit!<br>
Why? How? Didn&apos;t we do the whole chunking thing just so it fits?</p>
<p>Weeeeell, seems like we&apos;ll need to do a better calculation on prompt sizes.</p>
<p>Also, our recursion is causing problems - adding all previous categories to each prompt is causing it to blow up in size and it takes a really long time to actually finish the whole chain - way longer than 60 seconds.</p>
<p>And finally, the categories are quite... meh.</p>
<p>Which is great, since it gives us more stuff to do for the next iteration - we&apos;ll see how to eliminate this recursion, how to use <a href="https://docs.rs/rust_tokenizers/latest/rust_tokenizers/?ref=blog.entropy.observer">GPT tokenizer</a> and embed dictionary files into the binary and use <a href="https://docs.shuttle.rs/resources/shuttle-static-folder?ref=blog.entropy.observer">shuttle&apos;s static folder</a> service for it instead of blowing up our build times. We&apos;ll also take a stab at <a href="https://platform.openai.com/docs/guides/fine-tuning?ref=blog.entropy.observer">finetuning</a> the model, giving us better results for less tokens - and since we&apos;re lazy, we&apos;ll just be generating the training data using GPT itself.</p>
<p>If you&apos;ve come this far, thanks for reading and don&apos;t worry, we have many more feature creeps and potential problems to uncover on our path, so see you in the next episode of &quot;Human vs Machines&quot;.</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.entropy.observer/content/images/2023/03/000077.c81d608e.1263736738.png" class="kg-image" alt="Sorting 400+ tabs in 60 seconds with JS, Rust &amp; GPT3: Part 2 - Macros &amp; Recursion" loading="lazy" width="960" height="512" srcset="https://blog.entropy.observer/content/images/size/w600/2023/03/000077.c81d608e.1263736738.png 600w, https://blog.entropy.observer/content/images/2023/03/000077.c81d608e.1263736738.png 960w" sizes="(min-width: 720px) 720px"><figcaption>Machines might still have some problem understanding the concept of cuteness. &quot;Cute rusty crab illustration&quot; - 2023, the artist is a machine.</figcaption></figure><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Sorting 400+ tabs in 60 seconds with JS, Rust & GPT-3: Part 1]]></title><description><![CDATA[<p></p><p>I&apos;m a serial tabbist. I admit it.<br><br>Currently, I have about 460 tabs open across 5 brave windows. Let&apos;s not even get started on the bookmarks.</p><blockquote><em>&quot;B-b-but, they&apos;re all necessary! So much knowledge! So many good links!&quot; </em><br>- My inner hoarder</blockquote><p>Yeah,</p>]]></description><link>https://blog.entropy.observer/sorting-400-tabs-in-60-seconds/</link><guid isPermaLink="false">65422724da89d344959db4f6</guid><category><![CDATA[rust]]></category><category><![CDATA[gpt3]]></category><category><![CDATA[ai]]></category><category><![CDATA[API]]></category><dc:creator><![CDATA[Ian Rumac]]></dc:creator><pubDate>Thu, 23 Feb 2023 10:43:00 GMT</pubDate><media:content url="https://blog.entropy.observer/content/images/2023/02/000021.27965652.1921725045-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.entropy.observer/content/images/2023/02/000021.27965652.1921725045-1.png" alt="Sorting 400+ tabs in 60 seconds with JS, Rust &amp; GPT-3: Part 1"><p></p><p>I&apos;m a serial tabbist. 
I admit it.<br><br>Currently, I have about 460 tabs open across 5 brave windows. Let&apos;s not even get started on the bookmarks.</p><blockquote><em>&quot;B-b-but, they&apos;re all necessary! So much knowledge! So many good links!&quot; </em><br>- My inner hoarder</blockquote><p>Yeah, I&apos;m like an information hamster. I just keep hoarding all the tabs until I can find enough time to read <em>everything - </em>and open even more of them on the way. And as one can assume, having so many tabs can be quite overwhelming, either when I need to find something and it&apos;s lost beyond the borders of the tab bar or when I&apos;m just looking at the screen and getting the anxious feeling of &quot;having so much to do&quot; - even when there is nothing to be done.<br><br>So, being the lazy hacker I am, instead of actually sorting them, cleaning them up<br>or *<em><em>gulp</em>*</em> simply closing them all, I wondered - why not just let the machine do the job? Can I have a 1-click solution to all my woes? Can I Marie-Kondo my inner hoarder into submission by using code?<br><br>Luckily for us, there is a giant language model worth billions of dollars just waiting to eagerly do the job. The idea is simple: Give GPT3 a list of items and ask it to return a list of categories those items belong to. Wrap all that up into a chrome extension and let the magic happen.<br><br>So, let&apos;s crack our fingers and get coding.. or.. oh... wait..</p><h2 id="the-sweet-taste-of-complexity">The sweet taste of complexity<br></h2><p>Let&apos;s backpedal a bit. So, our plan sounds simple enough. 
But as it usually goes in software, we missed out on some key details that are going to blow up our scope and budget if we don&apos;t think about them properly.<br><br>Some of the key issues to think about before we dive into code head first and find ourselves in a world of regret are:</p><ul><li><strong>Prompt token limits</strong><br>OpenAI&apos;s language models have token limits - 2048 or 4096 tokens.<br>Since each token is about 4 characters, that limits our prompt and response size to 8192/16384 characters respectively.<br><br> There are a few ways we can get around this problem (we&apos;ll cover all of them):<br>- Cutting our prompt into consumable chunks<br>- Optimising the data sent to reduce token count <br>- Fine-tuning a model for our task</li><li><strong>API Key security</strong><br>Since OpenAI API charges API calls by tokens used, our API key needs to be hidden somewhere safe. Hardcoding it in our extension is a no-no - unless we really want to pay OpenAI millions of dollars in bills because some bored script kiddy decided to scrape our key.</li><li><strong>User privacy</strong><br>Tab titles and URL&apos;s can reveal sensitive things - private documents,<br>links, session ID&apos;s and a lot of data about a person. We want users to be able to trust the extension, so we want to open-source it, have it build and deploy from that source and make it easy to deploy for others. </li><li><strong>Ease of update</strong><br>Since LLM&apos;s can be fickle with their responses and OpenAI API could incur us insane usage costs due to simple mistakes, we want to have control over updates instead of letting the users do it at their whim. 
That means our most important code cannot reside in the extension.</li></ul><p>How do we solve those issues?<br><br>We&apos;ll take a simple route - instead of writing all of the logic in the extension itself, we&apos;ll hide it behind an API - we&apos;ll build a simple backend service that will receive the tab data from the extension, chunk our prompts, communicate with OpenAI&apos;s API and reduce the data back into a single response. This enables us to both secure our keys, control our updates and open-source the extension without giving our secret token away.<br><br>To do this, we&apos;ll be using Rust - with <a href="https://github.com/tokio-rs/axum?ref=blog.entropy.observer">Axum</a> as our backend framework, <a href="https://shuttle.rs/?ref=blog.entropy.observer">Shuttle</a> as our deployment platform and <a href="https://github.com/features/actions?ref=blog.entropy.observer">Github Actions</a> as our CI.<br><br>So, before we get into code, let&apos;s do some napkin sketches to get an overview of what we&apos;re building:</p><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://blog.entropy.observer/content/images/2023/02/sketch.png" class="kg-image" alt="Sorting 400+ tabs in 60 seconds with JS, Rust &amp; GPT-3: Part 1" loading="lazy" width="2000" height="770" srcset="https://blog.entropy.observer/content/images/size/w600/2023/02/sketch.png 600w, https://blog.entropy.observer/content/images/size/w1000/2023/02/sketch.png 1000w, https://blog.entropy.observer/content/images/size/w1600/2023/02/sketch.png 1600w, https://blog.entropy.observer/content/images/size/w2400/2023/02/sketch.png 2400w"><figcaption>(Not a real napkin - made with <a href="https://okso.app/?ref=blog.entropy.observer">okso.app</a>, an amazing whiteboarding app made by <a href="https://github.com/sponsors/trekhleb?ref=blog.entropy.observer">Oleksii Trekhleb</a>)</figcaption></figure><p><br></p><h2 id="step-1-building-the-extension">Step 1: Building the 
Extension<br></h2><p>Chromium extensions are quite simple to build - they&apos;re basically just tiny webpages that live inside your browser and (with proper permissions) are given access to your browser by using your browser&apos;s API. We&apos;ll be relying on the <a href="https://developer.chrome.com/docs/extensions/reference/?ref=blog.entropy.observer">Chrome API</a> - it&apos;s the API Google Chrome uses - and which many <a href="https://www.chromium.org/chromium-projects/?ref=blog.entropy.observer">Chromium</a> project based browsers expose (such as <a href="https://brave.com/?ref=blog.entropy.observer">Brave</a>, which I&apos;m using, and even Edge, tho with a different namespace). Other browsers, like Firefox or Safari, aren&apos;t built off of the Chromium project, but provide a quite similar extension API. If you want to know more about the differences between them, I&apos;d suggest this <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Differences_between_API_implementations?ref=blog.entropy.observer">MDN</a> article.<br><br>Specifically, we&apos;ll be focusing on these two API&apos;s:</p><!--kg-card-begin: markdown--><ul>
<li><code>chrome.tabs</code> - enables us to query tabs our user currently has opened</li>
<li><code>chrome.tabGroups</code> -  enables us to query existing groups, create new ones and move tabs inside them</li>
</ul>
<!--kg-card-end: markdown--><p>So let&apos;s get to building. To bootstrap our extension, we&apos;ll be using <a href="https://github.com/dutiyesh/chrome-extension-cli?ref=blog.entropy.observer">Chrome extension CLI</a> - it will generate the initial project structure we need.<br>So, hit the terminal with:</p><pre><code class="language-bash">npm install -g chrome-extension-cli
chrome-extension-cli bookie-js
cd bookie-js</code></pre><p>Follow the instructions at the end and load the build folder as an extension - it will allow you to load and test your extension via hot reload, so every change will be immediately visible.<br><br>Now, take a peek inside the structure it generated - most of it is self-explanatory, </p><pre><code class="language-HTML">&#x251C;&#x2500;&#x2500; README.md
&#x251C;&#x2500;&#x2500; config
&#x2502;&#xA0;&#xA0; &#x251C;&#x2500;&#x2500; paths.js
&#x2502;&#xA0;&#xA0; &#x251C;&#x2500;&#x2500; webpack.common.js
&#x2502;&#xA0;&#xA0; &#x2514;&#x2500;&#x2500; webpack.config.js
&#x251C;&#x2500;&#x2500; node_modules
&#x251C;&#x2500;&#x2500; package-lock.json
&#x251C;&#x2500;&#x2500; package.json
&#x251C;&#x2500;&#x2500; pbcopy
&#x251C;&#x2500;&#x2500; public
&#x2502;&#xA0;&#xA0; &#x251C;&#x2500;&#x2500; icons
&#x2502;&#xA0;&#xA0; &#x251C;&#x2500;&#x2500; manifest.json
&#x2502;&#xA0;&#xA0; &#x2514;&#x2500;&#x2500; popup.html
&#x2514;&#x2500;&#x2500; src
    &#x251C;&#x2500;&#x2500; background.js
    &#x251C;&#x2500;&#x2500; contentScript.js
    &#x251C;&#x2500;&#x2500; popup.css
    &#x2514;&#x2500;&#x2500; popup.js</code></pre><p>We&apos;re mostly interested in only three files for now:<br><br><em><strong>public/manifest.json</strong></em><br><br>The manifest is a JSON file which provides the browser with information about your extension, such as name, it&apos;s capabilities, how it&apos;s started, which file to display, scripts to run on pages and <a href="https://developer.chrome.com/docs/extensions/mv3/manifest/?ref=blog.entropy.observer">many more</a>. A few fields to note there for us:</p><!--kg-card-begin: markdown--><ul>
<li><code>default_popup</code> - the HTML file to show when the extension icon is clicked</li>
<li><code>permissions</code> -  we need them to access certain parts of Chrome API</li>
<li><code>host_permissions</code> -  a set of URL patterns your extension can access</li>
</ul>
<!--kg-card-end: markdown--><p>For now, we&apos;ll leave it all as it is and come back to it later.</p><p><em><strong>src/popup.html</strong></em><br><br>The starting point of our UI. This HTML pops up when we click the extension button in the browser, so we&apos;ll use it to build a simple interface here.<br>We&apos;ll have a &apos;Sort&apos; button that calls our API&apos;s /sort endpoint and returns the result, a loading bar and a simple error box in case anything goes wrong.<br>For debugging, we can also have a &quot;Show tabs&quot; button that will show as a list of all of our tabs. So let&apos;s write some simple HTML for it:</p><pre><code class="language-HTML">&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
  &lt;head&gt;
    &lt;meta charset=&quot;UTF-8&quot; /&gt;
    &lt;title&gt;Bookie JS&lt;/title&gt;
    &lt;link rel=&quot;stylesheet&quot; href=&quot;popup.css&quot; /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;div class=&quot;app&quot;&gt;
      &lt;div class=&quot;button-container&quot;&gt;
        &lt;!-- This will call our API --&gt;
      &lt;button id=&quot;sortBtn&quot; class=&quot;button&quot;&gt;Sort my mess&lt;/button&gt;
      &lt;div id=&quot;loading&quot; class=&quot;loading&quot;&gt;&lt;/div&gt;
      &lt;div id=&quot;error&quot; class=&quot;error&quot;&gt;&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;script src=&quot;popup.js&quot;&gt;&lt;/script&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre><p><br><em><strong>src/popup.js</strong></em><br></p><!--kg-card-begin: markdown--><p>This is where our JS will reside. We ain&apos;t gonna use no fancy <em>bulletproof cybernetically CRISPR&apos;d SSSR JavaScript framework</em>, it&apos;s going to be our plain ol&apos; <a href="http://vanilla-js.com/?ref=blog.entropy.observer">vanilla JS</a>. To update the UI, we will rely on a simple <code>render(state)</code> function that manipulates DOM elements using some simple <code>show</code> and <code>hide</code> functions (by changing <code>element.style.display</code> to <code>block</code>/<code>none</code>).</p>
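<p>We won&apos;t dwell on the <code>show</code> and <code>hide</code> helpers themselves - a minimal version of them could look like this (an assumed implementation, adjust to taste):</p><pre><code class="language-javascript">//toggle an element&apos;s visibility by its id
function setDisplay(id, value){
  document.getElementById(id).style.display = value;
}

function show(id){ setDisplay(id, &apos;block&apos;); }
function hide(id){ setDisplay(id, &apos;none&apos;); }
</code></pre>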
<!--kg-card-end: markdown--><p>Now, let&apos;s write our thought process down by writing it into functions:</p><pre><code class="language-javascript">&apos;use strict&apos;;

import &apos;./popup.css&apos;;

(function () {

const SORT_BTN = &apos;sortBtn&apos;;
const LOADING = &apos;loading&apos;;
const ERROR = &apos;error&apos;;
    
// get tabs &amp; groups from the API
async function getTabsAndGroups(){};

// call backend with the data
async function callBackendToSort(tabsAndGroups){};

// apply result to browser
async function applySort(sortedCategories){};   

//runs our app   
async function run(){

 //get tabs
 let tabsAndGroups = await getTabsAndGroups();
 render({loading: false, error: null})

 let btn = document.getElementById(&apos;sortBtn&apos;)

 //on click, call the API, show loading and apply the results when done 
 btn.addEventListener(&apos;click&apos;,async ()=&gt; {
     render({loading: true, error: null})
      try {
        let result = await callBackendToSort(tabsAndGroups)
        await applySort(result)
        render({loading: false, error: undefined})
      }catch (e){
        render({loading: false, error: e})
      }
 })
}

//load our run function when the content loads
document.addEventListener(&apos;DOMContentLoaded&apos;, run);
    
})();
</code></pre><!--kg-card-begin: markdown--><p>Our first step will be querying the Chrome API for tabs and groups. As we can see in the docs, we can use <code>chrome.tabs.query</code> to achieve this.</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://blog.entropy.observer/content/images/2023/02/Screenshot-2023-02-13-at-15.28.40.png" class="kg-image" alt="Sorting 400+ tabs in 60 seconds with JS, Rust &amp; GPT-3: Part 1" loading="lazy" width="1462" height="560" srcset="https://blog.entropy.observer/content/images/size/w600/2023/02/Screenshot-2023-02-13-at-15.28.40.png 600w, https://blog.entropy.observer/content/images/size/w1000/2023/02/Screenshot-2023-02-13-at-15.28.40.png 1000w, https://blog.entropy.observer/content/images/2023/02/Screenshot-2023-02-13-at-15.28.40.png 1462w" sizes="(min-width: 720px) 720px"></figure><p>So, let&apos;s try it:</p><pre><code class="language-javascript">async function getTabsAndGroups() {
    let chromeTabs = await chrome.tabs.query({})
    console.log(chromeTabs)
  }
</code></pre><!--kg-card-begin: markdown--><p>Not working? Now, remember that <code>public/manifest.json</code> file? And the <code>permissions</code> object?</p>
<p>Well, to access tabs, their titles and groups, we&apos;ll need to add matching permissions to it. So open up the <code>manifest.json</code> and under <code>permissions</code> add <code>&quot;tabs&quot;, &quot;tabGroups&quot;</code>. Now when installing, Chrome can check your extension&apos;s permissions and let the user know what you&apos;re accessing.<br>
But, to be able to access the tabs API, we&apos;ll need one other special permission called <code>host_permissions</code>. It tells the user which websites the extension is allowed to run on, so if we want to be able to use it on all tabs, we&apos;ll need to add the proper URL pattern. So add a new property to the <code>manifest.json</code> called <code>host_permissions</code> with a pattern allowing it to match all URL&apos;s, such as <code>&quot;host_permissions&quot;: [&quot;*://*/*&quot;]</code>. Finally, now we are able to access all of the user&apos;s tabs and groups.</p>
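<p>The relevant part of our <code>manifest.json</code> now looks like this (all other fields omitted):</p>
<pre><code class="language-json">{
  &quot;permissions&quot;: [&quot;tabs&quot;, &quot;tabGroups&quot;],
  &quot;host_permissions&quot;: [&quot;*://*/*&quot;]
}
</code></pre>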
<p>Now that it&apos;s working, the data the <code>chrome.tabs.query</code> method returns will contain a few things we&apos;ll need: <code>id</code>, <code>title</code> and <code>groupId</code>. We&apos;ll be using <code>id</code> and <code>title</code> for sorting, and <code>groupId</code> to query existing groups, so first, we&apos;ll map the returned object to a simplified version of it, using only the properties we need.</p>
<p>To get more data about groups, we&apos;ll create a <code>tabsToGroups</code> function which finds all the unique groups and queries the Chrome API using <code>chrome.tabGroups.get(id)</code> to get the title of each group.</p>
<!--kg-card-end: markdown--><pre><code class="language-javascript">//map chrome tab objects to just the properties we need
async function mapTabs(chromeTabs){
  return chromeTabs.map((it)=&gt;({
    id: it.id,
    title: it.title,
    groupId: it.groupId
  }));
}

async function tabsToGroups(tabs){
  //get all existing groupIds from tabs
  let groupIds = tabs
      .map( (it)=&gt;it.groupId)
      .filter((it)=&gt;it!==null &amp;&amp; it!==undefined &amp;&amp; it!==-1);
  
  //push them into a set to get unique ones
  let groups = new Set(groupIds)

  //query chrome API for data about each tab group
  return await Promise.all([...groups]
      .map(async (it) =&gt; {
      let item = await chrome.tabGroups.get(it)
        return {
          id: item.id,
          title: item.title
        }
    }));
  }

// now our function can return us all of our tabs and groups
async function getTabsAndGroups() {
    let chromeTabs = await chrome.tabs.query({})
    let tabs = await mapTabs(chromeTabs)
    let tabsWithGroups = await tabsToGroups(tabs)
    let groups =  tabsWithGroups.filter((it)=&gt;it.title.length !== 0);
    return {
      items: tabs,
      categories: groups
    }
  }
</code></pre><p>Boom, in a few simple steps we have the list of our existing groups and tabs. <br>The API calling function is also quite simple. Since our API doesn&apos;t exist yet,<br>we&apos;ll just write a generic POST request to localhost:</p><pre><code class="language-javascript">async function callBackendToSort(data){
 let response = await fetch(&apos;http://127.0.0.1:8000/sort&apos;,{
      method: &apos;POST&apos;,
      headers: {&apos;Content-Type&apos;: &apos;application/json&apos;},
      body: JSON.stringify({
        items: data.items,
        categories: data.categories
      })
    })
 //parse the JSON body so applySort receives a plain object
 return await response.json()
}
</code></pre><p></p><p></p><p>Our render function is quite simple too - we just check the state and change our UI accordingly.</p><pre><code class="language-javascript">function render(state){
    if(state.loading){
      show(LOADING)
      hide(SORT_BTN)
      hide(ERROR)
    }else{
      hide(LOADING)
      show(SORT_BTN,true)
    }
    if(state.loading!==true &amp;&amp;
      (state.error!==undefined &amp;&amp; state.error!=null)){
      show(ERROR)
      showError(state.error)
    }else
      hide(ERROR)
}
</code></pre><!--kg-card-begin: markdown--><p>All that&apos;s now left to do is implement the <code>applySort</code> function which will apply our new categories to the browser itself.</p>
<p>The idea is:</p>
<ul>
<li>Check if the group exists</li>
<li>If it doesn&apos;t, create it</li>
<li>Update its tab list and title</li>
</ul>
<p>For this, we have a bit of API research to do - the documentation covering this part is a bit confusing. You&apos;d expect to be able to have something like<br>
<code>chrome.tabGroups.create</code> or <code>chrome.tabGroups.update</code> which would change tabs in the group, but... that&apos;s naive thinking.</p>
<p>To create a group, we call <code>chrome.tabs.group</code> <em>without</em> passing it a <code>groupId</code>. Then, the group will be created and the new <code>groupId</code> returned to you. This is kind of a weird call by the Chrome team - if groups are just containers of tabs, why would tabs have knowledge of and control over them?</p>
<p>Shouldn&apos;t the groups be created and managed via groups API?</p>
<p>Oh also, if you want to add tabs to the group, you use the same call and pass it the array of tabs via <code>tabIds</code>. &quot;Hey can I pass in the title too since we&apos;re already creating and updating the object via this API call?&quot; No, for that you&apos;ll use <code>chrome.tabGroups.update</code> API call.</p>
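<p>Put together, creating a group <em>and</em> naming it spans both namespaces. A small hypothetical helper (not from the extension code) just to show the dance:</p>
<pre><code class="language-javascript">//create a group from tabs, then title it - two APIs, one job
async function createTitledGroup(tabIds, title){
  //no groupId passed = a new group is created and its id returned
  let groupId = await chrome.tabs.group({ tabIds: tabIds });
  //the title lives in the tabGroups API instead
  await chrome.tabGroups.update(groupId, { title: title });
  return groupId;
}
</code></pre>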
<p>I assumed this weird syntax is because groups were a later addition in Chrome, so support was retrofitted into the tabs API itself. So let&apos;s test that assumption. Looking at the <a href="https://chromium-review.googlesource.com/c/chromium/src/+/2414921?tab=comments&amp;ref=blog.entropy.observer">commit</a> that added groups to the Tabs API, we can find the same discussion in the comments, leading us to the <a href="https://docs.google.com/document/d/1WgNtyBSuSmmHIuENU8IKLZmSK3tAVPpnjwjfD3clxqI/edit?disco=AAAAGpyGs6I&amp;ref=blog.entropy.observer">Tab Group API proposal</a>. It seems the team decided to split the responsibilities between <em>tab management</em> and <em>group management</em>. Since moving a tab is <em>tab management</em>, its responsibility belongs in the Tabs API.</p>
<p>The alternative proposal was also discussed (putting that responsibility in the TabGroups API), along with its pros and cons:<br>
<img src="https://blog.entropy.observer/content/images/2023/02/Screenshot-2023-02-16-at-21.29.44.png" alt="Sorting 400+ tabs in 60 seconds with JS, Rust &amp; GPT-3: Part 1" loading="lazy"></p>
<p>From my perspective (as the user of the API), the cons list doesn&apos;t seem that bad. Tabs wouldn&apos;t need to know about groups, user security would be increased (extensions would only need the <code>tabGroups</code> permission, reducing the potential area for malicious abuse by extensions) and it would <em>hide the implementation details, replacing them with an intuitive API, which is what abstractions are all about</em>. Weird decision nonetheless.</p>
<p>But enough talking about the spaghetti, let&apos;s write some down.</p>
<!--kg-card-end: markdown--><pre><code class="language-javascript">async function applySort(sortedCategories){

/* The response object we want looks like:
{ categories: [
	{ category_id: int, category_name: string, items: [int] }
    ] }
*/

  for (let i = 0; i &lt; sortedCategories.categories.length; i++) {
     let category = sortedCategories.categories[i]
     let categoryId = category.category_id
     //check if a group with this ID exists
     let groupExists = await chrome.tabGroups.get(categoryId)
     					.catch((e)=&gt;undefined);
      let groupId;
      if(groupExists === undefined)
         //if it doesn&apos;t, chrome.tabs.group returns us a new ID
         groupId = await chrome.tabs.group({ tabIds: category.items });
      else {
        //if it does, we use the existing one
       	groupId = groupExists.id
        await chrome.tabs.group({groupId: groupId,
                                tabIds: category.items});
      }

      // Set the title of the group and collapse it
      await chrome.tabGroups.update(groupId, {
        collapsed: true,
        title: category.category_name
      });
  }
}
</code></pre><p>With this, our JS extension MVP is done.<br> - We collect the tabs and groups<br> - We send them to the API<br> - We apply the returned sort.</p><!--kg-card-begin: markdown--><p>Now, we don&apos;t have an API yet, so how do we test it?<br>
We should write down some unit tests, but let&apos;s leave that for another day (no really - a few posts down we&apos;ll look into testing a Chrome extension with Jest). For now, we can fake the return value of the <code>callBackendToSort</code> function to include a few categories and a few tab IDs - something like this (but with your tab IDs):</p>
<!--kg-card-end: markdown--><pre><code class="language-json">{
	&quot;categories&quot;: [{
		&quot;category_id&quot;: 837293848,
		&quot;category_name&quot;: &quot;Hacker News&quot;,
		&quot;items&quot;: [1322973609, 1322973620]
	}, {
		&quot;category_id&quot;: 837293850,
		&quot;category_name&quot;: &quot;Science&quot;,
		&quot;items&quot;: [1322973618, 1322973617, 1322973608]
	}, {
		&quot;category_id&quot;: 837293851,
		&quot;category_name&quot;: &quot;GitHub&quot;,
		&quot;items&quot;: [1322973619]
	}, {
		&quot;category_id&quot;: 837293852,
		&quot;category_name&quot;: &quot;Web Development&quot;,
		&quot;items&quot;: [1322973612, 1322973613, 1322973615, 1322973616]
	}, {
		&quot;category_id&quot;: 837293853,
		&quot;category_name&quot;: &quot;Web APIs&quot;,
		&quot;items&quot;: [1322973646]
	}]
}</code></pre><p>Now we can move on to the fun parts - building that API, prompt optimisations, GPT timeouts and fixing mistakes we&apos;ll make in the days of the future past.<br>Oh and we&apos;ll also be adding some more complexity and feature creep, but more on that later.<br><br>Stay tuned for Part 2 where we&apos;ll continue our adventure with everyone&apos;s favourite crab - Rust. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.entropy.observer/content/images/2023/02/000039.c7240e14.1720012991-2.png" class="kg-image" alt="Sorting 400+ tabs in 60 seconds with JS, Rust &amp; GPT-3: Part 1" loading="lazy" width="960" height="512" srcset="https://blog.entropy.observer/content/images/size/w600/2023/02/000039.c7240e14.1720012991-2.png 600w, https://blog.entropy.observer/content/images/2023/02/000039.c7240e14.1720012991-2.png 960w" sizes="(min-width: 720px) 720px"><figcaption>Rusty the crab cute illustration, simple, clean, 2022 (The Artist Is A Machine)</figcaption></figure><p><br><br> <br><br><br><br><br><br></p>]]></content:encoded></item><item><title><![CDATA[The actual value behind GPT isn’t in writing SEO spam - it’s the transformers.]]></title><description><![CDATA[<blockquote>(Note: I talk about GPT here mostly but just because it&apos;s easier to write than &quot;transformer language models&quot; and most people are familiar with them in the form of GPT, but the text is about them in general)</blockquote><p>GPT3, often confused with ChatGPT in the latest</p>]]></description><link>https://blog.entropy.observer/the-actual-value-behind-gpt-isnt-in-writing-mediocre-articles-its-the-transformers/</link><guid isPermaLink="false">65422724da89d344959db4f5</guid><dc:creator><![CDATA[Ian Rumac]]></dc:creator><pubDate>Mon, 06 Feb 2023 22:46:51 GMT</pubDate><media:content url="https://blog.entropy.observer/content/images/2023/02/8603987448ba445793e6d9fe1e018fe1.png" medium="image"/><content:encoded><![CDATA[<blockquote>(Note: I talk 
about GPT here mostly just because it&apos;s easier to write than &quot;transformer language models&quot; and most people are familiar with them in the form of GPT, but the text is about them in general)</blockquote><img src="https://blog.entropy.observer/content/images/2023/02/8603987448ba445793e6d9fe1e018fe1.png" alt="The actual value behind GPT isn&#x2019;t in writing SEO spam - it&#x2019;s the transformers."><p>GPT3, often confused with ChatGPT in the latest swarm of internet articles,<br>has been all the rage in the tech buzzword world these days. Its treatment in the media for the last year or so has been off the charts, with some treating it as the miracle AI we have been waiting for. Everybody and their mom has been jumping on the bandwagon, creating the next copywriting tool, making it pass the bar or just using it to write their math homework. </p><p>Unfortunately, the quality of the content generated is usually mediocre - even with better prompting, the text generated cannot be novel - the technology itself is based on &quot;common denominators&quot; in a way, parroting and remixing from the trained texts, so you can forget about becoming the next James Joyce in a few clicks; your writing will most likely end up looking like an average philosophy student&apos;s grandiose manifesto, with a bunch of words thrown in to impress the average reader, yet meaning nothing and bearing no satisfaction to the reader&apos;s gaze.<br><br>But, far off on the other side, there are some way more fun applications people are finding uses for - <a href="https://spindas.dreamwidth.org/4207.html?ref=blog.entropy.observer">GPT3 as a reducer</a>, as a backend, <a href="https://www.width.ai/post/gpt-3-language-translation-software?ref=blog.entropy.observer">as a translator</a> or <a href="https://medium.com/tenable-techblog/g-3po-a-protocol-droid-for-ghidra-4b46fa72f1ff?ref=blog.entropy.observer">decompiler/deobfuscator</a> - and these applications have a much bigger
practical value.<br><br>And for the last year or so, this has been tickling my mind - what are some actual use cases behind the technology - yes, generating articles or parroting back documentation is an obvious one. Fine-tuned models answering support questions is also a nice one, tho it comes with its own 13 reasons why not.<br><br>But the transformations themselves - taking data in one form and returning it in another, processing it along the way or just translating it - unlock a large pool of uncaptured value.</p><p>Imagine being able to process a bunch of scraped or human data into a predefined format that aligns with your API&apos;s data format - or to put it more vividly, imagine your grandma sending a text &quot;can you bring me 2 bottles of milk and a pack of eggs?&quot;, getting an answer &quot;that will be 3.97, is that ok?&quot; and someone showing up with 2 milks and eggs 15 minutes later (or sometimes 12 milks and 2 eggs because the model screwed up).<br><br>Behind the scenes, the text is actually fed into a model that transforms it into JSON in the format of:<br></p><pre><code class="language-json">{
  &quot;action&quot;: &quot;purchase&quot;,
  &quot;items&quot;: [
    {
      &quot;name&quot;: &quot;Milk&quot;,
      &quot;quantity&quot;: 2
    },
    {
      &quot;name&quot;: &quot;Egg pack&quot;,
      &quot;quantity&quot;: 1
    }
  ]
}
</code></pre><p>Which the latest 15-minute grocery delivery app can then consume and bring your grandma her milk (and rip her off for a 4$ service fee, 8$ delivery fee, 3$ VC fee on the way).<br><br>Even better things are possible with <a href="https://github.com/hwchase17/langchain?ref=blog.entropy.observer">chaining</a> different models: <br><br>Scrape a website, feed it into a model to remove unnecessary HTML, and feed the results into another model that transforms the contents into a format your APIs consume. Hell, why even bother with an API, just insert the results into a model that is fine-tuned in translating to SQL queries and pump that sweet data oil in directly. <br><br>Want to check how much do open bugs during full moons influence your user churn?<br>Well what if your favorite analytics tool had a question box connecting to a chain -<br>first giving your question to a model that suggests data to find, passing into another model returning a query on your data lake which is then evaluated for safety, executed and passed together with the original prompt into a code-generating model that will return the necessary HTML to display that data.<br><br>Instead of having to torture your developers and designers with supporting infinite possible permutations of filters, chart designs and customisations, you can just leave it up to the model to generate them on the fly. <br><br>With enough fine-tuning (and a lot of human work to provide good data),<br>transformer LLMs can help us achieve a lot of the stuff we thought &quot;unscalable&quot; as of now - stuff that wasn&apos;t cost efficient, needed a mechanical turk or a large swath of hardcoded assumptions to iron out the edge cases - all by using an oversized text mumbler-jumber. 
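A chain like that is, at its core, just function composition over model calls, with each step's output feeding the next. A minimal sketch with a canned `llm` stand-in (a hypothetical helper; a real chain would make async API calls and need error checking between steps):

```javascript
// Canned stand-in for a model call; a real one would hit an LLM API.
// Each "model" here is just a fixed transformation keyed by instruction.
function llm(instruction, input) {
  if (instruction === "strip-html") return input.replace(/<[^>]*>/g, " ").trim();
  if (instruction === "to-json") return JSON.stringify({ text: input });
  throw new Error("unknown instruction: " + instruction);
}

// A chain simply pipes each step's output into the next model.
function chain(steps, input) {
  return steps.reduce((data, step) => llm(step, data), input);
}

const scraped = "<div><p>2 bottles of milk</p></div>";
const result = chain(["strip-html", "to-json"], scraped);
// result: '{"text":"2 bottles of milk"}'
```

Swap the canned transformations for real model calls (and insert a validation step between them) and you have the scrape-then-transform pipeline described above.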
<br><br>And yes, there are a lot of hallucinations, quite a few mistakes, and a lot of accuracy issues in the way - one wrong word and the model could wind up in the crazy lane - but I&apos;m not saying it&apos;s a perfect &quot;do-all-be-all&quot; technology, far from it - I&apos;m saying it&apos;s a great &quot;glue&quot; layer we were missing in our toolbelt, a &quot;generic glue&quot; layer which could help us unlock more economic and data value than ever. With good training, error checking and proper chaining, we could conquer some problems that were insurmountable until now.<br><br>Even though the current generation of models is like giant mainframes upon which we can only gaze with wonder, there are newer and smaller models coming out at a rapid pace. And while we are still quite far away from having a small, easily tuneable model that will be good enough to cover a large swath of tasks with only a small amount of additional training, the next generation of programmers might grow up complaining that &apos;gpt install is-integer&apos; ruined programming. </p>]]></content:encoded></item></channel></rss>