Chat-Driven Development, Part 3: Hitting a Groove

Or, "Norming"

This is the third of five posts following my first attempt at using the JetBrains AI Assistant (the JBAI) to generate code and tests, in a practice I'm calling Chat-Driven Development. Even I'm starting to wonder if that name is too cheesy, but I'm too far along now to change it, so maybe the marketing folks will decide on something better.

A big thanks to Denise Yu for letting me use her Gophermon drawings for this blog series!

The full list of posts in this series:

  1. Getting Started (or, Forming)

  2. Early Stumbles and Small Wins (or, Storming)

  3. Hitting a Groove (or, Norming)

  4. Connecting the Pieces (or, Performing)

  5. Lessons Learned (or, Adjourning)

Series tl;dr: I was able to chat with the JBAI so that it created a unit test for a shell of a function, and then asked it to implement that function. And it freaking worked.

This post is very long, but there are a lot of lessons packed into it.

This post's tl;dr, the lessons learned:

  • The JBAI is not currently aware of what's happening in adjacent packages.

  • The simpler stuff is currently easier and faster to just do by hand. (Something I learned earlier, but apparently had to re-learn.)

  • The JBAI thinks of itself and its output as a scratch pad.

  • Despite the scratchpad mentality, the JBAI is well aware of what it's doing in its execution context, and can adjust as needed.

  • Each execution is treated as a one-off, with further specification running in the same context.

  • Modifying the existing execution context has plenty of room for improvement.

  • The JBAI doesn't know that it has to reset the demo data at the end of each test spec.

Just want to see the code? The code for this part of the series lives on a branch in this GitHub repo.

Getting Set Up

This post follows the build-out of something more complex - the users data service that REST handlers would eventually call.

This requires some setup. The first thing I asked it to do was to mock out some data.

I created two new packages in my module:

  • a models package, which would eventually define a User struct

  • a services package, which would hold some mock user data, and would be called by the REST handler functions that I wanted to have the JBAI build out for me.
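
For reference, here's roughly how the module was laid out at this point (gophermon is a stand-in for the real module name, which you can find in the linked repo):

gophermon/
├── models/
│   └── user_model.go      // the User struct
└── services/
    └── users_service.go   // mock data + the data service functions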

Can the JBAI build a struct based on a prompt? In /models/user_model.go I asked the following of the Generate Code prompt:

Me > Generate Code: Please define a new struct for me called User. It should have three fields. The first is called Id, and is a string. The second is called Name, and is a string. The third is called Age, and is an int8.

The result:

type User struct {
    Id   string
    Name string
    Age  int8
}

Exactly as I expected. I'll admit, it took way longer to type the prompt than it would have to create the struct using the built-in autocomplete. But we're doing this for science!

In /services/users_service.go, I asked:

Me > Generate Code: Please generate a private variable called users. It should be a map[string]models.User, and should have four users defined. The first has a key of "1", and has the following values: "1", "Mario", 38. For key "2", give the following values: "2", "Luigi", 35. For key "3", give the following values: "3", "Peach", 37. For key "4", give the following values: "4", "Toad", 73.

It generated the values pretty well, but there were a couple of problems:

package services

import "your_project_name/models"

var users = map[string]models.User{
    "1": models.User{ID: "1", Name: "Mario", Age: 38},
    "2": models.User{ID: "2", Name: "Luigi", Age: 35},
    "3": models.User{ID: "3", Name: "Peach", Age: 37},
    "4": models.User{ID: "4", Name: "Toad", Age: 73},
}

It's subtle, but the demo User entries have an ID field, rather than the Id field that the JBAI itself declared in models.User. That leads me to believe it inferred that the first value was an ID, and I'm back and forth on whether that's good. Less subtle is the placeholder import path, your_project_name, sitting where my module's actual path should be.

I was starting to see a pattern. Even though GoLand as an IDE is phenomenal at indexing and building intelligence around code in other files, as well as understanding module URLs and package paths, the JBAI is not currently aware of what's happening in adjacent files and packages.

Some quick refactoring led me to a clean implementation.
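
Here's roughly what the cleaned-up version looked like (gophermon again stands in for the real module path):

package services

import "gophermon/models" // hypothetical module path; yours will differ

var users = map[string]models.User{
    "1": {Id: "1", Name: "Mario", Age: 38},
    "2": {Id: "2", Name: "Luigi", Age: 35},
    "3": {Id: "3", Name: "Peach", Age: 37},
    "4": {Id: "4", Name: "Toad", Age: 73},
}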

So far, this has not saved me time. I could have written everything in these last two files in much less time than it took to write out the prompts and then clean up the code. This leads me to my next conclusion about using an AI assistant to generate code: the simpler stuff is currently easier and faster to just do by hand.

Generating Data Service Methods from Tests

Since I now know that the JBAI isn't very smart about code happening in adjacent files, I'll assume that my earlier success with registerHealthEndpoint came from being overly specific about what needed to happen in both the test file and the implementation. At the same time, the JBAI was smart enough to know to call registerHealthEndpoint in the Start function, so within a single file it was very clear on the implementation context.

This made me wonder if I would have more success by asking the JBAI to do everything in the same file, and then use the IDE's Refactor context sub-menu to move the code around (or copy/paste/edit like the old man that I am).

Everything that follows happened in /services/users_service.go, via the right-click context menu, and its AI Actions sub-menu.

GetUsers

Me > Generate Code: Please create a test for a new function called GetUsers. The test should fail if it returns a slice of users that does not have exactly four entries.

The result:

// Assuming GetUsers function is in this form
func GetUsers() []models.User {
    userList := make([]models.User, 0, len(users))
    for _, user := range users {
        userList = append(userList, user)
    }
    return userList
}

func TestGetUsers(t *testing.T) {
    users := GetUsers()
    if len(users) != 4 {
        t.Errorf("GetUsers() failed, expected length of 4, got %d", len(users))
    }
}

Bangarang, Rufio. It's worth pointing out that there is a users variable defined in the test that shadows the users package variable. In fact, the IDE pointed that out after the JBAI generated that code. This clearly indicates that the JBAI isn't checking with the IDE's built-in code improvements/suggestions functionality before it outputs generated code. It was neither a hard nor a time-consuming change, and the code would have worked just fine if I hadn't refactored it.
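
For what it's worth, the de-shadowing fix is just a rename, something like:

func TestGetUsers(t *testing.T) {
    got := GetUsers() // renamed from "users" to stop shadowing the package variable
    if len(got) != 4 {
        t.Errorf("GetUsers() failed, expected length of 4, got %d", len(got))
    }
}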

GetUserById

Me > Generate Code: Please generate two tests for a new function called GetUserById. The function should take a string argument and return a models.User struct. The first test should ensure that if "1" is passed in as an argument, the result matches the value of the "1" key in the users map. The second should ensure that if "0" is passed in as an argument, the result is a nil models.User struct.

Now we're getting somewhere! A bunch of stuff happened:

  • it generated the function as expected

  • it generated the tests, as expected

  • it refactored the users demo data to use pointers - map[string]*models.User - I'd imagine for simpler testing

  • it refactored the GetUsers to address the fact that the map values were pointers

Here's the resulting file content for /services/users_service.go:

package services

var users = map[string]*models.User{
    "1": &models.User{Id: "1", Name: "Mario", Age: 38},
    "2": &models.User{Id: "2", Name: "Luigi", Age: 35},
    "3": &models.User{Id: "3", Name: "Peach", Age: 37},
    "4": &models.User{Id: "4", Name: "Toad", Age: 73},
}

func GetUserById(id string) *models.User {
    return users[id]
}

func TestGetUserById(t *testing.T) {

    t.Run("returns the correct user", func(t *testing.T) {

        user := GetUserById("1")

        if user != users["1"] {
            t.Errorf("GetUserById() failed, got %v, want %v", user, users["1"])
        }
    })

    t.Run("returns nil for non existent user", func(t *testing.T) {

        user := GetUserById("0")

        if user != nil {
            t.Errorf("GetUserById() failed, expected a nil, got: %v", user)
        }
    })
}

func GetUsers() []models.User {
    userList := make([]models.User, 0, len(users))
    for _, user := range users {
        userList = append(userList, *user)
    }
    return userList
}

// TestGetUsers stayed the same...
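
A side note that neither the JBAI nor I flagged at the time: Go randomizes map iteration order, so GetUsers returns its slice in nondeterministic order. The length-only test doesn't care, but any future test asserting on order would be flaky. If deterministic output mattered, a sorted variant would be a small change (assuming the sort package is imported), something like:

func GetUsers() []models.User {
    userList := make([]models.User, 0, len(users))
    for _, user := range users {
        userList = append(userList, *user)
    }
    // Sort by Id so callers (and tests) see a stable order.
    sort.Slice(userList, func(i, j int) bool {
        return userList[i].Id < userList[j].Id
    })
    return userList
}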

We're getting into the realm of positive returns on time and energy saved. Of course, now it's time to attempt user creation, with some validation and error responses.

CreateUser

Me > Generate Code: Please create three tests for a new function called CreateUser, which takes a models.User as an argument, and returns an error. The first test should ensure that if a models.User is passed in with an Id value that matches one of the existing entries in the users map, CreateUser returns an error indicating that the Id must be unique. The second should test that CreateUser returns an error if the models.User passed in has a Name that matches one of the existing entries in the users map. The third should ensure that CreateUser adds the new models.User to the users map, with a key matching the new models.User Id field.

As I watched it generate the code, a strange thing happened. The diff preview seemed to delete all of the existing code, but at the same time, a new comment was added at the top of the newly generated code:

// Rest of the existing code

When I accepted the diff, the new code entirely replaced the old code, and left that comment there indicating where the old code was supposed to go!!! I can't imagine why it did this, and I'm glad I had copied the entire file before accepting the diff.

This leads me to my next conclusion - the JBAI thinks of itself and its output as a scratch pad.

If that's correct, then this code isn't meant to be generated directly in my production files. It's meant to be generated in a scratch file, and then moved to the destination once I'm happy with it.

For science, I decided to try it again, and this time got a different result. The generated code was mostly the same, but this time it was added to the top of the existing file, even above the demo data, with the following comments above the generated code:

// New code starts here

and...

// New code ends here

This fits with the scratchpad mentality theory. Anecdotally, it also removed the testing import, which is strange because it uses the testing package for its tests; the IDE then re-added it when I accepted the diff.

Here's the resulting code:

// New code starts here

// Note: ErrorUserExists and ErrorUserNameExists weren't included in this
// diff; definitions along these lines are assumed (exact messages may differ):
var (
    ErrorUserExists     = errors.New("user id must be unique")
    ErrorUserNameExists = errors.New("user name must be unique")
)

func CreateUser(user *models.User) error {
    _, idExists := users[user.Id]
    // The Name check runs before the Id check, so a duplicate Name is
    // reported even when the Id also collides.
    for _, u := range users {
        if u.Name == user.Name {
            return ErrorUserNameExists
        }
    }

    if idExists {
        return ErrorUserExists
    }

    users[user.Id] = user
    return nil
}

func TestCreateUser(t *testing.T) {

    t.Run("returns error when user id exists", func(t *testing.T) {
        user := models.User{Id: "1", Name: "Bowser", Age: 40}
        err := CreateUser(&user)
        if err != ErrorUserExists {
            t.Errorf("Expected error %q, but got %q", ErrorUserExists, err)
        }
    })

    t.Run("returns error when user name exists", func(t *testing.T) {
        user := models.User{Id: "5", Name: "Mario", Age: 40}
        err := CreateUser(&user)
        if err != ErrorUserNameExists {
            t.Errorf("Expected error %q, but got %q", ErrorUserNameExists, err)
        }
    })

    t.Run("adds new user to map", func(t *testing.T) {
        user := models.User{Id: "5", Name: "Bowser", Age: 40}
        err := CreateUser(&user)
        if err != nil {
            t.Errorf("Expected no error, but got %q", err)
        }

        if users[user.Id] != &user {
            t.Errorf("User was not added to map")
        }
    })
}

// New code ends here

It's worth pointing out that, at this point, I hadn't actually tested any of this - I hadn't run the unit tests, and I hadn't run any code that calls these functions. I'd only been glancing over the output as I moved along. I expected some things to go wrong once I got to that point.

However, I'm already looking at significant time savings. With just the three service methods and their respective tests, the JBAI has generated about 200 lines of code. Total time spent doing that was about five minutes, give or take some experimentation and cleanup.

UpsertUser

Me > Generate Code: Please create tests for a new function called UpsertUser, which takes a models.User as an argument, and returns an error. First, please test that UpsertUser returns an error if any of the fields are nil. Then, please test that if the Id of the models.User that's passed in as an argument matches the key for an existing entry in the users map, the entry is then updated with the new values of the passed in models.User. Finally, please test that if the Id is not found within the users map keys, a new entry is created with the Id used as the key, if and only if the Id and Name are unique.

That's a lot to ask. Think it worked?

Mostly. But once again, it replaced existing code - specifically the CreateUser function, and repurposed most of TestCreateUser as part of TestUpsertUser.

This time, instead of accepting the diff, I used the Specify option to further tweak the result:

Me > Generate Code > Specify: Please don't remove any existing code.

Success! The modified output left all of the existing code in place, and added only new code. Next conclusion: despite the scratchpad mentality, the JBAI is well aware of what it's doing in its execution context, and can adjust as needed.

This is the new code that was generated:

var ErrorUserFieldNil = errors.New("user fields cannot be nil")

func UpsertUser(user *models.User) error {
    if user == nil || user.Id == "" || user.Name == "" {
        return ErrorUserFieldNil
    }

    // Check for uniqueness of Name and Id
    for id, u := range users {
        if u.Name == user.Name && id != user.Id {
            return ErrorUserNameExists
        }
    }

    // Upsert operation
    users[user.Id] = user
    return nil
}

// Testing the UpsertUser function
func TestUpsertUser(t *testing.T) {

    t.Run("returns error when user fields are nil", func(t *testing.T) {
        user := models.User{Id: "", Name: "", Age: 0}
        err := UpsertUser(&user)
        if err != ErrorUserFieldNil {
            t.Errorf("Expected error %q, but got %q", ErrorUserFieldNil, err)
        }
    })

    t.Run("updates existing user in map", func(t *testing.T) {
        user := models.User{Id: "1", Name: "Steve", Age: 42}
        err := UpsertUser(&user)
        if err != nil {
            t.Errorf("Expected no error, but got %q", err)
        }

        if users[user.Id].Name != "Steve" {
            t.Errorf("User was not updated in map")
        }
    })

    t.Run("adds new user to map", func(t *testing.T) {
        user := models.User{Id: "5", Name: "Robin", Age: 30}
        err := UpsertUser(&user)
        if err != nil {
            t.Errorf("Expected no error, but got %q", err)
        }

        if users[user.Id] != &user {
            t.Errorf("User was not added to map")
        }
    })
}

54 lines in less than a minute ain't bad.

However - next conclusion - each execution is treated as a one-off, with further specification running in the same context. Previous Generate Code prompts are neither considered nor tracked.

Note: the Name-uniqueness check happens for both creation and update, which isn't what the prompt asked for - it should only apply when creating a new entry - so I had to manually adjust that:

if _, exists := users[user.Id]; !exists {
    for id, u := range users {
        if u.Name == user.Name && id != user.Id {
            return ErrorUserNameExists
        }
    }
}

Since there's no test for this condition, I'll have to add one manually. Not hard, mostly copy/paste.

DeleteUserById

Me > Generate Code: Please write tests for a DeleteUserById function, which takes a string argument called "id", and returns an error. The first test should ensure that if no key matching the "id" string is found in the users map, the DeleteUserById function returns an error indicating that the user was not found. If there is a key in the users map matching the "id" string, the DeleteUserById function should return nil.

Once again reinforcing the scratchpad conclusion, the resulting output obliterated the previously generated UpsertUser function and tests. Some of it was replaced with the following comment:

// existing code

Other parts were replaced with the test and implementation code for DeleteUserById.

I tried repeating the same Specify prompt to discourage removing existing code, and it produced essentially the same result, with the existing code still removed. Conclusion: modifying the existing execution context has plenty of room for improvement.

To get around this, I copied the resulting code from the side-by-side diff, and pasted it into /services/users_service.go:

var ErrorUserNotFound = errors.New("user not found")

func DeleteUserById(id string) error {
    _, ok := users[id]
    if !ok {
        return ErrorUserNotFound
    }
    delete(users, id)
    return nil
}

func TestDeleteUserById(t *testing.T) {

    t.Run("returns error when user not found", func(t *testing.T) {
        err := DeleteUserById("nonexistent")
        if err != ErrorUserNotFound {
            t.Errorf("Expected error %q, but got %q", ErrorUserNotFound, err)
        }
    })

    t.Run("deletes user by id", func(t *testing.T) {
        err := DeleteUserById("1")
        if err != nil {
            t.Errorf("Expected no error, but got %q", err)
        }

        if _, exists := users["1"]; exists {
            t.Errorf("User was not deleted")
        }
    })
}

In wrapping up this part of the project, I had a lot of refactoring to do. I had to move all of the tests into /services/users_service_test.go, run and correct the tests as needed, and review everything as I would in any other code review.

That really only added a few minutes to the overall effort. Considering that I was able to create a fully operational data manipulation service via chat - pretty awesome.

Running Tests

After moving all of the tests into /services/users_service_test.go, I ran the entire test file, and only 3 out of 11 specs failed. That's not bad, in theory.

The tests that failed did so because previous specs had modified the underlying data. That is, the following test:

func TestUpsertUser(t *testing.T) {

    // other specs in the suite...

    t.Run("updates existing user in map", func(t *testing.T) {
        user := models.User{Id: "1", Name: "Steve", Age: 42}
        err := UpsertUser(&user)
        if err != nil {
            t.Errorf("Expected no error, but got %q", err)
        }

        if users[user.Id].Name != "Steve" {
            t.Errorf("User was not updated in map")
        }
    })
}

made this test fail:

func TestCreateUser(t *testing.T) {

    // other specs in the suite...

    t.Run("returns error when user name exists", func(t *testing.T) {
        user := models.User{Id: "5", Name: "Mario", Age: 40}
        err := CreateUser(&user)
        if err != ErrorUserNameExists {
            t.Errorf("Expected error %q, but got %q", ErrorUserNameExists, err)
        }
    })
}

If the TestUpsertUser suite hadn't run before TestCreateUser, the results would have been different.

Next conclusion: the JBAI doesn't know that it has to reset the demo data at the end of each test spec, so that each spec is tested in isolation.

This wasn't a hard manual fix. I just had to make sure to reset the test data as the first step in each spec.

// at the top of my test file
func mockUsers() {
    users = map[string]*models.User{
        "1": &models.User{Id: "1", Name: "Mario", Age: 38},
        "2": &models.User{Id: "2", Name: "Luigi", Age: 35},
        "3": &models.User{Id: "3", Name: "Peach", Age: 37},
        "4": &models.User{Id: "4", Name: "Toad", Age: 73},
    }
}

// and then at the beginning of every spec
func TestUpsertUser(t *testing.T) {
    t.Run("updates existing user in map", func(t *testing.T) {
        mockUsers() // reset the demo data before exercising the spec
        // the rest of the spec code
    })
}

At first, I thought I could chalk this up to each individual run context being executed in isolation. However, this bad practice has the power to impact specs created within the same suite, which means the JBAI isn't applying best practices as it writes tests.

That's bad, m'kay?

From a big picture perspective, this means I can't rely on the JBAI to use testing best practices. I will have to come in and clean up the tests once they're written to ensure that they're effective.
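
One cleaner pattern (my own refactoring idea, not something the JBAI suggested) is a tiny helper built on Go's t.Cleanup, so individual specs can't forget to reset the data; setupUsers here is a hypothetical name:

// setupUsers resets the demo data before a spec runs, and registers a
// cleanup that resets it again when the spec finishes.
func setupUsers(t *testing.T) {
    t.Helper()
    mockUsers()
    t.Cleanup(mockUsers)
}

// usage inside a spec:
func TestDeleteUserById(t *testing.T) {
    t.Run("deletes user by id", func(t *testing.T) {
        setupUsers(t)
        if err := DeleteUserById("1"); err != nil {
            t.Errorf("Expected no error, but got %q", err)
        }
    })
}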

As soon as I made the change shown above, all 11 test specs passed. As expected, I had to add another spec to cover the condition that I manually added in UpsertUser:

func TestUpsertUser(t *testing.T) {

    // other specs in the suite...

    t.Run("returns error when new user's name is not unique", func(t *testing.T) {
        mockUsers()
        user := models.User{Id: "23", Name: "Mario", Age: 2}
        err := UpsertUser(&user)
        if err != ErrorUserNameExists {
            t.Errorf("Expected error %q, but got %q", ErrorUserNameExists, err)
        }
    })
}

What's Next?

This was a super long post, but the actual work put in with the JBAI was pretty minimal. Like maybe a total of 15 minutes of actually setting up, writing prompts, drinking coffee, and reviewing/tweaking code.

What worried me as I finished working on this was that the user service was the easier of the two remaining chunks of work. Implementing the user service functions in the REST handlers that had yet to be written - that seemed daunting, and I wasn't entirely sure how it would go.

Find out in the next part. (Hint - it went very well.)