Chat-Driven Development, Part 4: Connecting the Pieces

Chat-Driven Development, Part 4: Connecting the Pieces

Or, "Performing"

This is the fourth part in a blog post series about what I'm calling Chat-Driven Development, in which I prompt an AI Assistant to generate code by describing the behavior of the unit tests for the desired code. In this series, the AI Assistant in question is the JetBrains AI (the JBAI).

A big thanks to Denise Yu for letting me use her Gophermon drawings for this blog series!

The full list of posts in this series:

  1. Getting Started (or, Forming)

  2. Early Stumbles and Small Wins (or, Storming)

  3. Hitting a Groove (or, Norming)

  4. Connecting the Pieces (or, Performing)

  5. Lessons Learned (or, Adjourning)

Series tl;dr: I was able to chat with the JBAI, so that it created a unit test for a shell function, and then asked it to implement that function. And it freaking worked.

This post, like the previous one, is very long. The challenge here was to see if I could generate a package full of REST handler functions that conform to an open-source RESTful library (Gin), and implement service methods from an adjacent package in the codebase, keeping separation of concerns in mind. That's a lot for an AI Assistant, and a big challenge for the guy writing those prompts. Which was me. I have a headache.

This post's tl;dr: Lessons learned:

  • Even when the JBAI is working with a declared type, it might change that type in what it looks at as required refactoring.

Just want to see the code? The code for this part of the series lives on a branch in this GitHub repo.

Generating REST Handlers

After learning about the scratch pad mentality, and the absolute mutilation of existing code by the JBAI, I thought it would be a good idea to set up the destination files that my code will eventually live in, and use Goland's scratch file functionality for interacting with the JBAI code generator.

Evereything below happened in a scratch file and was progressively copied either into /endpoints/user_endpoints.go, or /endpoints/user_endpoints_test.go, as I went.

Setting Up

At the top of my scratch file, I manually set up some indirect local references to the exported functions in the services package.

var (
    // imported funtions
    getUsers func() []models.User = services.GetUsers
    getUserById func(id string) *models.User = services.GetUserById
    createUser func(user *models.User) error = services.CreateUser
    upsertUser func(user *models.User) error = services.UpsertUser
    deleteUserById func(id string) error = services.DeleteUserById
    // imported errors
    errorUserExists error = services.ErrorUserExists
    errorUserNameExists error = services.ErrorUserNameExists
    errorUserFieldNil error = services.ErrorUserFieldNil
    errorUserNotFound error = services.ErrorUserNotFound
)

I did this for two reasons:

  • so the JBAI would have file-local context to operate on when it generates code

  • so that I can ask the JBAI to generate mocks for each of the functions

As I mentioned earlier, the overtly simple stuff that includes references to other packages in the module is best done by hand. It's faster and easier than trying to explain what you're looking for in plain text, and then correcting the JBAI when it gets it wrong.

I did, however, ask it to create mocks for each of the functions in the file, so that it could achieve complete coverage of the code that I eventually want it to generate and test:

Me > Generate Code: For each of the following variables, please create a facrory function that returns a mock function, taking an error as an argument: createUser, upsertUser, deleteUserById. The returned function should return nil if the factory function's error argument is nil, and returning an error if the argument is not nil.

For whatever reason, it added the mock factory functions to the var list:

var (
    // the previous variables listed above...
    // ...

    // factory functions for mocks
    createUserMock func(err error) func(user *models.User) error = func(err error) func(user *models.User) error {
        return func(user *models.User) error {
            return err
        }
    }

    upsertUserMock func(err error) func(user *models.User) error = func(err error) func(user *models.User) error {
        return func(user *models.User) error {
            return err
        }
    }

    deleteUserByIdMock func(err error) func(id string) error = func(err error) func(id string) error {
        return func(id string) error {
            return err
        }
    }
)

Me > Generate Code: Please create factory functions for mocking getUsers and getUserById. The factory functions should take a boolean as an argument. If a value of false is passed to the factory function, the mock should return a nil value. If a value of true is passed to the factory function, then the returned getUsers function should return a []models.User value with two entries that have populated fields with dummy data, and the returned getUserById function should return a *models.User value that has populated fields with dummy data.

Again, it added these factory functions to the var list. It also populated the models.User struct with invalid ID and Email fields. This is probably to be expected, since the definition for that struct is in a different file.

var (
    // the previous variables listed above...
    // ...

    getUsersMock func(success bool) func() []models.User = func(success bool) func() []models.User {
        if success {
            return func() []models.User {
                return []models.User{
                    {ID: "1", Name: "Dummy User 1", Email: "dummy1@test.com"},
                    {ID: "2", Name: "Dummy User 2", Email: "dummy2@test.com"},
                }
            }
        } else {
            return func() []models.User {
                return nil
            }
        }
    }

    getUserByIdMock func(success bool) func(id string) *models.User = func(success bool) func(id string) *models.User {
        if success {
            return func(id string) *models.User {
                return &models.User{ID: id, Name: "Dummy User " + id, Email: id + "@test.com"}
            }
        } else {
            return func(id string) *models.User {
                return nil
            }
        }
    }
)

I had to correct that, replacing ID with Id, and Email with Age.

I also had to try a couple of times to get this right, specifically making it clear that the mocks needed to return a function, and not a struct. Next conclusion: even when the JBAI is working with a declared type, it might change that type in what it looks at as required refactoring.

It's debatable whether or not I should have written those by hand; but this is for science, so I'll say it was worth it. Either way, when all is said and done, those are some fine looking factory functions.

getUsersHandler

The first REST handler I implemented was the GET /users handler, which I appropriately named getUsersHandler. Note that it's not exported; that's because I'll write a registration function that will wire all of the handlers into the Gin engine at the end.

Similar to the previous package, I decided to describe the tests, and have the JBAI implement the code as a result. I also had to make it clear that the handler should call the service functions, and handle each case appropriately.

Me > Generate Code: Please write two tests for a function called getUsersHandler. The getUsersHandler function should take a gin.Context as an argument, and not return anything. Internally, it should call the local getUsers variable function. It should write the result as JSON to the gin.Context. Whether the list returned by getUsers has multiple entries, or no entries, the getUsersHandler should always return a JSON array. Each test needs to use the getUsersMock factory function to mock the behavior of getUsers. One test should ensure that we test multiple entries being returned, and another should test an empty slice being returned.

That was a long prompt, and while I was writing it I wondered how much time I was really saving myself.

As it were, the resulting output replaced most of the existing content of the scratch file, including the mock factory functions that the resulting tests call internally. I had to copy the diff and paste it into different files, so that my next prompt could leverage those mock factories.

Here's the resulting code, before it was split up into its destination files:

func getUsersHandler(c *gin.Context) {
    users := getUsers()
    c.JSON(http.StatusOK, users)
}

func TestGetUsersHandler_WithMultipleEntries(t *testing.T) {
    getUsers = getUsersMock(true)

    // Prepare HTTP route with gin context.
    router := gin.Default()
    router.GET("/users", getUsersHandler)
    httpResponse := performGetRequest(router, "/users")

    // Check for correct http response.
    if httpResponse.Code != http.StatusOK {
        t.Fatalf("Expected the http response code to be %v, but got %v", http.StatusOK, httpResponse.Code)
    }
}

func TestGetUsersHandler_WithNoEntries(t *testing.T) {
    getUsers = getUsersMock(false)

    // Prepare HTTP route with gin context.
    router := gin.Default()
    router.GET("/users", getUsersHandler)
    httpResponse := performGetRequest(router, "/users")

    // Check for correct http response.
    if httpResponse.Code != http.StatusOK {
        t.Fatalf("Expected the http response code to be %v, but got %v", http.StatusOK, httpResponse.Code)
    }
}

// Utility function to perform a HTTP GET request.
func performGetRequest(r http.Handler, route string) *httptest.ResponseRecorder {
    req, _ := http.NewRequest("GET", route, nil)
    responseRecorder := httptest.NewRecorder()
    r.ServeHTTP(responseRecorder, req)
    return responseRecorder
}

getUserByIdHandler

The question going into this one is whether or not the JBAI could understand the need to - and then how to - extract the user ID from the path parameter in the GET request.

Less of a challenge, but still needed to be explained, is that it had to make the handler respond 404 if the service.GetUser call returned a nil value.

Me > Generate Code: Please create two test for a new function called getUserByIdHandler, which takes a gin.Context as an argument, and doesn't return anything. Inside that function, it should extract a user-id path parameter from the gin.Context, and pass that path parameter value to the local getUserById function defined as a variable. There should be a test ensuring that if a value is returned by getUserById, getUserByIdHandler returns a JSON object reflecting that value and a status code of 200. There should be another test ensuring that if getUserById returns nil, getUserByIdHandler responds with a status code of 404.

The resulting REST handler code was solid:

func getUserByIdHandler(c *gin.Context) {
    id := c.Param("user-id")
    user := getUserById(id)
    if user != nil {
        c.JSON(http.StatusOK, user)
    } else {
        c.JSON(http.StatusNotFound, gin.H{
            "status": http.StatusNotFound,
            "message": "User not found",
        })
    }
}

I happily copy/pasted that from the diff into /endpoints/user_endpoints.go

The initial test results looked like they were writen by an entirely dev or AI from what I had been seeing previously:

func TestGetUserByIdHandler(t *testing.T) {
    r := gin.Default()
    r.GET("/users/:id", getUserByIdHandler)

    // Arrange: Setting up the positive test
    getUserById = func(id string) *models.User {
        return &models.User{Id: "1", Name: "John Doe", Age: 30}
    }
    request, _ := http.NewRequest(http.MethodGet, "/users/1", nil)
    responseRecorder := httptest.NewRecorder()

    // Act: Make the request
    r.ServeHTTP(responseRecorder, request)

    // Assert: Check if the status code is 200 and payload is as expected
    assert.Equal(t, http.StatusOK,responseRecorder.Code)
    assert.Contains(t, responseRecorder.Body.String(), "John Doe")

    // Arrange: Setting up the test for a case when user doesn't exist
    getUserById = func(id string) *models.User {
        return nil
    }
    request2, _ := http.NewRequest(http.MethodGet, "/users/2", nil)
    responseRecorder2 := httptest.NewRecorder()

    // Act: Make the request
    r.ServeHTTP(responseRecorder2, request2)

    // Assert: Check if the status code is 404
    assert.Equal(t, http.StatusNotFound, responseRecorder2.Code)
}

Arrange/Act/Assert? I haven't seen that in years. I think last time I saw it was on a .Net project...in like 2016 maybe?

Notice that the tests are all in a single run, as opposed to being broken up into multiple specs via t.Run. This also leveraged stretchr/testify/assert for assertions, which previous tests didn't do.

And finally, the real problem, the mock that was used wasn't the one I had previously set up. Instead it created a new mock inside the test. Nor did it set the right path parameter in the gin.Engine - /:id as opposed to /:user-id; which is odd, since the function generated in the same JBAI run did use /:user-id.

Rather than ask it to modify, I decided to leave the test as written, changing the path parameter so that the test matches the intended implementation. This is to leave proper evidence of the next conclusion, which is that you cannot expect consistency when working with an AI coding assistant as a service.

It dawns on my that I'm probably hitting multiple instances of the same service as I continue to build out this project, and they could be running different versions of the same model, which might be getting fed different learning data sets at they run.

createUserHandler

Me > Generate Code: Please create a test suite to test a function called createUserHandler, which takes a *gin.Context as an argument and does not return anything. createUserHandler will call the local createUser function internally. The test suite should mock the createUser function to ensure that if createUser returns an error, createUserHandler responds with a 400 status code, and if createUserHandler does not respond with an error, createUserHandler response with a 201 status code.

This time the results were basically perfect. The tests didn't use the getUserMock that I had asked the JBAI to create during the setup; instead it mocked the different outcomes directly in the test specs.

func createUserHandler(c *gin.Context) {
    user := &models.User{}
    if err := c.Bind(user); err != nil {
        c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
        return
    }
    if err := createUser(user); err != nil {
        c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
        return
    }
    c.JSON(http.StatusCreated, gin.H{"data": user})
}

func TestCreateUserHandler(t *testing.T) {
    gin.SetMode(gin.TestMode)
    router := gin.Default()

    t.Run("createUser returns error", func(t *testing.T) {
        createUser = func(*models.User) error { return errors.New("mock error") }

        router.POST("/user", createUserHandler)

        req := httptest.NewRequest(http.MethodPost, "/user", nil)
        resp := httptest.NewRecorder()

        router.ServeHTTP(resp, req)

        if resp.Code != http.StatusBadRequest {
            t.Fatalf("Expected status code %d, got %d", http.StatusBadRequest, resp.Code)
        }
    })

    t.Run("createUser does not return error", func(t *testing.T) {
        createUser = func(*models.User) error { return nil }

        router.POST("/user", createUserHandler)

        req := httptest.NewRequest(http.MethodPost, "/user", nil)
        resp := httptest.NewRecorder()

        router.ServeHTTP(resp, req)

        if resp.Code != http.StatusCreated {
            t.Fatalf("Expected status code %d, got %d", http.StatusCreated, resp.Code)
        }
    })
}

This made me wonder if I should have even bothered to have the JBAI create those mock factories in the first place. And that led me to my next conclusion, which is that I can learn from the JBAI, just as I would from watching any other developer.

It's worth pointing out - and I didn't realize this until after I had run the code generator and copied the code into its respective files - that I didn't tell the JBAI to parse out a models.User from the *gin.Context. It figured out that it needed to do that on its own. Which is pretty awesome. I'll see if I can duplicate that awesomeness in my next prompt.

upsertUserHandler

I was not able to duplicate that awesomeness. Instead of parsing the request body and unmarshalling it into a struct pointer, the resulting code just created an empty struct, and sent that to the upsertUser function.

This further confirms previous conclusions indicating that I should not expect the same behavior twice.

Here's the prompt that actually succeeded:

Me > Generate Code: Please create a test suite that covers a function called upsertUserHandler, which takes a gin.Context as an argument, and doesn't return anything. upsertUserHandler should parse the PUT body from the gin.Context into a &models.User. upsertUserHandler should call the local upsertUser variable internally. upsertUserHandler should parse a models.User Id value from the *gin.Context's user-id path parameter. If upsertUser returns an error, upsertUserHandler should respond with a 400 status code. If no error is returned, then upsertUserHandler should respond with a 200. The test suite should mock upsertUser as needed to ensure that both cases are fully tested.

First off, the resulting tests followed yet another approach that I hadn't seen so far in this project. The JBAI also completely rewrote the upsertUserMock I had asked it to generate earlier.

type UpsertUserHandlerTestSuite struct {
    suite.Suite
    ctx *gin.Context
    w   *httptest.ResponseRecorder
    user models.User
}

func (suite *UpsertUserHandlerTestSuite) SetupTest() {
    gin.SetMode(gin.TestMode)
    suite.w = httptest.NewRecorder()
    suite.ctx, _ = gin.CreateTestContext(suite.w)

    suite.user = models.User{Id: "1", Name: "Dummy User 1", Age: 30}

    userJson, _ := json.Marshal(suite.user)
    req, _ := http.NewRequest(http.MethodPut, "/user/1", bytes.NewBuffer(userJson))
    suite.ctx.Request = req
    suite.ctx.Params = gin.Params{gin.Param{Key: "user-id", Value: "1"}}
}

func TestUpsertUserHandlerTestSuite(t *testing.T) {
    suite.Run(t, new(UpsertUserHandlerTestSuite))
}

func (suite *UpsertUserHandlerTestSuite) TestUpsertUserHandlerSuccess() {
    mockService := new(MockService)
    mockService.On("UpsertUserMock", mock.AnythingOfType("*models.User")).Return(nil)

    oldFunc := endpoints.UpsertUser
    endpoints.UpsertUser = mockService.UpsertUserMock

    endpoints.UpsertUserHandler(suite.ctx)

    mockService.AssertExpectations(suite.T())
    suite.Require().Equal(http.StatusOK, suite.w.Code)

    endpoints.UpsertUser = oldFunc
}

func (suite *UpsertUserHandlerTestSuite) TestUpsertUserHandlerError() {
    mockService := new(MockService)
    mockService.On("UpsertUserMock", mock.AnythingOfType("*models.User")).Return(models.ErrorUserNotFound)

    oldFunc := endpoints.UpsertUser
    endpoints.UpsertUser = mockService.UpsertUserMock

    endpoints.UpsertUserHandler(suite.ctx)

    mockService.AssertExpectations(suite.T())
    suite.Require().Equal(http.StatusBadRequest, suite.w.Code)

    endpoints.UpsertUser = oldFunc
}

type MockService struct {
    mock.Mock
}

func (m *MockService) UpsertUserMock(user *models.User) error {
    args := m.Called(user)
    return args.Error(0)
}

More examples for why we should not expect consistency. I would later notice that this complex test setup doesn't cover the error case for the below context.BindJSON, which is a pretty big miss.

However the bigger issue was in the actual REST handler code that was created. See if you can spot where the function can fail silently:

func UpsertUserHandler(context *gin.Context) {
    var user models.User
    if err := context.BindJSON(&user); err == nil {
        if userId, exists := context.Params.Get("user-id"); exists {
            user.Id = userId
            if err := upsertUser(&user); err == nil {
                context.JSON(http.StatusOK, gin.H{})
            } else {
                context.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
            }
        }
    } else {
        context.JSON(http.StatusBadRequest, gin.H{"error": "Bad Request"})
    }
}

Here's a hint - the code checks that the path parameter exists, but doesn't handle the else case. Meaning if a request was sent to PUT /users// there would be no response. I had to manually add in there. Next conclusion - the JBAI will not handle all possible error cases.

Some odd choices were made here:

  • Notice how much of this code is exported - even the MockService struct is exported.

  • The test code treated UpsertUser as something that lives in a different package, even though the function and its test were output into the same scratch file.

  • The mock error it chose to go with is ErrorUserNotFound, which it incorrectly attributed to the models package.

  • It replaced the mock factory function with a more complex mocking setup that uses testify/mock, a part of the testify library that I'm not familiar with. Without running these tests, I have no idea if they'll work as expected.

Next conclusion - be prepared to be confused by the code you see.

Upon trying to run this test, it turned out that compilation failed because of a missing dependency. testify/mock requires a dependency (github.com/stretchr/objx) that I don't have. So - next conclusion - while the JBAI had previously claimed that it would not add dependencies to my project, it will assume dependencies are there that might not be.

deleteUserHandler

Me: > Generate Code: Please create tests that fully cover a new function called deleteUserHandler, which takes a gin.Context as an argument. deleteUserHandler should parse a user ID string value from the user-id path paramter in the gin.Context it receives. It should call deleteUser internally; if deleteUser responds with an ErrorUserNotFound, , deleteUserHandler should respond with a 404 status code. If deleteUser returns no error, deleteUserHandler should respond with a 204 status code. The tests for deleteUserHandler should use a mock for the different behaviors of deleteUser.

Here's the request handler code:

func deleteUserHandler(c *gin.Context) {
    id := c.Param("user-id")
    err := deleteUserById(id)

    if err != nil {
        if err == errorUserNotFound {
            c.Status(http.StatusNotFound)
        } else {
            c.Status(http.StatusInternalServerError)
        }
        return
    }

    c.Status(http.StatusNoContent)
}

Notice that it used the local variable I had created to determine the error type, and! it created a case where it would return a 500 if it saw an unexpected error. That's good code.

The concern I have is that the tests that were generated fail to cover the StatusInternalServerError branch of that condition:

// Now we need test deleteUserHandler function
func TestDeleteUserHandler(t *testing.T) {
    // testing case when user was deleted
    deleteUserById = deleteUserByIdMock(nil)
    router := gin.Default()
    router.DELETE("/user/:user-id", deleteUserHandler)

    req, _ := http.NewRequest(http.MethodDelete, "/user/1", nil)
    resp := httptest.NewRecorder()
    router.ServeHTTP(resp, req)

    if status := resp.Result().StatusCode; status != http.StatusNoContent {
        t.Errorf("handler returned wrong status code: got %v want %v",
            status, http.StatusNoContent)
    }

    // testing case when user not found
    deleteUserById = deleteUserByIdMock(errorUserNotFound)
    req, _ = http.NewRequest(http.MethodDelete, "/user/1", nil)
    resp = httptest.NewRecorder()
    router.ServeHTTP(resp, req)

    if status := resp.Result().StatusCode; status != http.StatusNotFound {
        t.Errorf("handler returned wrong status code: got %v want %v",
            status, http.StatusNotFound)
    }
}

This matches what I saw above, where - next conclusion - a very obvious error case can be missed in tests written by the JBAI, even when the JBAI chose to implement that error case.

The Resulting Tests

Before getting to the actual excution of the tests, here are some notes about how the tests were written:

  • There are three different styles of test setups, for testing five functions. t.Run is the most common approach.

  • Only two of the mock factories I hda the JBAI create in the beginning were used, and only one of the error types was used.

  • There are some glaring omissions in code coverage.

That all said, it's worth running them just to see how coverage looks at first pass.

And, wouldn't you know it, the tests crashed on the first attempt:

=== RUN   TestCreateUserHandler/createUser_returns_error
[GIN] 2023/12/26 - 07:16:02 | 400 |     119.875µs |       192.0.2.1 | POST     "/user"
    --- PASS: TestCreateUserHandler/createUser_returns_error (0.00s)
=== RUN   TestCreateUserHandler/createUser_does_not_return_error
    --- FAIL: TestCreateUserHandler/createUser_does_not_return_error (0.00s)
panic: handlers are already registered for path '/user' [recovered]
    panic: handlers are already registered for path '/user'

That's because it used the same gin.Engine for both specs in the TestCreateUserHandler suite. This fits in line with the JBAI not following good practices for testing; in this case, it didn't consider (or didn't know how to consider) that each test spec would have to run alongside other specs.

This was fixed by adding a single line at the top of the second spec:

router = gin.New()

Now for the big question - how was the coverage?? Honestly, pretty good! Considering this was almost entire prompt-driven code, the tests covered 87.1% of statements in /endpoints/user_endpoints.go.

That may not be as golden as the 100% coverage that was acheived in /services/users_service.go, but, to be fair, this was a much harder package to test, which was by design. There was an imported local package that had to be mocked, the functionality depended on an imported RESTful library, and there were a lot more things that could go wrong.

What's Next?

All that's left to do is to wire it all together, and see how the project wraps up. The next post will cover that, as well as go through a bit of a retro on the big takeaways from this series.