BURT: A Bug Reporting Chatbot

Author(s): Kristen Goebel, Jasmine Obas

Mentor(s): Kevin Moran, Computer Science

Abstract

Many bug reports written by end users lack key information, such as the steps needed to reproduce the bug. Without this information, it can be difficult to reproduce and assess the reported bug, leading to more time spent resolving bugs, as well as reported problems that cannot be fixed because they cannot be reproduced. Our work this summer focused on analysis for BURT, a bug reporting chatbot that aims to improve the quality of bug reports. We used various tools and metrics to assess the quality of existing bug reports and app reviews, creating a baseline for the current quality of reports and highlighting areas for improvement. In particular, we made use of a tool that uses neural sentence classification and linear support vector machines to identify whether important information is included in a given bug report. We also analyzed the readability of bug report prose by calculating three complementary metrics: spelling mistakes, grammar mistakes, and language regularity. Additionally, we assisted with collecting data for, and setting up, a user study that will be used to evaluate BURT against existing bug reporting systems. Our research this summer lays the groundwork for illustrating the benefits that interactive bug reporting systems can have for both end users and developers, leading to more informative bug reports for developers with low effort required from end users.
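As a rough illustration of the classification idea (not BURT's or BEE's actual pipeline, whose details are not given here), the sketch below trains a linear SVM over TF-IDF sentence features with scikit-learn; the training sentences and labels are invented for the example.

# Illustrative sketch only: TF-IDF features + a linear SVM, one label per
# sentence. BEE's real model and training data are not reproduced here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training sentences labeled OB (observed behavior),
# EB (expected behavior), or SR (steps to reproduce).
sentences = [
    "The app crashes when I tap the save button.",
    "The note should be saved without an error.",
    "1. Open a note 2. Type some text 3. Tap save",
    "The screen goes blank right after login.",
]
labels = ["OB", "EB", "SR", "OB"]

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
classifier.fit(sentences, labels)

# Tag each sentence of an incoming report, then check which of the three
# components the report is missing.
report = ["I expected the list to refresh.", "Instead it shows stale data."]
predicted = set(classifier.predict(report))
missing = {"OB", "EB", "SR"} - predicted
print("Missing components:", missing or "none")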

Video Transcript

[0:01] My name is Kristen Goebel and I’m a senior at Clarkson University studying math and computer science. And my name is Jasmine Obas and I’m a senior at George Mason University studying Applied Computer Science with a concentration in Computer Game Design. This summer we worked with Dr. Kevin Moran on BURT, a bug reporting chatbot.

[0:20] To start, we’ll take a look at the current state of bug reporting. Ideally, end users and developers report issues with software by creating informative bug reports that allow developers to easily understand, recreate, and fix those issues.

[0:35] Unfortunately, these bug reports often lack key information, especially the steps to reproduce the bug. These steps are often missing, ambiguous, or incomplete, making it difficult to recreate the bug.

[0:46] Lack of information is such a problem that, in a petition to GitHub, developers cited missing crucial information, like reproduction steps, as a major issue they feel the site needs to address.

[1:00] As a result of these issues, many bug reports cannot be reproduced and go unfixed. More time may be needed to understand and assess the severity of a bug, and bugs may take longer to resolve.

[1:14] To help improve the quality of these bug reports, we propose BURT, a bug reporting chatbot. BURT lets users interact with a messaging interface that guides them through reporting the information needed to make their bug report more complete and more valuable to developers. The goal of the chatbot is to ensure reports contain, at minimum, the observed behavior, the expected behavior, and the complete set of steps to reproduce the bug.

[1:39] BURT helps users report the complete set of steps to reproduce by using its app execution model. This graph contains the possible steps that can be taken in the app based on the app’s current state, which allows the chatbot to suggest possible next steps to the user. The graph is obtained through automated exploration and manual traces of the app.

[2:02] In our motivational study, we used the BEE API tool to analyze two datasets, ARMiner and AndroR2, which include bug reports and app reviews from various Android applications. The tool takes in text in JSON format and returns the text tagged with one or more of three components: SR, OB, and EB. We then charted the results for our collected bug reports to visualize which components users most often leave out, illustrating the underlying problem: reports are usually missing key information expected in a standard bug report. Tagging these texts with the API also shows how it can be used to detect when these key components are missing and prompt the user to provide that information before submitting a ticket.

[2:54] To take a closer look at the BEE tool components, we’ll go over what these tags mean. OB stands for observed behavior, meaning the user described the issue they observed; EB stands for expected behavior, meaning the user described what the behavior should have been; and SR stands for steps to reproduce, meaning the user shared the steps needed to trigger the exact issue. These components help developers better organize the reports they receive, as well as get a quicker understanding of what each bug report is about.
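The transcript does not document BEE’s actual endpoint or message formats, so the URL, payload, and response shape below are all assumptions; the sketch only mirrors the workflow just described: send report text as JSON, get back component tags, and prompt the user when tags are missing.

# Hedged sketch of a BEE-style tagging workflow. The endpoint URL and the
# JSON shapes are hypothetical, not BEE's documented API.
import requests

BEE_ENDPOINT = "https://example.org/bee/api/tag"  # hypothetical URL

def tag_report(text):
    """Send report text to the tagging service and collect the component
    tags (OB, EB, SR) found across its sentences."""
    response = requests.post(BEE_ENDPOINT, json={"text": text}, timeout=30)
    response.raise_for_status()
    # Assumed response shape: [{"sentence": "...", "tags": ["OB"]}, ...]
    return {tag for item in response.json() for tag in item["tags"]}

report = "The app freezes on startup. It should load the home screen."
missing = {"OB", "EB", "SR"} - tag_report(report)
if missing:
    print("Please describe the following before submitting:",
          ", ".join(sorted(missing)))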
[3:27] ARMiner, one of the datasets I mentioned, is a dataset of app reviews for Android applications. We processed this dataset with our BEE tool and LanguageTool: the BEE tool tagged the components of each text, and LanguageTool detected grammar errors in the texts (a sketch of this check follows the transcript). Based on the resulting charts, we can see that grammar errors and missing information are major barriers to bug report comprehension. We will repeat this same process for AndroR2, a dataset with over 82,000 bug reports.

[4:04] Our user study involves having participants record and document app traces in various Android apps. They do this using an Android emulator in Android Studio, which gives them a virtual phone on which to install the apps we provide. To collect an app trace, the user first downloads the app and familiarizes themselves with its layout. When they are ready to begin, they delete and reinstall the app before each trace they record, usually around five. To record an app trace, they use a capture tool developed by one of our researchers. Once the tool is started, they complete one app trace while the tool records the screen and logs the GUI components the user selects. When the user finishes the trace, they stop the capture tool and all of the recorded data is saved to a file. The user repeats these steps for each individual app trace.

[5:01] To summarize, users often create low-quality bug reports that can make it difficult for developers to fix the reported bugs. Our work this summer shows the current quality of bug reporting and will serve as a baseline for assessing the results of the future BURT user study. Our goal is for BURT to help end users generate higher quality bug reports, improving the software development process.
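As a companion to the grammar analysis mentioned at [3:27], here is a minimal sketch using language_tool_python, an open-source wrapper around LanguageTool; whether this matches the exact setup used in the study is an assumption. The split between spelling and other issues relies on LanguageTool’s ruleIssueType field rather than the study’s metric definitions, and the language regularity metric from the abstract is not shown.

# Minimal sketch: count spelling vs. other grammar issues per review.
import language_tool_python

tool = language_tool_python.LanguageTool("en-US")

reviews = [
    "this app crash everytime i open it",
    "The login button does nothing when pressed.",
]

for review in reviews:
    matches = tool.check(review)
    spelling = sum(1 for m in matches if m.ruleIssueType == "misspelling")
    other = len(matches) - spelling
    print(spelling, "spelling /", other, "other issues:", repr(review))

tool.close()

Run over an entire dataset such as ARMiner, per-review counts like these could be aggregated into the kind of charts described in the transcript.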
