A Framework for Scientific Discovery through Video Games. Seth Cooper
ahead of time which parts of the problem players would be best at solving, or which in-game manipulation tools they would use most effectively. The only way to find out was to have people play Foldit. In order to deal with these and other uncertainties, we took an iterative approach both before and after releasing the game to the public. We have continually evolved the gameplay in response to massive gameplay traces, player feedback and scientists’ analysis, and continue even now with this iterative process as we add features and expand the set of biochemical problems to which the Foldit community can contribute.
Games are often designed with an iterative approach, which involves designing, testing, and evaluating repeatedly until the player’s experience meets some criteria [Fullerton 2008]. For most games, the main criterion for the player’s experience is simply to have fun. Player feedback and playtesting are an integral part of the process, and there are a number of methods of gathering and incorporating this information from players [Ambinder 2009]. We have also continued the design process after the game’s release, to incorporate data gathered from the players in a continual process of evolutionary redesigning [Kennerly 2003]. Our work differs from the standard iterative approach in that the game design space is constrained to conform with existing physical models, we include the input of scientists in the evaluation of the game, and we include the long-term coevolution of the players and game in the design.
3.2 Biochemistry Background
Here we provide some background on biochemistry and proteins that will be used throughout the rest of this work.
DNA, a cellular chemical perhaps more widely recognized than proteins, derives its entire purpose from encoding protein sequences. Proteins are coded for by DNA, and are created in the cell as a long chain of amino acids. A protein’s amino acid sequence is known as its primary structure. There are twenty different types of amino acids. Regardless of type, some of the atoms making up the amino acid will be the same; these are connected together and form the protein’s backbone. However, the remaining atoms are different for each type; these extend outward from the backbone and are called sidechains. The atoms that make up the sidechains divide the amino acids into two main groups: hydrophobic, which prefer to be buried on the interior away from water; and hydrophilic, which prefer to be exposed on the exterior near water. These preferences impact how the protein folds. As the amino acids are connected together, the protein begins to fold up; after the amino acids join together, they are often called residues. Local characteristics of the fold are referred to as secondary structure. These include: helices, which are tightly coiled; sheets, which are extended straight; and loops, which are everything else. The positions of the atoms making up a folded protein constitute its tertiary structure; the tertiary structure taken in nature is the native structure. The native structure is the one that is lowest in free energy; that is, it has the most favorable set of chemical interactions. It is well known that sequence determines structure [Anfinsen 1973]. In this book, the term sequence will refer to a protein’s primary structure, and structure will refer to its tertiary structure, unless otherwise specified.
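To make the sequence terminology concrete, the following is a minimal sketch of how a primary structure can be written as a string of one-letter amino acid codes and labeled by the two groups described above. The particular hydrophobic set used here is a simplifying assumption for illustration only: published hydrophobicity scales disagree on several residues (for example glycine, cysteine, proline, and tyrosine), and the function names are hypothetical.

```python
# The twenty standard amino acids, written as one-letter codes.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# ASSUMPTION: a deliberately simplified hydrophobic set. Real scales
# differ on borderline residues; this grouping is illustrative only.
HYDROPHOBIC = frozenset("AVLIMFW")


def classify(sequence: str) -> str:
    """Label each residue 'H' (hydrophobic) or 'P' (hydrophilic/polar)."""
    sequence = sequence.upper()
    unknown = set(sequence) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"not standard amino acid codes: {sorted(unknown)}")
    return "".join("H" if residue in HYDROPHOBIC else "P" for residue in sequence)


def hydrophobic_fraction(sequence: str) -> float:
    """Fraction of residues that fall in the hydrophobic group."""
    labels = classify(sequence)
    return labels.count("H") / len(labels) if labels else 0.0


# A made-up four-residue sequence (primary structure), not a real protein.
example = "MKVL"
labels = classify(example)
fraction = hydrophobic_fraction(example)
print(example, labels, fraction)
```

A labeling like this captures only the primary structure; it says nothing about the secondary or tertiary structure, which depend on how the chain actually folds.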
3.3 Framework Description
3.3.1 Architecture
Here we give an overview of the architecture of Foldit, which can be seen at a high level in Figure 3.2. Foldit uses a client-server architecture. Players must create an account and download the game in order to play. The game then communicates with a central server to send information about the local player and get information about other players.
Scientists post problems to the server; in the case of Foldit, these are protein structures for which the players are meant to find the native structures. An initial protein structure is associated with metadata such as a title and description, and parameterization such as which energy function terms to use. We call these puzzles, and they are posted on the server for a fixed amount of time (usually a week). While a puzzle is active, players can download it and interactively reshape the protein to try to achieve the best score. This often requires significant changes to the puzzle structures, which are given in various partially-folded states, and in some cases need to be completely refolded from a straight line. Players’ structures, or solutions, are reported back to the server, and players are ranked against other players who are playing the same puzzle. Players can form groups with which to share their solutions through the server, allowing them to work together to find even better solutions than they could working alone. When one player shares a solution by uploading it to the server, other players in the same group are able to see it and download it. The social aspect of the game is supported by in-game chat, a website with forums, and a player-created wiki. At the close of a puzzle, the solution data is aggregated, and presented to the scientists for analysis.
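The puzzle lifecycle described above can be sketched as a small in-memory model. This is an illustrative sketch only, not Foldit's actual server code: the class names, field names, and the `groups` mapping are all hypothetical, and a real server would also handle accounts, persistence, and puzzle expiry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Solution:
    player: str
    score: float
    shared: bool = False  # True if uploaded for the player's group to see


@dataclass
class Puzzle:
    title: str
    description: str
    energy_terms: List[str]  # which energy function terms to use
    days_active: int = 7     # puzzles are usually posted for about a week
    solutions: List[Solution] = field(default_factory=list)

    def submit(self, solution: Solution) -> None:
        """Record a solution reported back by a player's client."""
        self.solutions.append(solution)

    def rankings(self) -> List[Tuple[str, float]]:
        """Rank players on this puzzle by their best score, highest first."""
        best: Dict[str, float] = {}
        for s in self.solutions:
            best[s.player] = max(s.score, best.get(s.player, float("-inf")))
        return sorted(best.items(), key=lambda item: item[1], reverse=True)

    def shared_with(self, player: str, groups: Dict[str, str]) -> List[Solution]:
        """Solutions shared by other members of the player's group."""
        group = groups.get(player)
        if group is None:
            return []
        return [
            s for s in self.solutions
            if s.shared and s.player != player and groups.get(s.player) == group
        ]


# Hypothetical players, groups, and scores.
groups = {"alice": "team_a", "bob": "team_a", "carol": "team_b"}
puzzle = Puzzle("Demo puzzle", "Fold the demo protein", ["hypothetical_term"])
puzzle.submit(Solution("alice", 9100.0, shared=True))
puzzle.submit(Solution("bob", 8700.0))
puzzle.submit(Solution("carol", 9300.0, shared=True))
puzzle.submit(Solution("bob", 9200.0))
print(puzzle.rankings())
print([s.player for s in puzzle.shared_with("bob", groups)])
```

Keeping every submitted solution, rather than only each player's best, mirrors the final step in the text, where all solution data is aggregated for the scientists when the puzzle closes.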
The game is designed to be flexible, and the client allows automatic updating so that we can continually evolve the gameplay. The puzzle posting cycle and automatic updates allow us to respond not only to player feedback but also to scientists’ analysis as we introduce and refine gameplay elements.
Figure 3.2 Overview of architecture for scientific discovery games. The biochemistry team provides structure prediction and design problems for the server. These problems become puzzles and are sent to each player’s client. Players collaborate and compete to solve these problems and upload their solutions to the server, where they are aggregated and sent back to the biochemistry team for analysis. This analysis can then be used to improve the design of the game and puzzles. (Figure from Cooper et al. [2010b])
Foldit is built on top of the Rosetta molecular modeling suite, which has proven useful for a wide variety of protein modeling tasks [Rohl et al. 2004, Bradley et al. 2005, Qian et al. 2007, Kuhlman et al. 2003]. The suite contains an energy function that captures the interaction energies between protein elements, as well as a set of structural optimization subroutines. For protein structure prediction, structures closer to the native structure will have a lower energy than structures further away from it. Foldit uses this state-of-the-art energy function to compute players’ scores, and also takes advantage of the optimization routines that Rosetta makes available.
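Because lower energy is better while a game score is naturally "higher is better", some mapping from energy to score is needed. The sketch below shows one possible mapping, a negated and shifted linear transform. The `offset` and `scale` constants, the function name, and the example energies are all hypothetical values chosen for illustration; they are not taken from Rosetta or from this text.

```python
from typing import Dict


def score_from_energy(energy: float, offset: float = 8000.0, scale: float = 10.0) -> float:
    """Map an energy (lower is better) to a score (higher is better).

    ASSUMPTION: a simple linear transform with made-up constants.
    """
    return offset - scale * energy


# Hypothetical energies for three candidate structures of one protein.
candidate_energies: Dict[str, float] = {
    "extended_chain": 350.0,
    "partially_folded": -40.0,
    "near_native": -210.0,
}

scores = {name: score_from_energy(e) for name, e in candidate_energies.items()}

# The lowest-energy structure receives the highest score.
best = max(scores, key=scores.get)
print(scores, best)
```

Any strictly decreasing transform would preserve the ordering of structures; a linear one also keeps score differences proportional to energy differences.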
3.3.2 Coevolution Strategy
In order to arrive at the current state of Foldit, we took a coevolution approach to the game’s design. Given the complexity of this undertaking, we realized that it was unlikely that all our initial decisions would be the best. There are three major groups relevant to our approach: (1) the scientists whose problems the game is meant to help solve; (2) the players; and (3) the game development team. The development team must incorporate feedback from the players to make sure the game is understandable and fun, and from the scientists to make sure that the results produced will be useful to them. An overview of the interactions between these three groups is given in Figure 3.3.
Figure 3.3 Overview of the interactions between the three iterative design groups. (Figure from Cooper et al. [2010b])
During the game’s initial development, the development team and scientists must work together closely to determine an initial direction. This involves defining which problems to approach, which fundamental gameplay mechanics are needed, and what the desired results are. Once possible games have been prototyped, player feedback can begin to be incorporated. Early playtesting helps to uncover which elements of the problem are fun and which are most confusing or difficult to understand. This can help both to focus the gameplay and to narrow the scope of the game to where players will most likely be able to contribute.
After making the game available to the public, a large amount of data and feedback can become available to help improve the game. As in a traditional game, data on gameplay can be gathered from players for an