Sports Analytics in Practice with R. Ted Kwartler
Preface
Sports is one of the few domains where the data and outcomes are well known. Unlike medicine, which requires significant subject-matter expertise, or business, where the data is proprietary in most cases, sports knowledge is relatively accessible, and the data and outcomes are public. As a result, sports analytics serves as a great entry point for many aspiring data scientists and analytics professionals. For the novice, this book demonstrates many techniques that are also applicable outside of sports. It should have more than enough topics and examples to aid learning for general practice. For the avid R programmer and sports fan, the book likely has some new functions and techniques which may be less well known. These readers will delight in improving and expanding the demonstrated methods once the core concepts are understood. Finally, for those already in the sports analytics world, the techniques and individual chapter topics can serve as a reference and starting point in their professional analysis. For instance, many of the use cases in the chapters can be adjusted to specific sports or updated with more recent underlying data.
This book has been a long journey in the making. Originally the book’s scope was centered on individualized chapters demonstrating analytical techniques within a sports context. The goal is for the reader to acquire tools that act as a foundation for analysis, to be built upon with more complex analyses as the reader’s technical acumen and sports interests grow. Each chapter is meant to be a standalone reference as the reader explores and learns. This also frees the reader to focus on topics of interest. For example, a reader who does not want to learn about natural language processing could skip that chapter altogether and focus on another subject, such as optimizing a fantasy football lineup.
The book’s undertaking grew in complexity due to a personal commitment to demonstrate concepts on diverse data sets, including Paralympic athletes, women’s soccer and basketball, and popular sports that are less US-centric, such as cricket, in addition to the more typically demonstrated sports analyses of men’s football, baseball, and basketball. My goal is to make the subject accessible and relevant to many in the analytics field, even though this effort slowed the book’s creation. Keep in mind that a chapter’s concepts can be applied to many sports domains. For example, the text analysis applied to cricket fan forum posts can easily be applied to men’s basketball fan tweets or forum posts. Each chapter’s takeaway is meant to be a broadly useful tool, not a brittle or narrowly focused analysis.
Additionally, the book was delayed due to the pandemic’s effect on the sports world. Admittedly, the shortened seasons, canceled games, and other changes that created outlier statistics pale in comparison to the pandemic’s hardship and human impact outside of sports. Despite these challenges, the book’s end result was worth the delay. The final product covers many diverse concepts and data sets, encouraging analytics professionals to enjoy the intersection of sports and analysis.
The book’s supporting website is www.rstatsbook.com. The site contains data and scripts, along with any code revisions that become necessary as functions and packages change. The data is also shared via a Git repository at www.github.com/kwartler/Practical_Sports_Analytics.
Author Biography
Ted Kwartler
Adjunct Professor, Harvard University
Ted Kwartler is the VP, Trusted AI at DataRobot. At DataRobot, Ted sets product strategy for explainable and ethical uses of data technology in the company’s application. Ted brings unique insights and experience utilizing data, business acumen, and ethics to his current and previous positions at Liberty Mutual Insurance and Amazon. In addition to having four DataCamp courses, he teaches graduate courses at the Harvard Extension School and is the author of Text Mining in Practice with R.