The dataset was gathered with a Python script that uses the BeautifulSoup library to scrape the full movie list on wheresthejump.com. The Requests library was used to fetch the data for local storage.
The scraped data arrives as an HTML table. To make it usable for analysis and visualization, it is converted to JSON.
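As a rough illustration of that conversion step, the sketch below turns an HTML table into a list of JSON-ready records with BeautifulSoup. It is not the project's actual script: the function name `table_to_records`, the sample markup, and the column names are all hypothetical, and the real page's layout may differ. Column names are read from the header row rather than hard-coded.

```python
import json

from bs4 import BeautifulSoup


def table_to_records(html: str) -> list[dict[str, str]]:
    """Convert the first HTML table in `html` into a list of row dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Take column names from the header row instead of hard-coding them.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) == len(headers):  # skip malformed or spacer rows
            records.append(dict(zip(headers, cells)))
    return records


# Hypothetical sample markup; the real page's columns may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Jump Count</th><th>IMDB</th></tr>
  <tr><td>Example Movie A</td><td>12</td><td>6.5</td></tr>
  <tr><td>Example Movie B</td><td>3</td><td>7.1</td></tr>
</table>
"""

records = table_to_records(SAMPLE_HTML)
print(json.dumps(records, indent=2))
```

Every value comes out as a string here, so numeric columns would still need to be converted before any analysis.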
With the data in a usable form, we can describe the dataset. The Simple Statistics library makes several descriptive statistics easy to compute.
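The same kind of summary can be sketched in Python with the standard-library `statistics` module. This is a stand-in for illustration, not the Simple Statistics code used on this page, and the jump-scare counts below are made up rather than taken from the dataset.

```python
import statistics

# Hypothetical jump-scare counts, not taken from the real dataset.
jump_counts = [0, 3, 5, 8, 8, 12, 20]

summary = {
    "count": len(jump_counts),
    "min": min(jump_counts),
    "max": max(jump_counts),
    "mean": statistics.mean(jump_counts),
    "median": statistics.median(jump_counts),
    "mode": statistics.mode(jump_counts),
    "stdev": statistics.stdev(jump_counts),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```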
Mouse over a dot to view the movie's title and number of jump scares. Click a dot to change the z-order of the stacked chart elements.
Are movies with more jump scares rated higher? Does the number of jump scares influence an audience's rating of a horror movie? Click a dot to change the z-order of the stacked chart elements, and mouse over a dot to view the movie's title and number of jump scares.
It is easy to question the usefulness of a linear regression model in this case. A high standard error and a low R-squared value indicate that the model does not adequately predict the observed values. However, a low p-value suggests that the relationship is unlikely to be due to chance alone, so the number of jump scares may still be an influential factor in audience ratings. The p-value was computed against the null hypothesis that IMDb scores are not influenced by the number of jump scares in a film.
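To show how these quantities relate, here is a minimal sketch using `scipy.stats.linregress`, which fits a simple linear regression and reports a two-sided p-value for the null hypothesis that the slope is zero. Choosing `linregress` is an assumption made for illustration, and the data points are invented, so the printed numbers say nothing about the real dataset.

```python
from scipy import stats

# Hypothetical (jump scares, IMDb score) pairs, not the real dataset.
jump_scares = [2, 5, 8, 10, 14, 17, 21, 25]
imdb_scores = [5.9, 6.4, 5.8, 6.9, 6.2, 7.0, 6.5, 7.3]

result = stats.linregress(jump_scares, imdb_scores)

# R-squared is the square of the correlation coefficient.
r_squared = result.rvalue ** 2

print(f"slope={result.slope:.3f} intercept={result.intercept:.3f}")
# Note: result.stderr is the standard error of the slope estimate,
# which may differ from the standard error discussed in the text.
print(f"R^2={r_squared:.3f} p={result.pvalue:.4f} stderr={result.stderr:.4f}")
```

A small p-value alongside a low R-squared is not contradictory: the first says the slope is unlikely to be zero, while the second says the line explains little of the variation in scores.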
The p-value was generated with the SciPy Python library, which was also used to validate the results of the simple-statistics functions, using the function