Mark Black - Data Vis, Analytics, Web Dev

Data Scraping

The dataset was gathered with a Python script that targets wheresthejump.com's full movie list. The Requests library fetched the page so it could be stored locally, and the BeautifulSoup library was used to read its contents.
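A minimal sketch of what such a script might look like is shown here. It is not the original script: the URL, the output filename, and the function names are assumptions made for illustration, and the page is assumed to hold the movie list in its first HTML table.

```python
import requests
from bs4 import BeautifulSoup

# Assumed address: the write-up names the site but not the exact page URL.
LIST_URL = "https://wheresthejump.com/full-movie-list/"


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_table_rows(html: str) -> list[list[str]]:
    """Return the text of every cell in the first table, one list per row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def scrape_to_file(url: str = LIST_URL, path: str = "movies.html") -> list[list[str]]:
    """Fetch the page, keep a local copy (hypothetical filename), return its rows."""
    html = fetch_html(url)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return extract_table_rows(html)
```

Calling `scrape_to_file()` would save the page and return the table rows as lists of strings. Keeping the download separate from the parsing means the saved copy can be re-parsed later without requesting the page again.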

Data Processing

The scraped data is still formatted as an HTML table. To make it usable for analysis and visualization, it is converted to JSON.
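The conversion step can be sketched roughly as follows, assuming the table has already been reduced to a header row plus lists of cell strings. The column names and sample values are placeholders, not the site's real headers or data, and the `casts` mapping is a hypothetical way of saying which columns should become numbers.

```python
import json


def rows_to_records(header, rows, casts):
    """Pair each row with the header and convert the named columns to numbers.

    `casts` maps a column name to int or float. Cells that cannot be
    converted (blank or non-numeric) become None, which serializes as null.
    """
    records = []
    for row in rows:
        record = dict(zip(header, row))
        for field, cast in casts.items():
            if field not in record:
                continue
            try:
                record[field] = cast(record[field])
            except ValueError:
                record[field] = None
        records.append(record)
    return records


# Placeholder column names and values, for illustration only.
header = ["title", "year", "jump_count", "imdb"]
rows = [
    ["Example Movie A", "2001", "12", "6.5"],
    ["Example Movie B", "1985", "3", ""],
]
casts = {"year": int, "jump_count": int, "imdb": float}

records = rows_to_records(header, rows, casts)
json_text = json.dumps(records, indent=2)
```

Only the listed columns are converted, so a title that happens to look like a number stays a string.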

Describing the Data

With the data in a useful form, we can describe the dataset. Using the Simple Statistics library, we can compute several descriptive statistics very easily.
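The page itself uses the Simple Statistics JavaScript library for this step. As a rough equivalent, the sketch below computes the same kind of summary with Python's standard `statistics` module; it is a stand-in rather than the code used here, and the `describe` helper and the sample counts are invented for illustration.

```python
import statistics


def describe(values):
    """Summarize a list of numbers with a few common descriptive statistics."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (n - 1 in the denominator).
        "stdev": statistics.stdev(values),
    }


# Placeholder jump scare counts, not values from the dataset.
summary = describe([1, 2, 3, 4])
```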

Jump Scares in Horror Movies, 1940s to 2018

Mouse over to view title and number of jump scares. Click dots to change stacked chart elements.

Jump scares and IMDB ratings: a (toy) linear regression analysis

Are movies with more jump scares rated higher? Does the number of jump scares influence an audience's rating of a horror movie? Click dots to change the z-order of stacked chart elements, and mouse over to view a movie's title and number of jump scares.

Linear regression generated using the Simple Statistics JavaScript library.

It is easy to question the usefulness of a linear regression model in this case. A high standard error and a low R squared value indicate that the model does not adequately predict the observed values. However, a low p-value suggests that the model does have a potentially interesting result, with the number of jump scares possibly being an influential factor in audience ratings. The p-value was generated against the null hypothesis that IMDB scores are not influenced by the number of jump scares in a film.

The p-value was generated by the SciPy Python library, using the function scipy.stats.linregress. The results of the Simple Statistics functions were also validated against it.
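A check of this kind might look like the sketch below. It assumes the jump scare counts and ratings are available as two equal-length lists; the `fit_line` helper and the sample numbers are placeholders, not values from the dataset.

```python
from scipy import stats


def fit_line(x, y):
    """Fit y = slope * x + intercept and report the usual diagnostics."""
    result = stats.linregress(x, y)
    return {
        "slope": result.slope,
        "intercept": result.intercept,
        "r_squared": result.rvalue ** 2,
        # Two-sided p-value for the null hypothesis that the slope is zero.
        "p_value": result.pvalue,
        # Standard error of the estimated slope.
        "stderr": result.stderr,
    }


# Placeholder data, for illustration only.
jump_scares = [0, 1, 2, 3, 4]
ratings = [1.1, 2.9, 5.2, 6.8, 9.0]

fit = fit_line(jump_scares, ratings)
```

The slope and intercept returned here are the values to compare against the line produced by Simple Statistics.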