Data Challenge

The questions below are designed to test proficiency in tasks related to news and data. Each is intended to be more difficult than the last.

Question 1

A source emails you a file with a .csv extension. Which program would you use to open it?

Question 2

The yearly batting averages for Derek Jeter and David Justice are given by this table:

Year    Derek Jeter    David Justice
1995    .250           .253
1996    .314           .321
1997    .291           .329

Which player had the higher cumulative batting average from 1995 through 1997?
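
A note on the arithmetic: a batting average is hits divided by at-bats, so a cumulative average comes from summing hits and at-bats across seasons, not from averaging the yearly figures. The sketch below illustrates this with made-up counts for two hypothetical players; the function name and every number in it are assumptions for illustration, not the real totals for either player in the table.

```python
def cumulative_average(seasons):
    """Return total hits / total at-bats for a list of (hits, at_bats) pairs."""
    total_hits = sum(hits for hits, _ in seasons)
    total_at_bats = sum(at_bats for _, at_bats in seasons)
    return total_hits / total_at_bats


# Hypothetical counts, chosen only to show the effect.
player_a = [(2, 10), (30, 90)]    # yearly averages: .200, .333
player_b = [(25, 100), (4, 10)]   # yearly averages: .250, .400

for name, seasons in (("A", player_a), ("B", player_b)):
    print(name, round(cumulative_average(seasons), 3))
```

In this invented case, player B has the higher average in each year while player A has the higher cumulative average, because the seasons carry very different numbers of at-bats. Answering the question for the table above requires the actual hits and at-bats for each season.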

Question 3

According to the American Community Survey’s 5-year estimates for 2013, which county in the U.S. had the highest percentage of homes worth $1,000,000 or more?

Question 4

The U.S. Securities and Exchange Commission maintains an online filing system called EDGAR. Data about those filings can be accessed via the agency’s FTP site. How many filings did the agency receive in the second quarter of 2010?

Question 5

The New York City Police Department posts information online about people its officers have searched under a program called “stop and frisk.” Which NYPD precinct stopped and frisked the most people in 2010?

Question 6

Download the file called manyfiles.zip and extract it on your local machine. Inside are a large number of files, each of which contains a number. Tally up all the numbers: What is the total?
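
One possible approach is to read the archive directly instead of opening each file by hand. The sketch below uses Python's standard zipfile module and assumes that every file in the archive holds exactly one number stored as UTF-8 text; the function name is hypothetical, and the parsing would need adjusting if the files turn out to be formatted differently.

```python
import zipfile


def sum_numbers_in_zip(zip_path):
    """Add up the single number assumed to be stored in each file of the archive."""
    total = 0.0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            text = archive.read(info).decode("utf-8").strip()
            if text:  # ignore empty files
                total += float(text)
    return total


# Example call, assuming the archive sits in the working directory:
# total = sum_numbers_in_zip("manyfiles.zip")
```

Reading from the archive avoids extracting thousands of files to disk, although extracting first and looping over the directory would work just as well.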

Question 7 (Extra Credit!)

Download the file bonus.csv, which is a table containing seven simulated variables, V1 through V7.

Briefly describe in words or images how the first variable (V1) interacts with the remaining six variables (V2 through V7).
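
A reasonable first pass is to compute the linear correlation between V1 and each of the other columns. The sketch below uses only the standard library and assumes that bonus.csv has a header row naming the columns V1 through V7, that every value is numeric, and that no column is constant; the function names are hypothetical.

```python
import csv
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlations_with(csv_path, target="V1"):
    """Correlate the target column with every other column in the file."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    columns = {name: [float(row[name]) for row in rows] for name in rows[0]}
    return {
        name: pearson(columns[target], values)
        for name, values in columns.items()
        if name != target
    }


# Example call, assuming the file sits in the working directory:
# print(correlations_with("bonus.csv"))
```

A correlation coefficient captures only linear association, so a value near zero does not rule out a relationship. Scatter plots of V1 against each variable are a useful complement, since simulated data can hide curved or clustered patterns.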