Big Data Analysis - Week 4

Monday the 11th - Friday the 15th of May 2020

Monday 11th of May (afternoon):
Lectures: Computer infrastructure, Networks, Scaling, and Speed (BV).
     You are expected to watch these four short lectures before the session, which will then be an interactive discussion.

Zoom: Link to lecture.
     Recording in Lecture video (320 MB) and Lecture audio (18 MB) along with chat (23 kB).
     Recording in Exercise video (1.37 GB) and Exercise audio (22 MB).

Exercise: You may work on your small project, or throw yourself into the housing price prediction exercise described at the bottom of this page.


Wednesday 13th of May (morning):
Lectures: Convolutional Neural Networks (Aleksandar Topic) and Example use of CNNs (TP).
     Very good visualisation of how a CNN works.

Zoom: Link to lecture (Recorded!).
     Recording in Lecture video (173 MB) and Lecture audio (21 MB) along with Lecture chat (8 kB).
     Recording in Exercise video (332 MB) and Exercise audio (25 MB).

Exercise: See if you can recognise handwritten digits with a Convolutional Neural Network: CNN_MNISTdata.ipynb.
The data can be obtained from Yann LeCun's webpage: Link to MNIST dataset.
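
As a warm-up for the exercise, the core operations of a CNN layer (convolution, ReLU activation, and max pooling) can be sketched in a few lines of NumPy. This is only an illustration and not the content of CNN_MNISTdata.ipynb: the function names, the toy 6x6 image, and the edge-detecting kernel are all assumptions made for the example. Like most deep learning libraries, the "convolution" here is implemented as a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the image patch, summed.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: negative responses are set to zero."""
    return np.maximum(x, 0)

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]  # drop rows/columns that do not fill a block
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy image (an assumption, not MNIST): dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-made kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

feature_map = relu(conv2d_valid(image, kernel))  # shape (5, 5)
pooled = max_pool(feature_map)                   # shape (2, 2)
```

In a real CNN the kernel values are not chosen by hand but learned during training, and many kernels are applied in parallel to produce a stack of feature maps.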


Wednesday 13th of May (afternoon):
Lectures: Input feature ranking and Shapley values (TP), ML method performance overview (TP), and The t-SNE algorithm for (non-linear) dimensionality reduction (Alexander Nielsen).

Zoom: Link to lecture (Recorded!).

Exercise: We will again work with handwritten digits, this time seeing if the unsupervised PCA and t-SNE algorithms can differentiate between them.
In the late afternoon we might also work on the Small Project (due on Monday at 22:00).
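
To give an idea of what the linear half of this exercise involves, here is a minimal sketch of PCA via the singular value decomposition. It is not the course's exercise code: the function name `pca_project` is hypothetical, and the data is a synthetic stand-in (two Gaussian blobs in 64 dimensions, mimicking flattened 8x8 images), not the real digit data.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return X_centered @ Vt[:n_components].T, explained[:n_components]

# Synthetic stand-in for two classes of flattened images (NOT real data).
rng = np.random.default_rng(seed=0)
class_a = rng.normal(loc=0.0, scale=1.0, size=(100, 64))
class_b = rng.normal(loc=3.0, scale=1.0, size=(100, 64))
X = np.vstack([class_a, class_b])

# No labels are used: PCA is unsupervised.
projected, explained = pca_project(X, n_components=2)
```

Plotting `projected[:, 0]` against `projected[:, 1]`, coloured by the true class, shows whether the classes separate. For the non-linear counterpart, scikit-learn provides t-SNE as `sklearn.manifold.TSNE`, which can be applied to the same array.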


Thursday 14th of May (morning):
ML Workshop: Here you can find the Workshop Program and Zoom link.




Housing data cleaning, clustering, and price estimation exercise:
The housing data consists of about 50,000 housing sales, where 90+ features are provided along with the actual sales price. The original data HousingPrices_Org.csv (21 MB) first needs cleaning. After cleaning, one can add features (here GPS coordinates and distance to sea) using the additional data files:
  • GPS_data.csv (1.3 MB)
  • SEA_DIST.csv (680 kB)
    Example code for doing this can be found here:
  • RegressionOnHousing_Clean_Data.py (3.7 kB)
  • RegressionOnHousing_Feature_Adding.py (11 kB)
    An example of the resulting data file can be found here: HousingPrices_Cleaned.csv (8.8 MB), and example code for estimating the actual sales price here: RegressionOnHousing.py (34 kB).
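
    The clean-then-add-features workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the content of the linked scripts: the column names (ID, Price, Size, Lat, Lon, SeaDistance), the three miniature inline tables, and the helper functions are all hypothetical, and only the standard library is used so that the sketch runs without the real files.

```python
import csv
import io

# Hypothetical miniature stand-ins for the three files; real columns may differ.
HOUSING_CSV = """ID,Price,Size
1,1500000,80
2,,120
3,2750000,95
"""
GPS_CSV = """ID,Lat,Lon
1,55.68,12.57
3,56.16,10.20
"""
SEA_CSV = """ID,SeaDistance
1,450
3,1200
"""

def read_keyed(text, key="ID"):
    """Read CSV text into a dict of rows, indexed by the key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

def clean(rows):
    """Drop sales with a missing price and convert numeric fields to float."""
    cleaned = {}
    for key, row in rows.items():
        if not row["Price"]:
            continue  # a sale without a price cannot be used for training
        cleaned[key] = {"ID": key,
                        "Price": float(row["Price"]),
                        "Size": float(row["Size"])}
    return cleaned

def add_features(rows, extra, columns):
    """Left-join extra columns onto rows; missing entries become None."""
    for key, row in rows.items():
        match = extra.get(key, {})
        for column in columns:
            value = match.get(column)
            row[column] = float(value) if value else None
    return rows

housing = clean(read_keyed(HOUSING_CSV))
housing = add_features(housing, read_keyed(GPS_CSV), ["Lat", "Lon"])
housing = add_features(housing, read_keyed(SEA_CSV), ["SeaDistance"])
```

    With the full data set of about 50,000 sales, the same join would more commonly be written with `pandas.read_csv` and `DataFrame.merge`.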
    Last updated: 8th of May 2020 by Troels Petersen.