Big Data Analysis - Week 4
Monday the 11th - Friday the 15th of May 2020
Monday 11th of May (afternoon):
Lectures:
Computer infrastructure,
Networks,
Scaling, and
Speed (BV).
     You are expected to watch these four short lectures
before the live lecture, which will then be an interactive discussion.
Zoom:
Link to lecture.
     Recording in
Lecture video (320 MB) and
Lecture audio (18 MB) along with
chat (23 kB).
     Recording in
Exercise video (1.37 GB) and
Exercise audio (22 MB).
Exercise: You may work on your small project, or dive into the housing price prediction exercise described below.
Wednesday 13th of May (morning):
Lectures:
Convolutional Neural Networks (Aleksandar Topic) and
Example use of CNNs (TP).
    
Very good visualisation of how a CNN works.
Zoom:
Link to lecture (Recorded!).
     Recording in
Lecture video (173 MB) and
Lecture audio (21 MB) along with
Lecture chat (8 kB).
     Recording in
Exercise video (332 MB) and
Exercise audio (25 MB).
Exercise: See if you can recognise handwritten digits with a Convolutional Neural Network:
CNN_MNISTdata.ipynb.
The data can be obtained from Yann LeCun's webpage:
Link to MNIST dataset
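As a rough illustration of what a single CNN layer computes, the sketch below applies one convolution, a ReLU, and max pooling to a toy image using only NumPy. It is not the content of CNN_MNISTdata.ipynb: the function names, the hand-picked kernel, and the synthetic 28x28 image are assumptions made for illustration, and a real network would learn its kernels from the MNIST data.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    This is cross-correlation, which is what CNN layers conventionally compute.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Set negative activations to zero."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy stand-in for a 28x28 MNIST digit: a vertical bar on a black background
image = np.zeros((28, 28))
image[4:24, 13:15] = 1.0

# Hand-picked vertical-edge kernel (assumption; a trained CNN learns these values)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = max_pool2d(relu(conv2d_valid(image, kernel)))
print(feature_map.shape)  # (13, 13)
```

The 3x3 kernel shrinks the 28x28 image to 26x26, and 2x2 pooling halves that to 13x13. Stacking several such layers and ending with a classifier is the basic structure of a CNN.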
Wednesday 13th of May (afternoon):
Lectures:
Input feature ranking and Shapley values (TP),
ML method performance overview (TP), and
The t-SNE algorithm for (non-linear) dimensionality reduction (Alexander Nielsen).
Zoom:
Link to lecture (Recorded!).
Exercise: We will again work with handwritten digits, this time seeing if the unsupervised algorithms
PCA and t-SNE
can separate the digits without using their labels.
In the late afternoon, we might also work on the Small Project (due on Monday at 22:00).
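A minimal sketch of the PCA part of the dimensionality-reduction exercise, assuming NumPy and using synthetic data in place of the digit images. The function name pca_project and the two-cluster toy data are assumptions for illustration, not part of the course material.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    return X_centered @ Vt[:n_components].T, explained_variance[:n_components]

# Synthetic stand-in for flattened images: two well-separated clusters in 64 dimensions
rng = np.random.default_rng(seed=0)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(100, 64))
cluster_b = rng.normal(loc=3.0, scale=1.0, size=(100, 64))
X = np.vstack([cluster_a, cluster_b])

embedding, variance = pca_project(X, n_components=2)
print(embedding.shape)  # (200, 2)
```

PCA is linear, so it can only separate classes that differ along directions of high variance. For the non-linear part, scikit-learn provides sklearn.manifold.TSNE, which could be applied to the same kind of array; whether the exercise notebook uses it is not stated here.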
Thursday 14th of May (morning):
ML Workshop: Here you can find the
Workshop Program and
Zoom link.
Housing cleaning, clustering, and estimating exercise:
The housing data consists of about 50,000 housing sales, each with 90+ features along with the actual sales price.
The original data
HousingPrices_Org.csv (21 MB) first needs cleaning.
Following cleaning, one can add features (here GPS coordinates and distance to sea) using the additional data files:
GPS_data.csv (1.3 MB)
SEA_DIST.csv (680 kB)
Example code for doing this can be found here:
RegressionOnHousing_Clean_Data.py (3.7 kB)
RegressionOnHousing_Feature_Adding.py (11 kB)
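The cleaning and feature-adding steps can be sketched as follows, assuming pandas. This is not the code in the scripts listed above: the column names (ID, SalesPrice, Area, Latitude, Longitude, SeaDistance), the cleaning rule, and the tiny in-memory tables are all hypothetical stand-ins for the real CSV files.

```python
import pandas as pd

# Hypothetical miniature of the sales table; the real column names may differ
sales = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "SalesPrice": [1500000.0, None, 2300000.0, -1.0],
    "Area": [85, 120, 140, 60],
})

# Hypothetical miniatures of the two extra tables, keyed on the same ID
gps = pd.DataFrame({
    "ID": [1, 2, 4],
    "Latitude": [55.68, 56.16, 55.40],
    "Longitude": [12.57, 10.20, 10.39],
})
sea = pd.DataFrame({"ID": [1, 3, 4], "SeaDistance": [450.0, 2100.0, 12000.0]})

# Cleaning (assumed rule): drop rows with a missing or non-positive sales price
cleaned = sales.dropna(subset=["SalesPrice"])
cleaned = cleaned[cleaned["SalesPrice"] > 0]

# Feature adding: left joins keep every cleaned sale, even when extra data is missing
enriched = (
    cleaned
    .merge(gps, on="ID", how="left")
    .merge(sea, on="ID", how="left")
)
print(enriched)
```

With how="left", a sale without GPS data is kept and its coordinates become NaN, so the decision to drop or impute such rows is left to a later step.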
An example of the resulting data file can be found here:
HousingPrices_Cleaned.csv (8.8 MB),
and example code for actual sales price estimates here:
RegressionOnHousing.py (34 kB).
Last updated: 8th of May 2020 by Troels Petersen.