Enrichment Course on Web Data Extraction and Regression Analysis

Fired with educational zeal, Preferred.AI conducted a 5-day (Oct 3-8, 2018) enrichment course on Web Data Extraction and Regression Analysis, which was organized by the SMU School of Information Systems in conjunction with the impending launch of its BSc (Computer Science).

The instruction team, L to R: Max, Ween, Tuan, Jingyao, Hady, Andrew, Aghiles

We welcomed a group of 20 bright students from Singapore Polytechnic (SP), Ngee Ann Polytechnic (NP), and Temasek Polytechnic (TP).  They are part of the prestigious Industry Preparation for Pre-graduate (iPREP) Programme or Infocomm Polytechnic (iPoly) Scholarship run by IMDA.


Through the course, we instilled an appreciation of the data science pipeline by taking the students through the stages of data collection, feature engineering, and building a prediction model using regression, culminating in an end-to-end project.


On Day 1, Ween anchored the data collection module of the course, which is based on Venom, a focused crawler framework for the deep web developed in-house by Preferred.AI and open-sourced for public use.

Ween ran with the data collection module of the course. With the zest of a seasoned instructor, he sure wasn’t a “crawler”.


Aghiles (right) coached Group 1
Max (2nd from left) coached Group 2
Tuan (middle) coached Group 3
Ween (middle) coached Group 4
Jingyao (right) coached Group 5
Andrew (standing, left) coached Group 6

To help the students internalize the lessons, we had them work in small groups on a realistic project involving a specific website. Each group worked with a coach from Preferred.AI. That the groups managed to build working crawlers by the end of Day 2 spoke of their effective teamwork, the coaches’ able guidance, and Venom’s powerful features.


On Day 3, Max instructed the students on how to use machine learning techniques such as linear regression and logistic regression to build prediction models.  On Day 4, the project groups began training their models using the data they were collecting with the crawlers they’d built earlier.

Max dished out lessons on regression and satiated the students’ hunger to learn by using chicken rice as an example.


Group 1 predicted ratings based on restaurant reviews
Group 2 assessed how to price properties for sale
Group 3 predicted the price ranges of food places
Group 4 built a model to price used cars effectively
Group 5 estimated rental prices of apartments
Group 6 categorized cars by type based on their features

In true SIS learning style, on Day 5 the student groups took turns presenting, and defending, their projects. The team camaraderie was palpable. The coaches were as anxious as their proteges. In just 5 short days, they had bonded. In the post-course feedback, 90% of the students rated the coaches as ‘Helpful’ or even ‘Awesome’; without them, the learning experience just wouldn’t have been the same.


The projects were finally evaluated by Hady and a guest judge, Wu Huayu (VP, Data Science – DBS Bank). Huayu had earlier kindly shared his expertise by giving a talk titled “Applications of Data Science in Industries”, a wide-ranging survey of the history of big data and AI and of how their applications have touched industries such as banking and manufacturing.


Presentation of the Best Project Prize, L to R: Hady (course coordinator), Group 4 (winner) comprising Jewelyn, Jeffery and Ronald from SP, Huayu (guest judge)
Presentation of the Best Class Participation Prize, L to R: Hady (course coordinator), Max (instructor), Danial (winner) from SP, Ween (instructor)

We awarded two prizes.  The Best Project was awarded to the group with the most creative, technically rigorous, impactful, and well-presented project. The Best Class Participation was awarded to the individual student who contributed the most to the class learning.


Once the prizes were announced, the winners were finally revealed.  Yet there was no loser in sight.  After all, when learning takes place, we’re all winners.