Enrichment Course on Web Data Extraction and Regression Analysis

Fired with an educational zeal, Preferred.AI conducted a 5-day (Oct 3-8 2018) enrichment course on Web Data Extraction and Regression Analysis, which was organized by the SMU School of Information Systems in conjunction with the impending launch of its BSc (Computer Science).

The instruction team, L to R: Max, Ween, Tuan, Jingyao, Hady, Andrew, Aghiles

We welcomed a group of 20 bright students from Singapore Polytechnic (SP), Ngee Ann Polytechnic (NP), and Temasek Polytechnic (TP).  They are part of the prestigious Industry Preparation for Pre-graduate (iPREP) Programme or Infocomm Polytechnic (iPoly) Scholarship run by IMDA.

 

Through the course, we instilled an appreciation of the data science pipeline by requiring the students to go through the different stages of data collection, feature engineering, and building a prediction model using regression, culminating in an end-to-end project.

 

On Day 1, Ween anchored the data collection module of the course, which is based on Venom, a focused crawler framework for the deep web developed in-house by Preferred.AI and open-sourced for public use.

Ween ran with the data collection module of the course. With the zest of a seasoned instructor, he sure wasn’t a “crawler”.

 

Aghiles (right) coached Group 1
Max (2nd from left) coached Group 2
Tuan (middle) coached Group 3
Ween (middle) coached Group 4
Jingyao (right) coached Group 5
Andrew (standing, left) coached Group 6

To help the students internalize the lessons meaningfully, we had them work in small groups on a realistic project involving a specific Web site.  Working with each group was a coach from Preferred.AI.  That the groups managed to build working crawlers by the end of Day 2 spoke of their effective teamwork, the coaches’ able guidance, and Venom’s powerful features.

 

On Day 3, Max instructed the students on how to use machine learning techniques such as linear regression and logistic regression to build prediction models.  On Day 4, the project groups began training their models using the data they had been collecting with the crawlers they’d built earlier.

Max dished out lessons on regression and satiated the students’ hunger to learn by using chicken rice as an example.
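To give a flavour of the regression material, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. It is not taken from the course materials: the chicken rice figures and the function names `fit_line` and `predict` are made up purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical data: portion size in grams vs. price in SGD.
portion_grams = [200, 250, 300, 350, 400]
price_sgd = [3.0, 3.5, 4.0, 4.5, 5.0]

slope, intercept = fit_line(portion_grams, price_sgd)
print(f"price = {slope:.3f} * grams + {intercept:.2f}")
print(f"predicted price for 320 g: {predict(slope, intercept, 320):.2f}")
```

A real project would typically use several features and a library implementation; the closed-form single-feature case is shown only because it keeps the arithmetic visible.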

 

Group 1 predicted ratings based on restaurant reviews
Group 2 assessed how to price properties for sale
Group 3 predicted the price ranges of food places
Group 4 built a model to price used cars effectively
Group 5 estimated rental prices of apartments
Group 6 categorized cars by type based on their features

In true SIS learning style, on Day 5 the student groups took turns presenting, and defending, their projects.  The team camaraderie was palpable.  The coaches were as anxious as their proteges.  In just 5 short days, they had bonded.  In the post-course feedback, 90% of the students rated the coaches, without whom the learning experience just wouldn’t have been the same, as ‘Helpful’ or even ‘Awesome’.

 

The projects were finally evaluated by Hady and a guest judge, Wu Huayu (VP, Data Science – DBS Bank).  Huayu had earlier kindly shared his expertise by giving a talk titled “Applications of Data Science in Industries”, a wide-ranging coverage of the history of big data and AI and how their applications touched various industries including banking, manufacturing, etc.

 

Presentation of the Best Project Prize, L to R: Hady (course coordinator), Group 4 (winner) comprising Jewelyn, Jeffery and Ronald from SP, Huayu (guest judge)
Presentation of Best Class Participation Prize, L to R: Hady (course coordinator), Max (instructor), Danial (winner) from SP, Ween (instructor)

We awarded two prizes.  The Best Project was awarded to the group with the most creative, technically rigorous, impactful, and well-presented project. The Best Class Participation was awarded to the individual student who contributed the most to the class learning.

 

Once the prizes were announced, the winners were finally revealed.  Yet there was no loser in sight.  After all, when learning takes place, we’re all winners.

UAI-2018 in Monterey

In early August 2018, Aghiles attended the international conference on Uncertainty in Artificial Intelligence (UAI), which took place in Monterey, California. Here, he shares his UAI experience.

 

The conference was hosted at the InterContinental Hotel, and according to the organizers, this was the biggest UAI ever. The papers presented at the conference covered various current areas in machine learning and AI. Among the topics most represented were: Representation Learning (where our contribution falls), Causal Inference, Variational Inference, Gaussian Processes, and Online and Reinforcement Learning.

The InterContinental, main entrance side.
UAI’18 opening words

Each accepted paper was presented as an oral presentation, a poster, or both. For the oral presentations, there was only one session at a time, which was quite convenient as one could attend any talk of interest. I particularly enjoyed the daily poster sessions; they were well attended and allowed for deep and enriching discussions.

 

Aghiles with our poster on Probabilistic Collaborative Representation Learning

Our work accepted to UAI is entitled “Probabilistic Collaborative Representation Learning for Personalized Item Recommendation” and describes a new Bayesian model that jointly models user preferences and learns deep item features from auxiliary information (such as items’ textual descriptions, images, contexts, etc.).

 

The conference would not have been a full experience without its banquet dinner at the beautiful Monterey Bay Aquarium. It was another opportunity to meet and interact with people in a broader sense. I particularly retain two things from this evening. The first, of course, is the excellent food :). The second is a discussion with a group of researchers working on causal inference, which made me realize the importance of looking into this huge, underexplored field in Machine Learning.

Monterey Aquarium during the banquet

 

UAI’18 was also an opportunity to discover Monterey. Aside from its long and rich history, the things that I enjoyed most about Monterey were the cool summer weather, the fresh and delicious seafood, and all the defunct sardine-canning factories turned into bars, restaurants or shops.

Cannery Row, the site of several now-defunct sardine-canning factories
The Monterey Canning Company, now transformed into a shopping center
Monterey Harbor Area
Monterey Harbor Pier

 

IJCAI-18 in Stockholm

In July 2018, Hady traveled to Stockholm, Sweden for the International Joint Conference on Artificial Intelligence (IJCAI).  Here, he recounts his experience from the trip.

 

IJCAI-18 was probably the largest academic conference I have participated in so far, with 2500 registered attendees.  This was 20% larger than the 2017 conference.  This pronounced growth and outsized congregation is one more sign of the rise (the return?) of Artificial Intelligence or AI.

The conference was held in the massive convention centre Stockholmsmässan

The conference organizers knew how to put on a show.  The scene-stealer during the opening ceremony was definitely the dancing couple of human and robot, gyrating harmoniously to the catchy beat.  Talk about what AI can do!

In the opening ceremony, we were treated to the spectacle of a human-robot duet dance.

The conference had 710 papers and a selective 21% acceptance rate. Singapore more than pulled its weight with 26 papers.

Our Preferred.AI group had two papers accepted to the conference.  Both were presented in the Learning Preferences or Rankings session.  The first paper, “Modeling Contemporaneous Basket Sequences with Twin Networks for Next-Item Recommendation” by Trong, Hady, and Yuan, explores the interaction between two behavioral streams that occur concurrently, such as clicking and purchasing on an e-commerce site, and how they can be modeled jointly to improve sequential recommendation.

 

The second paper, “A Bayesian Latent Variable Model of User Preferences with Item Context” by Aghiles and Hady, describes a novel graphical model based on Poisson factorization that incorporates item context information, such as which items are viewed together, in addition to user-item interactions, to improve recommendations, especially for users with more limited information.

DFN: Discordant Fraternal Network, a neural network for modeling the interaction between two sequence types of user actions
C2PF: Collaborative Context Poisson Factorization, a graphical model that incorporates item context for recommendation
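For readers unfamiliar with Poisson factorization, the generic formulation that models of this family build on can be sketched as below. This is only the standard base model, not the full C2PF model: the symbols (user factors θ, item factors β, number of latent dimensions K, and Gamma hyperparameters a, b, c, d) are notation assumed for this sketch, and the item-context terms that C2PF adds are described in the paper itself.

```latex
\begin{align*}
\theta_{uk} &\sim \mathrm{Gamma}(a, b) \\
\beta_{ik} &\sim \mathrm{Gamma}(c, d) \\
x_{ui} &\sim \mathrm{Poisson}\Big(\sum_{k=1}^{K} \theta_{uk}\,\beta_{ik}\Big)
\end{align*}
```

Here x_{ui} is the observed interaction count between user u and item i, so a user and an item whose non-negative factors align on the same latent dimensions are expected to interact more often.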

In addition to the oral presentations in the session, we also got a chance to engage the audience in a poster session.  This allowed deep dives into specific issues, which would take more time and one-on-one discussions.  For instance, from a discussion with a researcher from a large online retailer, I learnt about how prevalent recurrent neural networks were in the company’s sequential recommendation models.

Hady with the two Preferred.AI posters on Contemporaneous Basket Sequences and Collaborative Context Poisson Factorization respectively

 

The two things that surprised me the most about Stockholm were the weather and the water.  While it was summer then, I was still struck by how a place so far north could be so warm.  No wonder the humidity is high, because water is everywhere!  I grew up with the notion that neighboring Indonesia was the largest archipelagic nation.  While that is probably still true in terms of area, little did I expect that the Stockholm archipelago actually has an even greater count of islands than Indonesia.

The view from the Stockholm City Hall, where one of the receptions was held
Ferry to Djurgården, heading to the social program at Skansen, the world’s oldest open-air museum

Prior to the trip, the thing I associated the most with Sweden was the Nobel Prize.  So a visit to Stockholm would not have been complete without experiencing the Nobel Museum.  While the museum has many artefacts connected to various prize winners over the years, the greatest find in my exploration was the cafe!  Some of the chairs have been signed by past prize winners. Well, I may not yet be able to say that I have stepped into their shoes, but now I can say that I once sat on their chairs :).

Nobel Museum, where I probably spent too much time in the museum shop 🙂
In the cafe, one of the chairs was suspended from the ceiling to highlight that the bottom of some chairs may have been signed by prize winners
Signatures of Barack Obama (Peace 2009) and Aung San Suu Kyi (Peace 1991) underneath one chair

KDD.SG Tutorial on Image Classification Using CNN (Materials)

Our appreciation to KDD.SG (Singapore Chapter of SIGKDD) and DSSG (DataScience Singapore) for jointly organizing and inviting us to deliver a public tutorial on June 9.  Tuan and Hady delivered a tutorial on image classification using convolutional neural networks, focusing on two applications, namely: face emotion recognition and visual sentiment analysis.

Hady opened the tutorial on Image Classification using CNN
Tuan explained the implementation of a Multi-Layer Perceptron in TensorFlow
The audience participated actively in the tutorial

For those who missed the tutorial, you may find the materials here for your own self-practice.  A video recording of the event can be found below.

 

SDSC – DSSG Data Science Meetup (Videos)

We are grateful to SDSC (Singapore Data Science Consortium) and DSSG (DataScience Singapore) for organizing the June 7 meetup, and to the hundred attendees who gave us an opportunity to share some of our recent work.

After opening remarks by Caroline from SDSC, Hady gave the first technical talk titled “Modeling Preferences from Multi-Modal Data: A Deep Learning Exploration”.  The talk covered several works by our group in modeling user preferences from several modalities such as social networks, images, as well as sequences.

Focusing on modeling text, in particular about word embeddings and how sentiment infusion could improve the performance of word embeddings on several text classification tasks, Maksim gave a talk titled “SentiVec: Sentiment-Infused Word Embeddings”.

Rounding up the coverage of modalities, Aghiles gave a talk titled “C2PF: A Poisson Latent Factor Model of User Preferences with Item Context” on incorporating the effects of item context to improve recommendations.

You may find the slides of the three talks here.  It was a fruitful session, and we got a chance to meet many new contacts from academia and industry.  Looking forward to the next opportunity to get in touch.

Visual Sentiment Analysis

“A picture is worth a thousand words.”

So they say. Indeed, some images could capture certain moments so vividly that they become iconic and timeless.

Could a picture speak of the sentiment of the photographer?

Intuitively, that seems probable.  After all, in the choice of scenery or angle or other tricks up the sleeve of a photographer, the picture taken is essentially a rendering of what the photographer sees.

 

In pursuit of an empirical answer to what might have also been a philosophical question, we conduct research on a data set of images found within online reviews of restaurants crawled from Yelp.  With the advent of mobile phones, many online reviewers now prolifically include photos within their reviews, recounting their experiences as well as their sentiments textually and visually.

What is visual sentiment analysis?

To test the above hypothesis, we formulate a problem known as visual sentiment analysis.  Given an image, we seek to determine whether the image is positive (i.e., found within a review with a rating of 4 or 5 on a scale of 5) or negative (i.e., associated with a rating of 1 or 2).  We build a binary classifier based on a deep learning framework called Convolutional Neural Networks.  Our model architecture shown below is reminiscent of AlexNet for image classification, with a twist in its application to binary sentiment classification.  We describe the details of this base model in a paper authored by Tuan and Hady and published at the ACM Multimedia Conference 2017.

VSCNN: Convolutional Neural Networks for Visual Sentiment Analysis
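The labelling rule described above can be sketched in a few lines of Python. This is an illustration rather than our actual pipeline: the function name `sentiment_label` and the sample file names are hypothetical, and leaving out images from 3-star reviews is an assumption, since the rule above only defines the positive and negative classes.

```python
from typing import Optional


def sentiment_label(rating: int) -> Optional[int]:
    """Map a 1-5 review rating to a binary sentiment label.

    Returns 1 (positive) for ratings 4-5, 0 (negative) for ratings 1-2,
    and None for rating 3, which belongs to neither class (assumption).
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating >= 4:
        return 1
    if rating <= 2:
        return 0
    return None


# Hypothetical (image file, review rating) pairs.
reviews = [("a.jpg", 5), ("b.jpg", 1), ("c.jpg", 3), ("d.jpg", 4), ("e.jpg", 2)]

# Keep only the images whose review rating maps to one of the two classes.
labelled = []
for image, rating in reviews:
    label = sentiment_label(rating)
    if label is not None:
        labelled.append((image, label))

print(labelled)
```

Each image inherits the label of the review it appears in, so the labels are noisy by construction: a photo in a 1-star review is treated as negative whatever it depicts.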

To cut a fascinating story short, we find that the trained visual sentiment analysis classifier performs significantly better than random, implying that indeed there are signals within an image that help to convey the overall sentiment of the review writer.

What do positive images look like?

Below we show some examples of images classified as positive.  Happy faces and celebrations seem to mark happy moments.  Note that this is a general image classification, and not specifically about facial emotion recognition (which itself is an interesting but distinct problem).  For another set of examples, if one can afford to dine at restaurants with a view, chances are the experience would be positive.

Examples of Images with Positive Sentiment

What do negative images look like?

Well, no one likes paying too much (or perhaps even paying at all?).  It is always a bummer to discover something that does not belong on one’s plate.

Examples of Images with Negative Sentiment

Interesting as it is, this is not yet the end of our exploration.  There are other factors that we would consider to improve the performance of visual sentiment analysis.  That’s the subject of a future blog post.  If you really can’t wait, here is the paper.