About Me

I have been a Ph.D. candidate at the School of Information Studies, McGill University, since 2014. I am fortunate to be supervised by Dr. Benjamin C. M. Fung and to work with him and other excellent colleagues in the Data Mining and Security Lab (DMaS). My research bridges the areas of data mining, machine learning, and security. Under this umbrella, I design novel data mining and machine learning models and develop them into practical software applications that address critical issues in reverse engineering and cybercrime investigation. I am particularly interested in theoretical advancements that resolve practical challenges and augment human cognitive power in different scenarios. My research has attracted industrial interest, and I maintain a close, ongoing collaboration with Defence Research and Development Canada (DRDC).


Research Experience


Selected Research Topics

A basic block clone pair (x86 vs. ARM).

Assembly Clone Search. Assembly clone search greatly reduces the manual effort of reverse engineering, since it can identify cloned parts that have already been analyzed. In close collaboration with reverse engineers, I studied the challenges and then designed and implemented an award-winning clone search engine called Kam1n0. Kam1n0 also includes specialized techniques that mitigate the variance introduced by different processor families, compilers, optimization techniques, and binary protection techniques. Kam1n0 has been presented at the Smart Cybersecurity Network Canada (SERENE-RISC), SOPHOS, ESET, Above Security, and Google.



Predicted dynamic behaviors using our neural network.

Neural Malware Analysis. Malware behavioral indicators denote the potentially high-risk malicious behaviors that a sample exhibits, such as unintended network communications, file encryption, keystroke logging, sandbox evasion, and camera manipulation. Generally, they are generated using sandboxes or simulators. However, the complexity of modern malware has increased considerably, and malware is becoming sandbox-aware by incorporating modern evasive techniques. To address these issues, I propose a new neural network-based static scanner that can characterize the malicious behaviors of a given executable without running it in a sandbox. It can be used as an additional binary analytic layer to mitigate the issues of polymorphism, metamorphism, and evasive techniques.



Where does the binary come from?

Binary Provenance Analysis. Binary provenance denotes the characteristics of a program that derive from its path from source code to executable form. Binary provenance is important in the domains of binary forensics and performance analysis. It provides an important evidential trail that lets cybersecurity investigators track down the hackers behind a security incident. For example, the Lazarus group is linked to the WannaCry incident by code similarity. I mainly focus on two critical aspects: toolchain recovery and authorship analysis.



Visualized writeprint for a given candidate.

Authorship Analysis and Writeprint Anonymization. The internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, and chat logs) can be easily repudiated. Given a set of anonymous documents, authorship analysis is the study of identifying the actual author and their socio-linguistic characteristics. Many linguistic stylometric features and computational techniques have been extensively studied for this purpose. However, most of them emphasize improving authorship attribution accuracy, and little work has been done on constructing and visualizing the evidential traits. I opt for an interpretable and explainable approach in which writing styles can be visualized, compared, and interpreted by an investigator, much like fingerprints. I also propose to integrate differential privacy and reinforcement learning to paraphrase text so that its writing style is sanitized.



Media Coverage

2018 - Why the AI-powered Kam1n0 is a breakthrough in global cybersecurity

“… the world’s first artificial intelligence-powered search engine for assembly code, a tool with great potential to improve cybersecurity worldwide.” – Arts News, Faculty of Arts, McGill University

2016 - Five Technological Innovations You Didn’t Know Came From Montréal

“… a scalable data mining system for a range of computer security-related uses. What you need to know: this makes a computer programmer’s life a lot easier and could improve cyber security in the coming years.” – Tourism Montreal

2015 - Visualizing E-mail Writeprint 2015

“The result, Author Miner 3.0, presented the complex information about stylometric features in colour-coded charts that could be easily and accurately interpreted.” – McGill Headway