I have been a Ph.D. candidate at the School of Information Studies, McGill University, since 2014. I am fortunate to be supervised by Dr. Benjamin C. M. Fung and to work with him and other excellent colleagues in the Data Mining and Security Lab (DMaS). My research bridges the areas of data mining, machine learning, and security. Under this umbrella, I design novel data mining and machine learning models and develop them into practical software applications that address critical issues in reverse engineering and cybercrime investigation. I am particularly interested in theoretical advancements that resolve practical challenges and augment human cognitive power in different scenarios. My research has attracted industrial interest, and I maintain an ongoing close collaboration with Defence Research and Development Canada (DRDC).
- Research Internship, EBTIC Research Centre established by British Telecommunications (BT) and Khalifa University, United Arab Emirates
- Research Assistant, Data Mining and Security Lab (DMaS), McGill University, Montreal
- Research Consultant, National Cyber-Forensics and Training Alliance Canada, Montreal
- Research Assistant, Concordia Institute for Information Systems Engineering, Concordia University, Montreal
- Research Assistant, Laboratory of Network Engineering, University of Shanghai for Science and Technology, Shanghai
- Research Assistant, Laboratory of Software Engineering, University of Shanghai for Science and Technology, Shanghai
Selected Research Topics
Assembly Clone Search. Assembly clone search greatly reduces the manual effort of reverse engineering, since it can identify cloned parts that have already been analyzed. In close collaboration with reverse engineers, I studied the challenges and then designed and implemented an award-winning clone search engine called Kam1n0. Kam1n0 includes specialized techniques that mitigate the variance introduced by different processor families, compilers, optimization techniques, and binary protection techniques. It has been presented at the Smart Cybersecurity Network Canada (SERENE-RISC), SOPHOS, ESET, Above Security, and Google.
Neural Malware Analysis. Malware behavioral indicators denote the potentially high-risk malicious behaviors that a sample exhibits, such as unintended network communications, file encryption, keystroke logging, sandbox evasion, and camera manipulation. Generally, they are generated using sandboxes or simulators. However, the complexity of modern malware has increased considerably, and malware is becoming sandbox-aware by incorporating modern evasive techniques. To address these issues, I propose a new neural network-based static scanner that can characterize the malicious behaviors of a given executable without running it in a sandbox. It can be used as an additional binary analytic layer to mitigate the issues of polymorphism, metamorphism, and evasive techniques.
Binary Provenance Analysis. Binary provenance denotes the characteristics of a program that derive from its path from source code to executable form. Binary provenance is important in the domains of binary forensics and performance analysis. It provides an important evidential trail for cybersecurity investigators tracking down the hackers behind a security incident. For example, the Lazarus group has been linked to the WannaCry incident by code similarity. I mainly focus on two critical aspects: toolchain recovery and authorship analysis.
Authorship Analysis and Writeprint Anonymization. The internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, and chat logs) can be easily repudiated. Given anonymous documents, authorship analysis is the study of identifying the actual author and his/her socio-linguistic characteristics. Many linguistic stylometric features and computational techniques have been extensively studied for this purpose. However, most of them emphasize improving authorship attribution accuracy, and little work has been done on constructing and visualizing the evidential traits. I opt for an interpretable and explainable approach in which writing styles can be visualized, compared, and interpreted by an investigator, much like fingerprints. I also propose to integrate differential privacy and reinforcement learning to paraphrase text so that the writing style is sanitized.
2018 - Why the AI-powered Kam1n0 is a breakthrough in global cybersecurity
“… the world’s first artificial intelligence-powered search engine for assembly code, a tool with great potential to improve cybersecurity worldwide.” – Arts News, Faculty of Arts, McGill University
2016 - Five Technological Innovations You Didn’t Know Came From Montréal
“… a scalable data mining system for a range of computer security-related uses. What you need to know: this makes a computer programmer’s life a lot easier and could improve cyber security in the coming years.” – Tourism Montreal
2015 - Visualizing E-mail Writeprint
“The result, Author Miner 3.0, presented the complex information about stylometric features in colour-coded charts that could be easily and accurately interpreted.” – McGill Headway