PhD Proposal: Data driven identification of malicious acts and actors on the Web

Talk
Srijan Kumar
Time: 05.10.2016, 11:30 to 13:00
Location: AVW 4172

The web enables the transmission of knowledge at a speed and breadth unprecedented in human history, which has had a tremendous positive impact on the lives of billions of people. While benign users try to keep the web safe and usable, malicious users add and spread harmful content, manipulate information, and twist it in their favor. The presence of malicious users and their content undermines the usefulness, credibility, and safety of web platforms. To address this, my research focuses on developing general graph mining and user behavior modeling techniques to understand the behavior of both benign and malicious actors on the Web, and on using these models to identify malicious actors accurately.

First, we discuss an unsupervised decluttering algorithm that iteratively removes suspicious edges from the network -- edges that trolls use to masquerade as legitimate users. This algorithm is faster than, and twice as accurate as, existing techniques. Second, we develop behavior-modeling techniques to identify malicious editors on Wikipedia, called vandals, who add unconstructive information. We build the first vandal early warning system, which models the editing behavior patterns of benign editors and vandals based on the relations between edited pages, the temporal dependencies between edits, and other editing attributes. Finally, we combine graph and behavior modeling techniques to develop the first system to identify hoax articles on Wikipedia -- fabricated articles purposefully created to mislead readers. By leveraging the content, the hyperlink network, and editor attributes, our system achieves super-human performance; notably, much of its advantage comes from non-content features, meaning that faking an article's content is easy, while faking its importance on Wikipedia is not.
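As a rough illustration of the decluttering idea, the sketch below iteratively prunes edges flagged as suspicious from a signed network and recomputes node reputations until a fixed point is reached. The suspicion rule used here (a positive edge endorsing a node whose incoming edges are, on average, negative) is a hypothetical stand-in for exposition, not the rule from the proposed algorithm.

```python
def declutter(edges, threshold=0.0, max_iters=10):
    """Iteratively remove suspicious edges from a signed network.

    edges: list of (src, dst, sign) tuples with sign in {+1, -1}.
    Returns the surviving edge list once no more edges are pruned.
    """
    edges = list(edges)
    for _ in range(max_iters):
        # Reputation of a node: mean sign of its incoming edges.
        incoming = {}
        for src, dst, sign in edges:
            incoming.setdefault(dst, []).append(sign)
        rep = {n: sum(s) / len(s) for n, s in incoming.items()}

        # Suspicious edge (illustrative rule): a positive edge
        # endorsing a node whose reputation is below the threshold.
        kept = [(s, d, g) for s, d, g in edges
                if not (g > 0 and rep.get(d, 0.0) < threshold)]
        if len(kept) == len(edges):  # fixed point reached
            return kept
        edges = kept
    return edges
```

For example, a troll that receives mostly negative edges plus one positive edge from an accomplice would have that endorsing edge pruned in the first iteration, leaving only the legitimate structure.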
As future work, we propose to model the complex behavior of both benign and malicious users from several perspectives. First, we try to understand how users operate multiple accounts online for benign purposes (say, one account for each of one's interests) as well as for malicious purposes (say, to push bad content that would otherwise be deleted). We study when and how these accounts are used and the interplay between them, using real-world data from large-scale discussion platforms, and then use these insights to accurately tie multiple accounts together. Next, we model the behavior of users in signed social networks -- networks with both positive (trust/friend) and negative (distrust/enemy) relations between users. Since these networks are intrinsically different from unsigned social networks, we first model various properties of signed networks and then develop metrics to identify fraudsters in them. We show that these metrics can be used to identify fraudsters on other product-rating platforms as well. Finally, we develop a crowd-sourced algorithm to identify malicious mobile apps, which exhibit malicious behavior only temporarily and provide benign functionality at other times. Building on the observation that many of these behaviors are user-visible, we describe the design of a system that finds temporary malicious behaviors by mining user reviews from the leading Android marketplace.
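One simple metric family for signed rating networks alternates a "fairness" score for raters with a "goodness" score for ratees: raters whose scores consistently disagree with the fairness-weighted consensus receive low fairness and become fraud candidates. The update rules below are assumptions chosen for illustration, not the specific metrics proposed in the talk.

```python
def fairness_goodness(ratings, iters=20):
    """Alternate fairness (rater) and goodness (ratee) scores.

    ratings: list of (rater, ratee, score) tuples, score in [-1, 1].
    Returns (fairness, goodness) dicts; low fairness flags
    potential fraudsters.
    """
    fairness = {r: 1.0 for r, _, _ in ratings}  # start fully trusted
    goodness = {t: 0.0 for _, t, _ in ratings}  # start neutral
    for _ in range(iters):
        # Goodness: fairness-weighted average of received scores.
        for t in goodness:
            received = [(r, s) for r, tt, s in ratings if tt == t]
            goodness[t] = (sum(fairness[r] * s for r, s in received)
                           / len(received))
        # Fairness: 1 minus the mean deviation from consensus,
        # scaled into [0, 1] (scores span a range of width 2).
        for r in fairness:
            given = [(t, s) for rr, t, s in ratings if rr == r]
            fairness[r] = 1 - (sum(abs(s - goodness[t])
                                   for t, s in given)
                               / (2 * len(given)))
    return fairness, goodness
```

With two honest raters scoring a product +1 and one fraudster scoring it -1, the fraudster's fairness drops below the honest raters' after a few iterations, since the fraudster's ratings deviate from the weighted consensus.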
Examining committee:
Chair: Dr. V.S. Subrahmanian
Dept rep: Dr. Mohammad Taghi Hajiaghayi
Member: Dr. Tudor Dumitras