My research focuses on the intersection of data mining (data analytics), high-performance computing (HPC), and network science. I am particularly interested in the mining and analysis of large social and information networks through the design of parallel algorithms and HPC techniques.
A network (graph) is a powerful abstraction for representing interactions among entities in a system. The entities and their interactions are represented as the nodes (vertices) and links (edges) of a network, respectively. Examples include various social networks, biological networks, the web graph, and collaboration networks of authors. Mining and analyzing networks, and reasoning about them through modeling, help us understand and improve the corresponding systems. Due to advances in computing and data technology, we are deluged with massive data from diverse areas such as business and finance, social media, biology, and other data-driven disciplines. In the era of big data, emerging network data sets are also very large. Social networks such as Facebook and Twitter have millions to billions of users. The World Wide Web has over a trillion web pages. Such massive networks call for efficient and scalable algorithms for mining and analysis. My research strives to design such algorithms with applications to social, biological, and technical systems.
Complex systems are often organized into clusters or communities, each having a distinct role or function. In the corresponding network representation, each functional unit (community) appears as a tightly knit set of nodes with denser connections inside the set than to the rest of the network. Finding communities may reveal the organization of complex systems and their function. We are currently working on designing scalable parallel algorithms for detecting communities in large-scale networks.
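To make the notion of a community concrete, the following minimal Python sketch counts the edges inside a candidate node set and the edges that cross its boundary. It is an illustration only and not one of the parallel algorithms under development; the function name internal_external_edges and the toy graph are assumptions introduced for this example.

```python
def internal_external_edges(edges, community):
    """Count edges inside a node set and edges crossing its boundary.

    `edges` is an iterable of undirected (u, v) pairs; `community` is any
    iterable of node ids. Both names are hypothetical, for illustration.
    """
    members = set(community)
    internal = 0
    external = 0
    for u, v in edges:
        u_inside = u in members
        v_inside = v in members
        if u_inside and v_inside:
            internal += 1      # both endpoints in the set
        elif u_inside or v_inside:
            external += 1      # exactly one endpoint in the set
    return internal, external


# Toy graph (assumed): two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
internal, external = internal_external_edges(edges, {0, 1, 2})
print(internal, external)
```

A node set with many internal edges and few external ones matches the informal description above; practical detection methods optimize more refined quality measures over all candidate sets.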
Characterizing real-world social and information networks based on graph-theoretic metrics or properties has been of growing interest. Among the most explored metrics are degree distribution, number of triangles, and clustering coefficients. An important triangle-related property of many networks is high transitivity: two nodes (vertices) that have common neighbors have an elevated probability of being neighbors of one another. We present a characterization of networks based on a quantification of common neighbors.
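The metrics mentioned above can be sketched on a small example. The code below is a sequential illustration, assuming an undirected simple graph (no self-loops or duplicate edges) with comparable node ids; the helper names build_adjacency, common_neighbors, triangle_count, and local_clustering are hypothetical and do not reflect the characterization method itself.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Return the set of nodes adjacent to both u and v."""
    return adj[u] & adj[v]


def triangle_count(adj):
    """Count triangles: every triangle is seen once per each of its 3 edges."""
    per_edge = sum(
        len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v
    )
    return per_edge // 3


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
    return 2.0 * linked / (k * (k - 1))


# Toy graph (assumed): one triangle 0-1-2 with a pendant node 3 attached to 2.
adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3)])
print(common_neighbors(adj, 0, 1), triangle_count(adj), local_clustering(adj, 2))
```

Each common neighbor of two adjacent nodes closes one triangle, which is why counting common neighbors over edges yields the triangle count after dividing by three.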
We are working to design scalable algorithmic and analytic techniques to study protein-protein interaction (PPI) networks. Our study of PPI networks will be based on network-centric mining and analysis approaches. We will design specialized methods for extracting signed motifs, computing centrality, and finding functional units in PPI networks.
I was a member of the CINET project team during my PhD years. This NSF-funded project, titled "From Desktops to Clouds -- A Middleware for Next Generation Network Science," is a large collaborative research effort. By harnessing new cloud-based resources in an easily accessible manner, network science researchers will be able to tackle more complex problems. We have built a cyberinfrastructure that is designed to be self-sustaining.
My role: I worked on designing and implementing highly efficient and scalable algorithms for various problems in network science. The implemented modules serve as the computational engine behind the whole system.