I lead the Big Data and Scalable Computing Research Group at UNO. My research focuses on the intersection of data mining (data analytics), high-performance computing (HPC), and network science. I am particularly interested in mining and analyzing big social and information networks using parallel algorithms and HPC techniques.
A network (graph) is a powerful abstraction for representing interactions among entities in a system. Examples include various social networks, biological networks, the web graph, and collaboration networks of authors. Mining and analyzing networks, and reasoning about them through modeling, help us understand and improve the corresponding systems. Due to advances in computing and data technology, we are deluged with massive data from diverse areas such as business and finance, social media, biology, and other data-driven disciplines. In the era of big data, the emerging network data is also very large. Such massive networks call for efficient and scalable algorithms for mining and analysis. My research strives to design such algorithms with applications to social, biological, and other technical systems.
Complex systems are organized in clusters or communities, each having a distinct role or function. In the corresponding network representation, each functional unit (community) appears as a tightly knit set of nodes with denser connections inside the set than to the outside. Finding communities may reveal the organization of complex systems and their function. We are currently working on designing scalable parallel algorithms for detecting communities in large-scale networks.
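To make the notion of a community concrete, the following minimal sketch counts the edges inside a candidate node set and the edges leaving it. It only illustrates the definition above on a toy graph; it is not the group's detection algorithm, and the function names and the example graph are assumptions made for this illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def internal_external_edges(adj, community):
    """Return (edges inside the set, edges crossing its boundary)."""
    community = set(community)
    internal_endpoints = 0  # each internal edge is seen from both endpoints
    external = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                internal_endpoints += 1
            else:
                external += 1
    return internal_endpoints // 2, external


# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = build_adjacency(edges)

inside, outside = internal_external_edges(adj, {0, 1, 2})
print(inside, outside)  # prints "3 1"
```

Here the set {0, 1, 2} has three internal edges and only one edge to the rest of the graph, which matches the intuitive picture of a community. Practical detection methods optimize a quality measure over all candidate partitions rather than checking a single set.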
In this project, we identify several popular network visualization tools and provide a comparative analysis based on the features and operations these tools support. We demonstrate empirically how those tools scale to large networks. We also provide several case studies of visual analytics on large network data and assess the performance of the tools.
Characterizing real-world social and information networks based on graph-theoretic metrics or properties has been of growing interest. Among the most explored metrics are degree distribution, number of triangles, and clustering coefficients. An important triangle-related property of many networks is high transitivity: two nodes (vertices) that have one or more common neighbors have an elevated probability of being neighbors of one another. We present a characterization of networks based on a quantification of common neighbors.
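As a small illustration of the quantities mentioned above, the sketch below computes the common neighbors of a node pair and the standard global transitivity, defined as the fraction of length-two paths (wedges) whose endpoints are themselves connected. This is a textbook definition shown on a hypothetical toy graph, not the specific common-neighbor quantification used in this work; all names are assumptions for the example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Nodes adjacent to both u and v."""
    return adj[u] & adj[v]


def transitivity(adj):
    """Fraction of wedges (paths a - center - b) that are closed by an edge a - b."""
    wedges = 0
    closed = 0
    for center, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            wedges += 1
            if b in adj[a]:
                closed += 1
    return closed / wedges if wedges else 0.0


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = build_adjacency(edges)

print(common_neighbors(adj, 0, 1))  # prints {2}
print(transitivity(adj))            # 3 of the 5 wedges are closed
```

In this graph, nodes 0 and 3 also share the neighbor 2 but are not adjacent, which is why the transitivity is below 1.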
We are working to design scalable algorithmic and analytic techniques to study protein-protein interaction (PPI) networks. Our study of PPIs will be based on network-centric mining and analysis approaches. We will design specialized methods for extracting signed motifs, computing centrality, and finding functional units in PPI networks.
I was a member of the CINET project team during my PhD years. This NSF-funded project, titled "From Desktops to Clouds -- A Middleware for Next Generation Network Science," is a large collaborative research effort. By harnessing new cloud-based resources in an easily accessible manner, network science researchers will be able to tackle more complex problems. We have built a cyberinfrastructure that is designed to be self-sustainable.
My role: I worked on designing and implementing highly efficient and scalable algorithms for various problems in network science. The implemented modules serve as the computational engine behind the whole system.
Currently, my lab consists of 2 PhD students and several undergraduate students.
The students are working on various problems in large-scale data mining, parallel computing, and graph (network) mining and visualization.