Shaikh M. Arifuzzaman: Publications

Publications

Find here links to my Google Scholar and DBLP profiles.

Journals and Conference Proceedings:

Scalable distributed Louvain algorithm for community detection in large graphs

Naw Safrin Sattar and Shaikh Arifuzzaman

Journal of Supercomputing (JSC), Springer, Jan 2022.

[PDF] [Springer]

Keywords: graph algorithm, community detection, louvain algorithm, large-scale computing

HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach

Md Abdul Motaleb Faysal, Shaikh Arifuzzaman, Cy Chan, Maximilian Bremer, Doru Popovici, John Shalf

In 2021 IEEE International Conference on High Performance Extreme Computing (IEEE HPEC 2021), IEEE, Sept 2021.

[PDF] [IEEE HPEC]

Keywords: graph algorithm, community detection, scalable algorithms, large-scale computing

COVID-19 Vaccination Awareness and Aftermath: Public Sentiment Analysis on Twitter Data and Vaccinated Population Prediction in the USA

Naw Safrin Sattar and Shaikh Arifuzzaman

Applied Sciences, Volume 11, No 6128, MDPI, Jun 2021.

[PDF] [Applied Sciences]

Keywords: social network analysis, sentiment analysis, Twitter data mining, large-scale computing, predictive analysis, machine learning

Community Detection using Semisupervised Learning with Graph Convolutional Network on GPUs

Naw Safrin Sattar and Shaikh Arifuzzaman

In 2020 IEEE International Conference on BigData (BigData 2020), IEEE, Dec 2020.

[PDF] [BigData2020]

Keywords: machine learning, graph convolutional network, scalable algorithms, large-scale learning

Data Parallel Large Sparse Deep Neural Network on GPU

Naw Safrin Sattar and Shaikh Arifuzzaman

In ScaDL 2020 Workshop (Scalable Deep Learning over Parallel And Distributed Infrastructures ), with 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020), IEEE, May 2020.

[PDF] [IPDPS20] [ScaDL]

Keywords: machine learning, deep learning, GPU, data parallelism, scalable algorithms, large-scale learning

Fast Parallel Algorithms for Counting and Listing Triangles in Big Graphs

Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe

ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 14 Num. 1, Pages 5:1--5:34, ACM, 2020.

[PDF] [ACM DL]

Keywords: Parallel algorithms, approximate algorithm, complexity analysis, scalability analysis, time and space efficient algorithms, triangle counting, graph data mining, large-scale graph data

Bibtex:
@article{tkdd-ArifuzzamanKM20,
author = {Shaikh Arifuzzaman and Maleq Khan and Madhav Marathe},
title = {Fast Parallel Algorithms for Counting and Listing Triangles in Big Graphs},
journal = {{ACM} Trans. Knowl. Discov. Data (TKDD)},
volume = {14},
number = {1},
pages = {5:1--5:34},
year = {2019},
url = {https://doi.org/10.1145/3365676},
doi = {10.1145/3365676},
issue_date = {February 2020},
publisher = {{ACM}}
}

Overcoming MPI Communication Overhead for Distributed Community Detection

Naw Safrin Sattar and Shaikh Arifuzzaman

Software Challenges to Exascale Computing, Communications in Computer and Information Science , Vol 964, Page 77-90, Springer, 2019.

[PDF] [BibTex] [Springer]

Keywords: Parallel algorithms, distributed-memory algorithms, Message Passing Interface (MPI), communication overhead, community detection, Louvain algorithm, graph (network) data mining, large-scale graph data

Bibtex:
@inproceedings{exa-SattarA19,
author = {Sattar, Naw Safrin and Arifuzzaman, Shaikh},
title = {Overcoming MPI Communication Overhead for Distributed Community Detection},
journal = {Software Challenges to Exascale Computing, Communications in Computer and Information Science},
volume = {964},
number = {1},
pages = {77--90},
year = {2019},
url = {https://doi.org/10.1007/978-981-13-7729-7_6},
publisher = {Springer}
}

Fast Stochastic Block Partitioning using a Single Commodity Machine

Md Abdul Motaleb Faysal and Shaikh Arifuzzaman

In proc. of 2019 IEEE International Conference on BigData (BigData 2019), pages 3632--3639, IEEE, December 2019.

[IEEEXplore]

Keywords: Parallel algorithms, stochastic block partitioning, scalability analysis, community detection, graph data mining, large-scale graph data

Bibtex:
@inproceedings{bigdata-FaysalA19,
author = {Md Abdul Motaleb Faysal and Shaikh Arifuzzaman},
title = {Fast Stochastic Block Partitioning using a Single Commodity Machine},
booktitle = {In proc. of 2019 {IEEE} International Conference on Big Data (Big Data)},
pages = {3632--3639},
year = {2019},
month = {December},
url = {https://doi.org/10.1109/BigData47090.2019.9006246},
publisher = {{IEEE}}
}

Distributed Community Detection in Large Networks using An Information-Theoretic Approach

Md Abdul Motaleb Faysal and Shaikh Arifuzzaman

In proc. of 2019 IEEE International Conference on BigData (BigData 2019), pages 4773--4782, IEEE, December 2019.

[IEEEXplore]

Keywords: Parallel algorithms, complexity analysis, scalability analysis, community detection, graph data mining, large-scale graph data

Bibtex:
@inproceedings{bigdata-FaysalA19a,
author = {Md Abdul Motaleb Faysal and Shaikh Arifuzzaman},
title = {Distributed Community Detection in Large Networks using An Information-Theoretic Approach},
booktitle = {In proc. of 2019 {IEEE} International Conference on Big Data (Big Data)},
pages = {4773--4782},
year = {2019},
month = {December},
url = {https://doi.org/10.1109/BigData47090.2019.9005562},
publisher = {{IEEE}}
}

Detecting Web Spams in Webgraphs with Predictive Model Analysis

Naw Safrin Sattar, Shaikh Arifuzzaman, Minhaz F. Zibran, and Md Mohiuddin Sakib

In proc. of 2019 IEEE International Conference on BigData (BigData 2019), pages 4299--4308, IEEE, December 2019.

[PDF] [IEEEXplore]

Keywords: graph data mining, large-scale graph data, machine learning, spam detection, predictive analysis

Bibtex:
@inproceedings{bigdata-SattarAZS19,
author = {Naw Safrin Sattar and Shaikh Arifuzzaman and Minhaz F. Zibran and Md Mohiuddin Sakib},
title = {Detecting Web Spam in Webgraphs with Predictive Model Analysis},
booktitle = {In proc. of 2019 {IEEE} International Conference on Big Data (Big Data)},
pages = {4299--4308},
year = {2019},
month = {December},
url = {https://doi.org/10.1109/BigData47090.2019.9006282},
publisher = {{IEEE}}
}

Understanding Performance Bottleneck to Improve Parallel Efficiency of Louvain Algorithm

Naw Safrin Sattar and Shaikh Arifuzzaman

In 4th International Parallel Data Systems Workshop (PDSW19 Workshop) with Supercomputing (SC19) Conference, IEEE/ACM, Nov 2019.

Keywords: Parallel algorithms, complexity analysis, scalability analysis, performance analysis, community detection, graph data mining, large-scale graph data

Assessing the Dependability of Apache Spark System: Streaming Analytics on Large-scale Ocean Data

Janak Dahal, Elias Ioup, Shaikh Arifuzzaman, and Mahdi Abdelguerfi

In proc. of the 5th International Conference on Dependability in Sensor, Cloud, and Big Data Systems and Applications (DependSys 2019) , pages 131--144, Springer, November, 2019.

[PDF] [Springer Nature]

Keywords: large-scale analysis, Apache Spark, Ocean data, Streaming analytics

Bibtex:
@inproceedings{dependsys-DahalIAA19,
author = {Janak Dahal and Elias Ioup and Shaikh Arifuzzaman and Mahdi Abdelguerfi},
title = {Assessing the Dependability of Apache Spark System: Streaming Analytics on Large-scale Ocean Data},
booktitle = {In proc. of the 5th International Conference on Dependability in Sensor, Cloud, and Big Data Systems and Applications (DependSys 2019)},
pages = {131--144},
year = {2019},
month = {November},
url = {https://doi.org/10.1007/978-981-15-1304-6\_11},
publisher = {{Springer}}
}

-->

On the Assessment of Security and Performance Bugs in Chromium Open-source Project

Joseph Imseis, Costain Nachuma, Shaikh Arifuzzaman, Minhaz Zibran, and Zakirul Alam Bhuiyan

The 5th International Conference on Dependability in Sensor, Cloud, and Big Data Systems and Applications (DependSys 2019) , Guangzhou, China, November 12-15, 2019.

[PDF] [Springer Nature]

Keywords: data analysis, open-source software project, security bugs, performance bugs

Scalable Mining, Analysis, and Visualization of Protein-protein Interaction Networks

Shaikh Arifuzzaman and Bikesh Pandey

International Journal of Big Data Intelligence (IJBDI), Inderscience Publishers, Volume 6, N 3/4, Pages 176-187, 2019.

[PDF] [BibTex] [IJBDI]

Keywords: Scalable tool, parallel algorithms, protein interaction network, graph (network) analysis, graph visualization, large-scale datasets, scalable computing

Bibtex:
@inproceedings{ijbdi-ArifuzzamanP19,
author = {Shaikh Arifuzzaman and Bikesh Pandey},
title = {Scalable Mining, Analysis, and Visualization of Protein-protein Interaction Networks},
booktitle = {International Journal of Big Data Intelligence (IJBDI)},
pages = {176--187},
volume = {6},
number = {3/4},
year = {2019},
url = {https://doi.org/10.1504/IJBDI.2019.100884},
publisher = {Inderscience}
}

A Comparative Analysis of Large-scale Network Visualization Tools

Md AM Faysal and Shaikh Arifuzzaman

In Proceeding of 2018 IEEE International Conferene on BigData (BigData 2018), pages 4837--4843, Seattle, WA, USA, IEEE, Dec 2018

[PDF] [BibTex] [IEEEXplore]

Keywords: Big data visualizaiton, graph (network) analysis, graph visualization, large-scale datasets, scalable computing

Bibtex:
@inproceedings{bigdata-FaysalA18,
author = {Md AM Faysal and Shaikh Arifuzzaman},
title = {A Comparative Analysis of Large-scale Network Visualization Tools},
booktitle = {2018 IEEE International Conferene on BigData (BigData 2018)},
pages = {4837--4843},
year = {2018},
month = {December},
url = {https://doi.org/10.1109/BigData.2018.8622001},
publisher = {{IEEE}}
}

Parallel Algorithms for Mining Large-scale Time-varying (Dynamic) Graphs

Shaikh Arifuzzaman, Naw Safrin Sattar, and Md AM Faysal

In PDSW-DISCS Workshop in SC'18, Dallas, TX, USA, Nov 2018

[Abstract] [Slides] [PDSW-DISCS]

Keywords: Parallel algorithms, dynamic graph, temporal graph data, graph (network) analysis, graph visualization, large-scale datasets, scalable computing

Parallelizing Louvain Algorithm: Distributed Memory Challenges

Naw Safrin Sattar and Shaikh Arifuzzaman

In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing (DASC), page 695-701, Athens, Greece, Aug 2018.

[PDF] [BibTex]

Louvain algorithm is a well-known and efficient method for detecting communities or clusters in social and information networks (graphs). The emergence of large network data necessitates parallelization of this algorithms for high performance computing platforms. There exist several shared-memory based parallel algorithms for Louvain method. However, those algorithms do not scale to a large number of cores and large networks. Distributed memory systems are widely available nowadays, which offer a large number of processing nodes. However, the existing only MPI (message passing interface) based distributed-memory parallel implementation of Louvain algorithm has shown scalability to only 16 processors. In this paper, we implement both shared- and distributed-memory based parallel algorithms and identify issues that hinder scalability. In our shared-memory based algorithm using OpenMP, we get 4-fold speedup for several real-world networks. However, this speedup is limited only by the physical cores available to our system. We then design a distributed-memory based parallel algorithms using message passing interface. Our results demonstrate an scalability to a moderate number of processors. We also provide an empirical analysis that shows how communication overhead poses the most crucial threat for deisgning scalable parallel Louvain algorithm in a distributed-memory setting.

Bibtex:
@inproceedings{arifuzzaman-louv18,
author = {Sattar, Naw Safrin and Arifuzzaman, Shaikh},
title = {Parallelizing Louvain Algorithm: Distributed Memory Challenges},
booktitle = {Proceedings of 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing (DASC), Athens, Greece},
pages = {695--701},
year = {2018},
month= {Aug},
publisher = {{IEEE}}
}

Data Mining in-IDE Activities: Why Software Developers Fail

Naw Safrin Sattar, Md AM Faysal, Minhaz Zibran, Shaikh Arifuzzaman and Md Rakibul Islam

In proceeding of the 27th International Conference on Software Engineering and Data Engineering (SEDE), pp. 97 - 102, New Orleans, LA, USA, 2018.

[PDF]

This paper addresses the hindrances behind software developers' failures as perceived from software build failures. We capture and correlate the routines and patterns of developers' interactions in IDE, their backgrounds, expertise, and geographic locations with their failure instances. Our study is based on a large dataset of 85 developers' 11 million interactions/events in Microsoft Visual Studio IDE over 15,000 work-hours. The findings from this study will help developers and organizations in shaping their working style for higher success rate.

Scalable Mining and Analysis of Protein-Protein Interaction Networks

Shaikh Arifuzzaman and Bikesh Pandey

In Proceeding of the 3rd IEEE International Conference on Big Data Intelligence and Computing (DataCom), Pages 1098--1105, Orlando, FL, Nov 2017.

[PDF] [BibTex] [IEEEXplore]

Protein-protein interaction (PPI) networks are the networks of protein complexes formed by biochemical events and electrostatic forces. PPI networks can be used to study diseases and discover drugs. The causes of diseases are evident on a protein interaction level. For instance, an elevation of interaction edge weights of oncogenes is manifested in cancers. Further, the majority of approved drugs target a particular PPI, and thus studying PPI networks is vital to drug discovery. The availability of large datasets and need for efficient analysis necessitate the design of scalable methods leveraging modern high-performance computing (HPC) platforms. In this paper, we design a lighweight framework on a distributed-memory parallel system, which includes scalable algorithmic and analytic techniques to study PPI networks and visualize them. Our study of PPIs will be based on network-centric mining and analysis approaches. Since PPI networks are signed (labeled) and weighted, many existing network mining methods working on simple unweighted networks will be unsuitable to study PPIs. Further, the large volume and variety of such data limits the use of sequential tool or methods. Many existing tools also do not support a convenient workflow starting from automated data preprocessing to visualizing results and reports for efficient extraction of intelligence from large-scale PPI networks. Our framework support automated analytics based on a large range of extensible methods for extracting signed motifs, computing centrality, and finding functional units. We design MPI (Message Passing Interface) based parallel methods and workflow, which scale to large networks. The framework is also extensible and sufficiently generic. To the best of our knowledge, all these capabilities collectively make our tool novel.

Bibtex:
@inproceedings{arifuzzaman-ppi-networks,
author = {Arifuzzaman, S. and Pandey, Bikesh},
title = {Scalable Mining and Analysis of Protein-Protein Interaction Networks},
booktitle = {Proceedings of the 3rd IEEE International Conference on Big Data Intelligence and Computing (DataCom), Orlando, FL, USA},
pages = {1098-1105},
year = {2017},
month= {November},
publisher = {{IEEE}}
}

A Space-efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks

Shaikh Arifuzzaman, Maleq Khan, and Madhav V. Marathe

In Proc. of the 17th IEEE International Conference on High Performance Computing and Communications (HPCC), pages 527–534, 2015.

[PDF] [BibTex] [IEEEXplore]

In this paper, we present a space-efficient MPI based parallel algorithm for counting exact number of triangles in massive networks. The algorithm divides the network into nonoverlapping partitions. Our results demonstrate up to 25-fold space saving over the algorithm with overlapping partitions. This space efficiency allows the algorithm to deal with networks which are 25 times larger. We present a novel approach that reduces communication cost drastically (up to 90%) leading to both a space- and runtime-efficient algorithm. Our adaptation of a parallel partitioning scheme by computing a novel weight function adds further to the efficiency of the algorithm.

Bibtex:
@inproceedings{arifuzzaman-space-triangle,
author = {Arifuzzaman, S. and Khan, Maleq and Marathe, Madhav},
title = {A Space-efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks},
booktitle = {Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), New York City, USA},
pages = {527--534},
year = {2015},
month= {August},
publisher = {{IEEE}}
}

A Fast Parallel Algorithm for Counting Triangles in Graphs using Dynamic Load Balancing

Shaikh Arifuzzaman, Maleq Khan and Madhav V. Marathe

In proc. of 2015 IEEE International Conference on Big Data (BigData), pages 1839–1847, 2015.

[PDF] [ACMDL]

In this paper, we present an efficient MPI-based parallel algorithm for counting triangles in large graph. We consider the case where the main memory of each compute node is large enough to contain the entire graph. We observe that for such a case, computation load can be balanced dynamically and present a dynamic load balancing scheme which improves the performance of the algorithm significantly. Our algorithm demonstrates very good speedups and scales to a large number of processors. The algorithm computes the exact number of triangles in a network with 1 billion edges in 2 minutes with only 100 processors. Our results demonstrate that the algorithm is significantly faster than the related algorithms with static partitioning. In fact, for the real-world networks we experimented on, our algorithm achieves at least 2 times runtime efficiency over the fastest algorithm with static load balancing.

Bibtex:
@inproceedings{arifuzzaman_triangle_dynamic,
author = {Arifuzzaman, S. and Khan, M and Marathe, M.},
title = {A Fast Parallel Algorithm for Counting Triangles in Graphs using Dynamic Load Balancing},
booktitle = {Proceedings of 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA},
pages = {1839--1847},
year = {2015},
month= {October},
location = {CA, USA},
publisher = {{IEEE}}
}

A Fast Parallel Conversion of Edge List to Adjacency List for Large-scale Graphs

Shaikh Arifuzzaman and Maleq Khan

In proc. of the 23rd High Performance Computing Symposium (HPC), pages 17–24, 2015.

[PDF] [ACMDL]

In this paper, we present efficient MPI-based distributed memory parallel algorithms for converting edge lists to adjacency lists. To the best of our knowledge, this is the first work on this problem. To address the critical load balancing issue, we present a parallel load balancing scheme which improves both time and space efficiency significantly. Our fast parallel algorithm works on massive graphs, achieves very good speedups, and scales to large number of processors. The algorithm can convert an edge list of a graph with 20 billion edges to the adjacency list in less than 2 minutes using 1024 processors. Denoting the number of nodes, edges and processors by n, m, and P, respectively, the time complexity of our algorithm is O(m/p + n + P) which provides a speedup factor of at least Ω(min{P, d_avg}), where davg is the average degree of the nodes. The algorithm has a space complexity of O(m/p), which is optimal.

Bibtex:
@inproceedings{graph_conversion,
author = {Arifuzzaman, S. and Khan, M.},
title = {Fast Parallel Conversion of Edge List to Adjacency List for Large-scale Graphs},
booktitle = {Proceedings of the 23rd High Performance Computing Symposium (HPC 2015), Alexandria, VA, USA},
pages = {17--24},
year = {2015},
month= {April},
location = {Alexandria, VA, USA}
}

CINET 2.0: A Cyber-Infrastructure for Network Science

Sherif Abdelhamid, Maksudul Alam, Richardo Alo, Shaikh Arifuzzaman, Peter Beckman, et al.

In proc. of the 10th IEEE International Conference on eScience (eScience), pages 324–331, 2014.

[PDF] [IEEEXplore]

Analysis of structural properties and dynamics of networks is currently a central topic in many disciplines including Social Sciences, Biology and Business. CINET, a cyberinfrastructure for such studies, introduced the concept of supporting network analysis as a service. The basic idea is to allow experts in various disciplines to focus on obtaining domain-specific insights from the results of network analyses instead of worrying about programming details and allocation of computational resources needed to carry out the analyses. A basic version of CINET was released in May 2012. This paper discusses CINET 2.0, a significantly enhanced version that supports complex network analyses through a web portal. CINET 2.0 has already been used for teaching courses related to Network Science at several US universities. In this paper, we discuss how CINET 2.0 significantly extends CINET 1.0 through enhancements to some components and the addition of new components.

Bibtex:
@inproceedings{cinet2_2014,
author = {Sherif Hanie El Meligy Abdelhamid and Md. Maksudul Alam and Richard Al{\'{o}} and Shaikh Arifuzzaman and Peter H. Beckman and Tirtha Bhattacharjee and Md Hasanuzzaman Bhuiyan and Keith R. Bisset and Stephen Eubank and Albert C. Esterline and Edward A. Fox and Geoffrey Fox and S. M. Shamimul Hasan and Harshal Hayatnagarkar and Maleq Khan and Chris J. Kuhlman and Madhav V. Marathe and Natarajan Meghanathan and Henning S. Mortveit and Judy Qiu and S. S. Ravi and Zalia Shams and Ongard Sirisaengtaksin and Samarth Swarup and Anil Kumar S. Vullikanti and Tak{-}Lon Wu},
title = {{CINET} 2.0: {A} CyberInfrastructure for Network Science},
booktitle = {Proceedings of the 10th {IEEE} International Conference on e-Science (e-Science 2014), Sao Paulo, Brazil},
pages = {324--331},
year = {2014},
month = {October},
publisher = {{IEEE}}
}

PATRIC: A Parallel Algorithm for Counting Triangles in Massive Networks

Shaikh Arifuzzaman, and Maleq Khan and Madhav V. Marathe

In proc. of the 22nd ACM International Conference on Information and Knowledge Management (CIKM), pages 529–538, 2013.

[PDF] [ACMDL]

In this paper, we present an efficient MPI-based distributed memory parallel algorithm, called PATRIC, for counting triangles in massive networks. PATRIC scales well to networks with billions of nodes and can compute the exact number of triangles in a network with one billion nodes and 10 billion edges in 16 minutes. Balancing computational loads among processors for a graph problem like counting triangles is a challenging issue. We present and analyze several schemes for balancing load among processors for the triangle counting problem. These schemes achieve very good load balancing. We also show how our parallel algorithm can adapt an existing edge sparsification technique to approximate the number of triangles with very high accuracy. This modification allows us to count triangles in even larger networks.

Bibtex:
@inproceedings{arifuzzaman_triangle_cikm13,
author = {Shaikh Arifuzzaman and Maleq Khan and Madhav V. Marathe},
title = {{PATRIC:} a parallel algorithm for counting triangles in massive networks},
booktitle = {Proceedings of the 22nd {ACM} International Conference on Information and Knowledge Management (CIKM 2013), San Francisco, CA, USA},
pages = {529--538},
year = {2013},
month ={October},
publisher = {{ACM}}
}

CINET: A Cyber-Infrastructure for Network Science

Sherif Abdelhamid, Richardo Alo, Shaikh Arifuzzaman, Peter Beckman, et al.

In proc. of the 8th IEEE International Conference on eScience (eScience), pages 1–8, 2012.

[PDF] [IEEEXplore]

Networks are an effective abstraction for representing real systems. Consequently, network science is increasingly used in academia and industry to solve problems in many fields. Computations that determine structure properties and dynamical behaviors of networks are useful because they give insights into the characteristics of real systems. We introduce a newly built and deployed cyberinfrastructure for network science (CINET) that performs such computations, with the following features: (i) it offers realistic networks from the literature and various random and deterministic network generators; (ii) it provides many algorithmic modules and measures to study and characterize networks; (iii) it is designed for efficient execution of complex algorithms on distributed high performance computers so that they scale to large networks; and (iv) it is hosted with web interfaces so that those without direct access to high performance computing resources and those who are not computing experts can still reap the system benefits. It is a combination of application design and cyberinfrastructure that makes these features possible. To our knowledge, these capabilities collectively make CINET novel. We describe the system and illustrative use cases, with a focus on the CINET user.

Bibtex:
@inproceedings{cinet_2012,
author = {Sherif Elmeligy Abdelhamid and Richard Al{\'{o}} and S. M. Arifuzzaman and Peter H. Beckman and Md Hasanuzzaman Bhuiyan and Keith R. Bisset and Edward A. Fox and Geoffrey Charles Fox and Kevin Hall and S. M. Shamimul Hasan and Anurodh Joshi and Maleq Khan and Chris J. Kuhlman and Spencer J. Lee and Jonathan Leidig and Hemanth Makkapati and Madhav V. Marathe and Henning S. Mortveit and Judy Qiu and S. S. Ravi and Zalia Shams and Ongard Sirisaengtaksin and Rajesh Subbiah and Samarth Swarup and Nick Trebon and Anil Vullikanti and Zhao Zhao},
title = {{CINET:} {A} cyberinfrastructure for network science},
booktitle = {Proceedings of the 8th {IEEE} International Conference on e-Science (e-Science 2012), Chicago, IL, USA},
pages = {1--8},
year = {2012},
month = {October},
publisher = {{IEEE}}
}

Book Chapter, Book/Dissertation

Parallel Mining and Analysis of Triangles and Communities in Big Networks

Shaikh Arifuzzaman

Ph.D. Dissertation, Dept. of Computer Science, Virginia Tech, August 2016.

[PDF] [VTechWorks]

We design MPI-based distributed-memory parallel algorithms for counting triangles and detecting communities in big networks and present related analysis. The dissertation consists of four parts. In Part I, we devise parallel algorithms for counting and enumerating triangles. The first algorithm employs an overlapping partitioning scheme and novel load-balancing schemes leading to a fast algorithm. We also design a space-efficient algorithm using non-overlapping partitioning and an efficient communication scheme. We then present our third parallel algorithm based on dynamic load balancing. In Part II, we characterize networks by quantifying the number of common neighbors and demonstrate its relationship to community structure of networks. In Part III, we design parallel algorithms for detecting communities in big networks. Finally, in Part IV, we present scalable parallel algorithms for a useful graph preprocessing problem-- converting edge list to adjacency list. We present non-trivial parallelization with efficient HPC-based techniques leading to fast and space-efficient algorithms.

Bibtex:
@phdthesis{Arif2016Triangle,
title = {Parallel Mining and Analysis of Triangles and Communities in Big Networks},
school = {Dept. of Computer Science, Virginia Tech},
author = {Shaikh Arifuzzaman},
year = {2016},
month = {August}
}

Distributed Memory Parallel Algorithms for Massive Graphs

Maksudul Alam, Shaikh Arifuzzaman, Hasanuzzaman Bhuiyan, Maleq Khan, VS Anil Kumar, and Madhav Marathe

Book Chapter in Parallel Graph Algorithms. Ed. David Bader, Chapman & Hall/ CRC Computational Science, 2015. | ISBN-10: 1466573260

Graph theoretic problems are representative of fundamental kernels in traditional and emerging scientific applications, such as complex network analysis, data mining, and computational biology, as well as applications in national security. Graph abstractions are also extensively used to understand and solve challenging problems in scientific computing. Real-world systems, such as the Internet, telephone networks, social interactions, and transportation networks, are analyzed by modeling them as graphs. To efficiently solve large-scale graph problems, it is necessary to design high performance computing systems and novel parallel algorithms. In this book, some of the world’s most renowned experts explore the latest research and applications in this important area.

Posters (peer-reviewed)

Parallel Algorithms for Scalable Graph Mining: Applications on Big Data and Machine Learning

Naw Safrin Sattar and Shaikh Arifuzzaman

Doctoral Showcase, SC'21 (SuperComputing) Conference, the International Conference for High Performance Computing, Networking, Storage, and Analysis in St. Louis, MO, USA, November 2021

[PDF]

Keywords: Parallel computing, commmunity detection algorithm, Louvain algorithm, scalable machine learning

Parallel Algorithm and Analysis for Understanding Evolving Community Structures in Temporal Graphs

Naw Safrin Sattar and Shaikh Arifuzzaman, Khaled Ibrahim, Aydin Buluc

SIAM Conference on Computational Science and Engineering (CSE21), Fort Worth, Texas, USA, February, 2021

[PDF]

Keywords: Parallel computing, commmunity detection algorithm, Louvain algorithm, scalable machine learning

Preventing Epidemic Spreading by Community Detection with Semi-supervised Node Classification on GPU

Naw Safrin Sattar and Shaikh Arifuzzaman

Grace Hopper Celebration (GHC-2020), Orlando, FL, USA, October 2020.

Keywords: Parallel computing, commmunity detection algorithm, Louvain algorithm, scalable machine learning

Community Detection with Graph Convolutional Networks using Semi-supervised Node Classificatio

Naw Safrin Sattar and Shaikh Arifuzzaman

14th Women in Machine Learning Workshop (WiML 2019), Vancouver, British Columbia, Canada, December 2019

Keywords: Parallel computing, commmunity detection algorithm, Louvain algorithm, scalable machine learning

A Comparative Analysis of Parallel Louvain Algorithms for Community Detection

Naw Safrin Sattar and Shaikh Arifuzzaman

SIAM Conference on Computational Science (SIAM CSE’19), Minisymposterium, February 25 – March 1, Spokane, Washington, 2019.

[PDF]

Keywords: Parallel computing, commmunity detection algorithm, Louvain algorithm

Scalable Community Detection Using Parallel Louvain Algorithms

Naw Safrin Sattar and Shaikh Arifuzzaman

CRA-W Grad Cohort for Women 2019, Chicago, IL, April 12-13, 2019.

Keywords: Parallel computing, commmunity detection algorithm, Louvain algorithm

A Comparative Analysis of Parallel Louvain Algorithms for Community Detection

Naw Safrin Sattar and Shaikh Arifuzzaman

6th Annual Texas STEM Conference, Lamar University, Beaumont, Texas, November 3, 2018.

[PDF]

Keywords: Parallel computing, commmunity detection algorithm, Louvain algorithm

Parallel Algorithms for Counting Triangles and Computing Clustering Coefficients

Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe

In 2012 SC Companion: High Performance Computing, Networking Storage and Analysis (SC'12), pages 1448–1449, 2012.

[PDF] [IEEEXplore]

We present MPI-based parallel algorithms for counting triangles and computing clustering coefficients in massive networks. Counting triangles is important in the analysis of various networks, e.g., social, biological, web etc. Emerging massive networks do not fit in the main memory of a single machine and are very challenging to work with. Our distributed-memory parallel algorithm allows us to deal with such massive networks in a time- and space-efficient manner. We were able to count triangles in a graph with 2 billions of nodes and 50 billions of edges in 10 minutes. Our parallel algorithm for computing clustering coefficients uses efficient external memory aggregation. We also show how edge sparsification technique can be used with our parallel algorithm to find approximate number of triangles without sacrificing the accuracy of estimation. In addition, we propose a simple modification of a state-of-the-art sequential algorithm that improves both runtime and space requirement.

Bibtex:
@inproceedings{extabst_sc12,
author = {S. M. Arifuzzaman and Maleq Khan and Madhav V. Marathe},
title = {Abstract: Parallel Algorithms for Counting Triangles and Computing Clustering Coefficients},
booktitle = {2012 {SC} Companion: High Performance Computing, Networking Storage and Analysis, Salt Lake City, UT, USA},
pages = {1448--1449},
year = {2012},
month = {November}
}

Tech Report, Preprint, etc.

Machine Learning for Detecting Web Spams in Webgraphs

Naw Safrin Sattar, Shaikh Arifuzzaman, Minhaz Zibran, Md Mohiuddin Sakib and Md Kauser Ahmmed

Preprint. 2019.

Keywords: Machine learning, graph mining, web graphs, spam detection

Distributed Streaming Analytics on Large-scale Oceanographic Data using Apache Spark

Janak Dahal, Elias Ioup, Shaikh Arifuzzaman and Mahdi Abdelguerfi

Preprint. 2019.

Keywords: streaming analytics, apache spark, ocean data, large-scale computing

Distributed-Memory Parallel Algorithms for Counting and Listing Triangles in Big Graphs

Shaikh Arifuzzaman, Maleq Khan, and Madhav V. Marathe

CoRR abs/1706.05151, pages 1-30, 2017.

[PDF]

In this paper, we present two efficient MPI-based distributed memory parallel algorithms for counting triangles in big graphs. The first algorithm employs overlapping partitioning and efficient load balancing schemes to provide a very fast parallel algorithm. The algorithm scales well to networks with billions of nodes and can compute the exact number of triangles in a network with 10 billion edges in 16 minutes. The second algorithm divides the network into non-overlapping partitions leading to a space-efficient algorithm. Our results on both artificial and real-world networks demonstrate a significant space saving with this algorithm. We also present a novel approach that reduces communication cost drastically leading the algorithm to both a space- and runtime-efficient algorithm. Further, we demonstrate how our algorithms can be used to list all triangles in a graph and compute clustering coefficients of nodes. Our algorithm can also be adapted to a parallel approximation algorithm using an edge sparsification method.

Bibtex:
@article{Arif2017Triangle,
title = {Distributed-Memory Parallel Algorithms for Counting and Listing Triangles in Big Graphs},
author = {Shaikh Arifuzzaman and Maleq Khan and Madhav Marathe},
journal = {CoRR},
volume = {1706},
issue = {05151},
pages = {1--30},
year = {2017}
}

Generating a Synthetic Population of the United States

Abhijin Adiga, Aditya Agashe, Shaikh Arifuzzaman , Christopher L Barrett, Richard J Beckman, Keith R Bisset, Jiangzhuo Chen, Youngyun Chungbaek, Stephen G Eubank, Sandeep Gupta, Maleq Khan, Christopher J Kuhlman, Eric Lofgren, Bryan L Lewis, Achla Marathe, Madhav V Marathe, Henning S Mortveit, Eric Nordberg

Technical Report NDSSL 15-009. Network Dynamics and Simulation Science Laboratory

[PDF]

We describe the methodology for generating a synthetic population of the United States. A synthetic population integrates a variety of databases from commercial and public sources into a common architecture for data exchange. The process preserves the confidentiality of the individuals in the original data sets, yet produces realistic attributes and demographics for the synthetic individuals. The synthetic population is a set of synthetic people and households, located geographically, each associated with demographic variables recorded in the census. Joint demographic distributions are reconstructed from the marginal distributions available in typical census data using an iterative proportional fitting (IPF) technique. Each synthetic individual is placed in a household with other synthetic individuals. Each household is located geographically using land-use data and data pertaining to transportation networks. The process guarantees that a census of our synthetic population is statistically indistinguishable from the original census.

Parallel Algorithms for Counting Triangles in Networks with Large Degrees

Shaikh Arifuzzaman, Maleq Khan, and Madhav V. Marathe

CoRR abs/1406.5687, pages 1-10, 2014.

[PDF]

Bibtex:
@article{Arifuzzaman_triangle_14,
author = {Shaikh Arifuzzaman and Maleq Khan and Madhav V. Marathe},
title = {Parallel Algorithms for Counting Triangles in Networks with Large Degrees},
journal = {CoRR},
volume = {abs/1406.5687},
pages = {1--10},
year = {2014}
}