ICPP 2020
ICPP 2020 Program (all times are EDT / GMT-4)

Tuesday, August 18th


Opening Remarks
Opening Remarks




Best-Paper Candidates
Huffman Coding with Gap Arrays for GPU Acceleration
CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs
SkyChain: A Deep Reinforcement Learning-Empowered Dynamic Blockchain Sharding System
GOSH: Embedding Big Graphs on Small Hardware


Distributed Systems
CARD: A Congestion-Aware Request Dispatching Scheme for Replicated Metadata Server Cluster
Safe, Fast Sharing of memcached as a Protected Library
DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms

Edge Learning and Inference
ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference
FEEL: A Federated Edge Learning System for Efficient and Privacy-Preserving Mobile Healthcare
Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN

Memory Systems
An Efficient Wear-level Architecture using Self-adaptive Wear Leveling
CCHL: Compression-Consolidation Hardware Logging for Efficient Failure-Atomic Persistent Memory Updates
Balancing Fairness and Efficiency for Cache Sharing in Semi-external Memory System


Algorithm-Based Checkpoint-Recovery for the Conjugate Gradient Method
Robustness of the Young/Daly formula for stochastic iterative applications
Energy-aware strategies for reliability-oriented real-time task allocation on heterogeneous platforms

Scheduling and Placement in Networks
Cooperative Game for Multiple Chargers with Dynamic Network Topology
Optimizing Flow Bandwidth Consumption with Traffic-diminishing Middlebox Placement
Towards High-Efficiency Data Centers via Job-Aware Network Scheduling

Systems for Machine Learning
DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training
E-LAS: Design and Analysis of Completion-Time Agnostic Scheduling for Distributed Deep Learning Cluster
ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs

Wednesday, August 19th




Graph Processing and Concurrent Data Structures
Graffix: Efficient Graph Processing with a Tinge of GPU-Specific Approximations
Optimizing Linearizable Bulk Operations on Data Structures
GraBi: Communication-Efficient and Workload-Balanced Partitioning for Bipartite Graphs

Large-Scale Applications on Supercomputers
Large-scale Simulations of Peridynamics on Sunway TaihuLight Supercomputer
Toward Large-Scale Image Segmentation on Summit
SWMapper: Scalable Read Mapper on SunWay TaihuLight

Machine Learning for Computing
An Online Learning-Based Task Offloading Framework for 5G Small Cell Networks
A Reinforcement Learning Based System for Minimizing Cloud Storage Service Cost
Deep Reinforcement Learning based Elasticity-compatible Heterogeneous Resource Management for Time-critical Computing


Performance Tools and Methodology
Generating Robust Parallel Programs via Model Driven Prediction of Compiler Optimizations for Non-determinism
Memory-Centric Communication Mechanism for Real-time Autonomous Navigation Applications
Automatic Identification and Precise Attribution of DRAM Bandwidth Contention

Storage Reliability & Memory Security
An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm
First Time Miss: Low Overhead Mitigation For Shared Memory Cache Side Channels
A Rack-aware Pipeline Repair Scheme for Erasure-coded Distributed Storage Systems

Supporting Efficient Machine Learning
Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures
Vector Forward Mode Automatic Differentiation on SIMD/SIMT architectures
Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity


Data Center Networking
AMRT: Anti-ECN Marking to Improve Utilization of Receiver-driven Transmission in Data Center
PS: Periodic Strategy for the 40-100Gbps Energy Efficient Ethernet
Polo: Receiver-Driven Congestion Control for Low Latency over Commodity Network Fabric

Parallel Algorithms I
Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning
Fast Spectral Graph Layout on Multicore Platforms
Revisiting Sparse Dynamic Programming for the 0/1 Knapsack Problem

Parallel and Distributed Machine Learning
Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks
Federated Learning with Proximal Stochastic Variance Reduced Gradient Algorithms
Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning

Thursday, August 20th




Heterogeneous Systems
Balancing Graph Processing Workloads Using Work Stealing on Heterogeneous CPU-FPGA Systems
Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors
Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines

Performance Evaluation and Characterization
Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver
SPECcast: A Methodology for Fast Performance Evaluation with SPEC CPU 2017 Multiprogrammed Workloads
The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms

Routing and Mapping in Networks
XShot: Light-weight Link Failure Localization using Crossed Probing Cycles in SDN
On Network Locality in MPI-Based HPC Applications
DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention


Microarchitecture and Power Management
A GPU Register File using Static Data Compression
HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems
DNNARA: A Deep Neural Network Accelerator using Residue Arithmetic and Integrated Photonics

Parallel Algorithms II
Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs
Efficient Block Algorithms for Parallel Sparse Triangular Solve
Selective Coflow Completion for Time-sensitive Distributed Applications with Poco

Resource Management on the Cloud
Improving Load Balance via Resource Exchange in Large-Scale Search Engines
Rendering Server Allocation for MMORPG Players in Cloud Gaming
Impact of Memory DoS Attacks on Cloud Applications and Real-Time Detection Schemes


GPU-Accelerated Applications
Parallel Shift-Invert Spectrum Slicing on Distributed Architectures with GPU Accelerators
Detailed Analysis and Optimization of CUDA K-means Algorithm
Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architectures


Data Centers and the Edge
OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment
Reducing Latency in Multi-Tenant Data Centers via Cautious Congestion Watch
URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds
Reliability Augmentation of Requests with Service Function Chain Requirements in Mobile Edge-Cloud Networks

Storage and I/O Optimization
OPS: Optimized Shuffle Management System for Apache Spark
SeRW: Adaptively Separating Read and Write upon SSDs of Hybrid Storage Server in Clouds
Scalable Coordination of Hierarchical Parallelism
Mass: Workload-Aware Storage Policy for OpenStack Swift

Created 2020-6-25 8:56