ICPP 2020 Program (all times are EDT / GMT-4)

A

Aananthakrishnan, Sriram · more

Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning · view

Abad, Pablo · more

SPECcast: A Methodology for Fast Performance Evaluation with SPEC CPU 2017 Multiprogrammed Workloads · view

Abdelrahman, Tarek · more

Balancing Graph Processing Workloads Using Work Stealing on Heterogeneous CPU-FPGA Systems · view

Agostini, Matthew · more

Balancing Graph Processing Workloads Using Work Stealing on Heterogeneous CPU-FPGA Systems · view

Akella, Venkatesh · more

HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems · view

Akyildiz, Taha Atahan · more

GOSH: Embedding Big Graphs on Small Hardware · view

Alabsi Aljundi, Amro · more

GOSH: Embedding Big Graphs on Small Hardware · view

Alibhai, Shakeel · more

A Rack-aware Pipeline Repair Scheme for Erasure-coded Distributed Storage Systems · view

Alkabani, Yousra · more

DNNARA: A Deep Neural Network Accelerator using Residue Arithmetic and Integrated Photonics · view

Amarasinghe, Saman · more

How to Make Sparse Fast · view

Amini Salehi, Mohsen · more

The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms · view

Angerd, Alexandra · more

A GPU Register File using Static Data Compression · view

B

Bacik, Josef · more

The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms · view

Bao, Wei · more

Federated Learning with Proximal Stochastic Variance Reduced Gradient Algorithms · view

Barker, Kevin · more

Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines · view

Bensaou, Brahim · more

Reducing Latency in Multi-Tenant Data Centers via Cautious Congestion Watch · view

C

Cai, Shangming · more

CARD: A Congestion-Aware Request Dispatching Scheme for Replicated Metadata Server Cluster · view

Cai, Wentong · more

Rendering Server Allocation for MMORPG Players in Cloud Gaming · view

Cai, Xiaoqing · more

OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Cai, Zhiping · more

FEEL: A Federated Edge Learning System for Efficient and Privacy-Preserving Mobile Healthcare · view

Cao, Qiang · more

GraBi: Communication-Efficient and Workload-Balanced Partitioning for Bipartite Graphs · view
SeRW: Adaptively Separating Read and Write upon SSDs of Hybrid Storage Server in Clouds · view

Castro, Fernando · more

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors · view

Chai, Qifei · more

Balancing Fairness and Efficiency for Cache Sharing in Semi-external Memory System · view

Chau, Sid Chi-Kin · more

Reliability Augmentation of Requests with Service Function Chain Requirements in Mobile Edge-Cloud Networks · view

Chen, Huaming · more

Saec: Similarity-Aware Embedding Compression in Recommendation Systems · view

Chen, Li · more

E-LAS: Design and Analysis of Completion-Time Agnostic Scheduling for Distributed Deep Learning Cluster · view
FEEL: A Federated Edge Learning System for Efficient and Privacy-Preserving Mobile Healthcare · view

Chen, Mengqiang · more

Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning · view

Chen, Quan · more

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds · view
OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Chen, Wuhui · more

SkyChain: A Deep Reinforcement Learning-Empowered Dynamic Blockchain Sharding System · view

Chen, Yang · more

Optimizing Flow Bandwidth Consumption with Traffic-diminishing Middlebox Placement · view

Chen, Yu · more

Mass: Workload-Aware Storage Policy for OpenStack Swift · view

Chen, Zheng · more

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs · view

Cheng, Yuchen · more

OPS: Optimized Shuffle Management System for Apache Spark · view

Chuah, Mooi Choo · more

Impact of Memory DoS Attacks on Cloud Applications and Real-Time Detection Schemes · view

Chung, Jae-Won · more

ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference · view

Curtis-Maury, Matthew · more

Scalable Coordination of Hierarchical Parallelism · view

D

Deng, Fan · more

SeRW: Adaptively Separating Read and Write upon SSDs of Hybrid Storage Server in Clouds · view

Deng, Jing · more

Cooperative Game for Multiple Chargers with Dynamic Network Topology · view

Denninnart, Chavit · more

The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms · view

Devadas, Vinay · more

Scalable Coordination of Hierarchical Parallelism · view

Dinh, Canh T. · more

Federated Learning with Proximal Stochastic Variance Reduced Gradient Algorithms · view

Dong, Yuanyuan · more

SeRW: Adaptively Separating Read and Write upon SSDs of Hybrid Storage Server in Clouds · view

Du, Xiaoyong · more

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs · view
CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs · view

Du, Yishu · more

Robustness of the Young/Daly formula for stochastic iterative applications · view

Duan, Kaiyue · more

Improving Load Balance via Resource Exchange in Large-Scale Search Engines · view

Duan, Xiaohui · more

SWMapper: Scalable Read Mapper on SunWay TaihuLight · view

E

El-Ghazawi, Tarek · more

DNNARA: A Deep Neural Network Accelerator using Residue Arithmetic and Integrated Photonics · view

Ellingwood, Nathan · more

Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architectures · view

F

Fan, Pingzhi · more

Selective Coflow Completion for Time-sensitive Distributed Applications with Poco · view

Farrens, Matthew · more

HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems · view

Feng, Dan · more

Mass: Workload-Aware Storage Policy for OpenStack Swift · view
CCHL: Compression-Consolidation Hardware Logging for Efficient Failure-Atomic Persistent Memory Updates · view

Fröning, Holger · more

On Network Locality in MPI-Based HPC Applications · view

G

Gansterer, Wilfried N. · more

Algorithm-Based Checkpoint-Recovery for the Conjugate Gradient Method · view

Gao, Hongyun · more

XShot: Light-weight Link Failure Localization using Crossed Probing Cycles in SDN · view

Gao, Yiqin · more

Energy-aware strategies for reliability-oriented real-time task allocation on heterogeneous platforms · view

Gavrilovska, Ada · more

Generating Robust Parallel Programs via Model Driven Prediction of Compiler Optimizations for Non-determinism · view

Ge, Rong · more

Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines · view

Ghatrehsamani, Davood · more

The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms · view

Gómez Flores, Wilfrido · more

Towards Parallelization of a Texture Description Algorithm for Breast Lesion Classification using OpenMP and CUDA · view

Gong, Ruihao · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Gong, Xiaoli · more

DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms · view

Gong, Yifan · more

EPMA: Efficient Partial Message Access in IoT Era · view
Memory-Centric Communication Mechanism for Real-time Autonomous Navigation Applications · view

Gregorio, Jose Angel · more

SPECcast: A Methodology for Fast Performance Evaluation with SPEC CPU 2017 Multiprogrammed Workloads · view

Guo, Minyi · more

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds · view
OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Guo, Song · more

SkyChain: A Deep Reinforcement Learning-Empowered Dynamic Blockchain Sharding System · view

Guo, Yeting · more

FEEL: A Federated Edge Learning System for Efficient and Privacy-Preserving Mobile Healthcare · view

H

H. Tran, Nguyen · more

Federated Learning with Proximal Stochastic Variance Reduced Gradient Algorithms · view

Han, Li · more

Energy-aware strategies for reliability-oriented real-time task allocation on heterogeneous platforms · view

Han, Qingchang · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

He, Bingsheng · more

CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs · view

He, Bo · more

DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention · view

He, Ligang · more

Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks · view

He, Tian · more

AMRT: Anti-ECN Marking to Improve Utilization of Receiver-driven Transmission in Data Center · view

He, Xubin · more

A Rack-aware Pipeline Repair Scheme for Erasure-coded Distributed Storage Systems · view

Hedayati, Mohammad · more

Safe, Fast Sharing of memcached as a Protected Library · view

Helm, Christian · more

Automatic Identification and Precise Attribution of DRAM Bandwidth Contention · view

Herrero, Jose Angel · more

SPECcast: A Methodology for Fast Performance Evaluation with SPEC CPU 2017 Multiprogrammed Workloads · view

Hinkle, Jacob · more

Toward Large-Scale Image Segmentation on Summit · view

Hong, Zicong · more

SkyChain: A Deep Reinforcement Learning-Empowered Dynamic Blockchain Sharding System · view

Hovland, Paul · more

Vector Forward Mode Automatic Differentiation on SIMD/SIMT architectures · view

Hu, Jinbin · more

AMRT: Anti-ECN Marking to Improve Utilization of Receiver-driven Transmission in Data Center · view

Hu, Peng · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Hu, Yongmin · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Hu, Zhenbo · more

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity · view

Hua, Yu · more

An Efficient Wear-level Architecture using Self-adaptive Wear Leveling · view

Huang, Fangting · more

An Efficient Wear-level Architecture using Self-adaptive Wear Leveling · view

Huang, Jianming · more

An Efficient Wear-level Architecture using Self-adaptive Wear Leveling · view

Huang, Jiawei · more

AMRT: Anti-ECN Marking to Improve Utilization of Receiver-driven Transmission in Data Center · view

Hueckelheim, Jan · more

Vector Forward Mode Automatic Differentiation on SIMD/SIMT architectures · view

I

Inaba, Yoko · more

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs · view

Ito, Yasuaki · more

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs · view
Huffman Coding with Gap Arrays for GPU Acceleration · view

J

Jaya, Iryanto · more

Rendering Server Allocation for MMORPG Players in Cloud Gaming · view

Ji, Bo · more

Optimizing Flow Bandwidth Consumption with Traffic-diminishing Middlebox Placement · view

Jia, Xiaohua · more

Reliability Augmentation of Requests with Service Function Chain Requirements in Mobile Edge-Cloud Networks · view

Jiang, Hong · more

GraBi: Communication-Efficient and Workload-Balanced Partitioning for Bipartite Graphs · view

Jiang, Wanchun · more

PS : Periodic Strategy for the 40-100Gbps Energy Efficient Ethernet · view
Polo: Receiver-Driven Congestion Control for Low Latency over Commodity Network Fabric · view

Jiang, Zhang · more

DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms · view

Jiang, Ziyue · more

EPMA: Efficient Partial Message Access in IoT Era · view

Jin, Jiangming · more

EPMA: Efficient Partial Message Access in IoT Era · view
Memory-Centric Communication Mechanism for Real-time Autonomous Navigation Applications · view

Jin, Sian · more

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity · view

K

Kasagi, Akihiko · more

Huffman Coding with Gap Arrays for GPU Acceleration · view

Katsuki, Ryota · more

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs · view

Kaya, Kamer · more

GOSH: Embedding Big Graphs on Small Hardware · view

Kim, Jae-Yun · more

ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference · view

Kirmani, Shad · more

Fast Spectral Graph Layout on Multicore Platforms · view

Kjellqvist, Chris · more

Safe, Fast Sharing of memcached as a Protected Library · view

Kratochvíl, Miroslav · more

Detailed Analysis and Optimization of CUDA K-means Algorithm · view

Kruliš, Martin · more

Detailed Analysis and Optimization of CUDA K-means Algorithm · view

L

Leng, Jingwen · more

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds · view
OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Levonyak, Markus · more

Algorithm-Based Checkpoint-Recovery for the Conjugate Gradient Method · view

Li, Ang · more

Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines · view

Li, Chao · more

OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Li, Junyu · more

Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks · view

Li, Keqiu · more

XShot: Light-weight Link Failure Localization using Crossed Probing Cycles in SDN · view

Li, Xin · more

SWMapper: Scalable Read Mapper on SunWay TaihuLight · view

Li, Xinyuan · more

Large-scale Simulations of Peridynamics on Sunway Taihulight Supercomputer · view

Li, Yusen · more

Improving Load Balance via Resource Exchange in Large-Scale Search Engines · view
Balancing Fairness and Efficiency for Cache Sharing in Semi-external Memory System · view
Rendering Server Allocation for MMORPG Players in Cloud Gaming · view

Li, Zhaoyi · more

AMRT: Anti-ECN Marking to Improve Utilization of Receiver-driven Transmission in Data Center · view

Li, Zhuozhao · more

Impact of Memory DoS Attacks on Cloud Applications and Real-Time Detection Schemes · view

Li, Zongpeng · more

An Online Learning-Based Task Offloading Framework for 5G Small Cell Networks · view

Liang, Weifa · more

Reliability Augmentation of Requests with Service Function Chain Requirements in Mobile Edge-Cloud Networks · view

Liao, Jianxin · more

DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention · view

Liao, Kaiqin · more

PS : Periodic Strategy for the 40-100Gbps Energy Efficient Ethernet · view

Lim, Seung-Hwan · more

Toward Large-Scale Image Segmentation on Summit · view

Lin, Chi · more

Cooperative Game for Multiple Chargers with Dynamic Network Topology · view

Lin, Jieyu · more

Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN · view

Liu, Bing · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Liu, Chang · more

OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Liu, Cong · more

DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention · view

Liu, Fang · more

FEEL: A Federated Edge Learning System for Efficient and Privacy-Preserving Mobile Healthcare · view

Liu, Jing · more

Energy-aware strategies for reliability-oriented real-time task allocation on heterogeneous platforms · view

Liu, Jingning · more

CCHL: Compression-Consolidation Hardware Logging for Efficient Failure-Atomic Persistent Memory Updates · view

Liu, Qi · more

A Reinforcement Learning Based System for Minimizing Cloud Storage Service Cost · view

Liu, Shuyang · more

SeRW: Adaptively Separating Read and Write upon SSDs of Hybrid Storage Server in Clouds · view

Liu, Tong · more

A Rack-aware Pipeline Repair Scheme for Erasure-coded Distributed Storage Systems · view

Liu, Wei · more

EPMA: Efficient Partial Message Access in IoT Era · view
Memory-Centric Communication Mechanism for Real-time Autonomous Navigation Applications · view

Liu, Weifeng · more

CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs · view
Efficient Block Algorithms for Parallel Sparse Triangular Solve · view

Liu, Weiguo · more

SWMapper: Scalable Read Mapper on SunWay TaihuLight · view

Liu, Xiaoguang · more

Improving Load Balance via Resource Exchange in Large-Scale Search Engines · view

Liu, Ximing · more

DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms · view

Liu, Yang · more

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity · view

Liu, Yanqiang · more

OPS: Optimized Shuffle Management System for Apache Spark · view

Liu, Zixia · more

Deep Reinforcement Learning based Elasticity-compatible Heterogeneous Resource Management for Time-critical Computing · view

Llort, German · more

Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver · view

Lowe-Power, Jason · more

HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems · view

Lu, Youyou · more

DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training · view

Lu, Zhengyang · more

Efficient Block Algorithms for Parallel Sparse Triangular Solve · view

Luan, Zhongzhi · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Lui, John C.S. · more

An Online Learning-Based Task Offloading Framework for 5G Small Cell Networks · view

Lunga, Dalton · more

Toward Large-Scale Image Segmentation on Summit · view

Luo, Qiong · more

DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training · view

Luo, Shouxi · more

Selective Coflow Completion for Time-sensitive Distributed Applications with Poco · view

M

M. Abdelmoniem, Ahmed · more

Reducing Latency in Multi-Tenant Data Centers via Cautious Congestion Watch · view

Ma, Tao · more

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds · view

Ma, Yu · more

Reliability Augmentation of Requests with Service Function Chain Requirements in Mobile Edge-Cloud Networks · view

Madduri, Kamesh · more

Fast Spectral Graph Layout on Multicore Platforms · view

Mao, Rui · more

Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks · view

Marbach, Trent · more

Improving Load Balance via Resource Exchange in Large-Scale Search Engines · view

Marchal, Loris · more

Robustness of the Young/Daly formula for stochastic iterative applications · view

McCamant, Stephen · more

First Time Miss: Low Overhead Mitigation For Shared Memory Cache Side Channels · view

Meneses Viveros, Amilcar · more

Towards Parallelization of a Texture Description Algorithm for Breast Lesion Classification using OpenMP and CUDA · view

Meng, Xiangxu · more

SWMapper: Scalable Read Mapper on SunWay TaihuLight · view

Mercadal, Estanislao · more

Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver · view

Mishra, Ashirbad · more

Fast Spectral Graph Layout on Multicore Platforms · view

Moon, Soo-Mook · more

ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference · view

Munera, Adrian · more

Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver · view

Mururu, Girish · more

Generating Robust Parallel Programs via Model Driven Prediction of Compiler Optimizations for Non-determinism · view

N

Nakano, Koji · more

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs · view
Huffman Coding with Gap Arrays for GPU Acceleration · view

Narayanan, Sri Hari Krishna · more

Vector Forward Mode Automatic Differentiation on SIMD/SIMT architectures · view

Nasre, Rupesh · more

Graffix: Efficient Graph Processing with a Tinge of GPU-Specific Approximations · view

Nguyen, Tuan Dung · more

Federated Learning with Proximal Stochastic Variance Reduced Gradient Algorithms · view

Nie, Lihai · more

XShot: Light-weight Link Failure Localization using Crossed Probing Cycles in SDN · view

Nitta, Christopher · more

HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems · view

Niu, Yuyao · more

Efficient Block Algorithms for Parallel Sparse Triangular Solve · view

O

O'Brien, Francis · more

Balancing Graph Processing Workloads Using Work Stealing on Heterogeneous CPU-FPGA Systems · view

P

Pachajoa, Carlos · more

Algorithm-Based Checkpoint-Recovery for the Conjugate Gradient Method · view

Pacher, Christina · more

Algorithm-Based Checkpoint-Recovery for the Conjugate Gradient Method · view

Pal, Lisa · more

Towards Parallelization of a Texture Description Algorithm for Breast Lesion Classification using OpenMP and CUDA · view

Pallez, Guillaume · more

Robustness of the Young/Daly formula for stochastic iterative applications · view

Pande, Santosh · more

Generating Robust Parallel Programs via Model Driven Prediction of Compiler Optimizations for Non-determinism · view

Peng, Jiaxin · more

DNNARA: A Deep Neural Network Accelerator using Residue Arithmetic and Integrated Photonics · view

Petrini, Fabrizio · more

Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning · view

Prajapati, Nirmal · more

Revisiting Sparse Dynamic Programming for the 0/1 Knapsack Problem · view

Prieto, Pablo · more

SPECcast: A Methodology for Fast Performance Evaluation with SPEC CPU 2017 Multiprogrammed Workloads · view

Prieto-Matias, Manuel · more

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors · view

Puente, Valentin · more

SPECcast: A Methodology for Fast Performance Evaluation with SPEC CPU 2017 Multiprogrammed Workloads · view

Q

Qi, Qi · more

DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention · view

Qi, Zhengwei · more

OPS: Optimized Shuffle Management System for Apache Spark · view

Qian, Depei · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Qiu, Xiaoyu · more

SkyChain: A Deep Reinforcement Learning-Empowered Dynamic Blockchain Sharding System · view

Quan, Gang · more

Deep Reinforcement Learning based Elasticity-compatible Heterogeneous Resource Management for Time-critical Computing · view

Quiñones, Eduardo · more

Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver · view

R

Rajamanickam, Sivasankaran · more

Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architectures · view

Rajopadhye, Sanjay · more

Revisiting Sparse Dynamic Programming for the 0/1 Knapsack Problem · view

Ramkrishnan, Kartik · more

First Time Miss: Low Overhead Mitigation For Shared Memory Cache Side Channels · view

Ravichandran, Kaushik · more

Generating Robust Parallel Programs via Model Driven Prediction of Compiler Optimizations for Non-determinism · view

Ren, Rui · more

OPS: Optimized Shuffle Management System for Apache Spark · view

Ren, Shenyuan · more

Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks · view

Robert, Yves · more

Energy-aware strategies for reliability-oriented real-time task allocation on heterogeneous platforms · view
Robustness of the Young/Daly formula for stochastic iterative applications · view

Rodriguez, Matthew A. · more

Optimizing Linearizable Bulk Operations on Data Structures · view

Royuela, Sara · more

Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver · view

Ruan, Chang · more

Polo: Receiver-Driven Congestion Control for Low Latency over Commodity Network Fabric · view

S

Saez, Juan Carlos · more

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors · view

Schanen, Michel · more

Vector Forward Mode Automatic Differentiation on SIMD/SIMT architectures · view

Schmidt, Bertil · more

SWMapper: Scalable Read Mapper on SunWay TaihuLight · view

Schulte, Michael · more

Challenges and Opportunities for Extreme-Scale Computing · view

Scott, Michael L. · more

Safe, Fast Sharing of memcached as a Protected Library · view

Seal, Sudip · more

Toward Large-Scale Image Segmentation on Summit · view

Sen, Tanmoy · more

Impact of Memory DoS Attacks on Cloud Applications and Real-Time Detection Schemes · view

Shao, Airan · more

An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm · view

Shen, Haiying · more

A Reinforcement Learning Based System for Minimizing Cloud Storage Service Cost · view
Impact of Memory DoS Attacks on Cloud Applications and Real-Time Detection Schemes · view

Sheng, Feng · more

GraBi: Communication-Efficient and Workload-Balanced Partitioning for Bipartite Graphs · view

Shi, Jiuchen · more

OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Shi, Yang · more

Towards High-Efficiency Data Centers via Job-Aware Network Scheduling · view

Sifat, Tarequl Islam · more

Revisiting Sparse Dynamic Programming for the 0/1 Knapsack Problem · view

Singh, Somesh · more

Graffix: Efficient Graph Processing with a Tinge of GPU-Specific Approximations · view

Sintorn, Erik · more

A GPU Register File using Static Data Compression · view

Song, Zhuo · more

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds · view

Sorger, Volker · more

DNNARA: A Deep Neural Network Accelerator using Residue Arithmetic and Integrated Photonics · view

Spear, Michael F. · more

Optimizing Linearizable Bulk Operations on Data Structures · view

Stasiak, Andrzej · more

Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning · view

Stenström, Per · more

A GPU Register File using Static Data Compression · view

Straube, Kramer · more

HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems · view

Su, Jiya · more

CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs · view

Sultana, Abeda · more

E-LAS: Design and Analysis of Completion-Time Agnostic Scheduling for Distributed Deep Learning Cluster · view

Sun, Chao · more

Balancing Fairness and Efficiency for Cache Sharing in Semi-external Memory System · view

Sun, Haifeng · more

DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention · view

Sun, Shuai · more

DNNARA: A Deep Neural Network Accelerator using Residue Arithmetic and Integrated Photonics · view

Sun, Yu · more

Cooperative Game for Multiple Chargers with Dynamic Network Topology · view

Susanto, Hengky · more

Reducing Latency in Multi-Tenant Data Centers via Cautious Congestion Watch · view

T

Tabaru, Tsuguchika · more

Huffman Coding with Gap Arrays for GPU Acceleration · view

Takafuji, Daisuke · more

Huffman Coding with Gap Arrays for GPU Acceleration · view

Tang, Shanjiang · more

Balancing Fairness and Efficiency for Cache Sharing in Semi-external Memory System · view

Tao, Dingwen · more

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity · view

Tatekawa, Masaru · more

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs · view

Taura, Kenjiro · more

Automatic Identification and Precise Attribution of DRAM Bandwidth Contention · view

Tian, Zhao · more

XShot: Light-weight Link Failure Localization using Crossed Probing Cycles in SDN · view

Tithi, Jesmin Jahan · more

Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning · view

Tong, Wei · more

Mass: Workload-Aware Storage Policy for OpenStack Swift · view
CCHL: Compression-Consolidation Hardware Logging for Efficient Failure-Atomic Persistent Memory Updates · view

Tsaris, Aristeidis · more

Toward Large-Scale Image Segmentation on Summit · view

V

Vivien, Frédéric · more

Energy-aware strategies for reliability-oriented real-time task allocation on heterogeneous platforms · view

W

Wang, Chengning · more

CCHL: Compression-Consolidation Hardware Logging for Efficient Failure-Atomic Persistent Memory Updates · view

Wang, Dali · more

Toward Large-Scale Image Segmentation on Summit · view

Wang, Dongsheng · more

CARD: A Congestion-Aware Request Dispatching Scheme for Replicated Metadata Server Cluster · view
An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm · view

Wang, Gang · more

Improving Load Balance via Resource Exchange in Large-Scale Search Engines · view

Wang, Haixia · more

CARD: A Congestion-Aware Request Dispatching Scheme for Replicated Metadata Server Cluster · view
An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm · view

Wang, Haoyu · more

A Reinforcement Learning Based System for Minimizing Cloud Storage Service Cost · view

Wang, Huanbin · more

XShot: Light-weight Link Failure Localization using Crossed Probing Cycles in SDN · view

Wang, Jian · more

Saec: Similarity-Aware Embedding Compression in Recommendation Systems · view

Wang, Jianxin · more

PS : Periodic Strategy for the 40-100Gbps Energy Efficient Ethernet · view
Polo: Receiver-Driven Congestion Control for Low Latency over Commodity Network Fabric · view
AMRT: Anti-ECN Marking to Improve Utilization of Receiver-driven Transmission in Data Center · view

Wang, Jingyu · more

DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention · view

Wang, Lei · more

Cooperative Game for Multiple Chargers with Dynamic Network Topology · view

Wang, Lipeng · more

DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training · view

Wang, Liqiang · more

Deep Reinforcement Learning based Elasticity-compatible Heterogeneous Resource Management for Time-critical Computing · view

Wang, Rui · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Wang, Rujia · more

CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs · view

Wang, Shucheng · more

SeRW: Adaptively Separating Read and Write upon SSDs of Hybrid Storage Server in Clouds · view

Wang, Wenwen · more

DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms · view

Wang, Yanfei · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Wang, Yi · more

Jeor: Accelerate Linear Algebra Operation in SSDs · view

Wang, Zhanye · more

CARD: A Congestion-Aware Request Dispatching Scheme for Replicated Metadata Server Cluster · view

Wang, Zike · more

Mass: Workload-Aware Storage Policy for OpenStack Swift · view

Wang, Zizhong · more

An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm · view

Wartel, Franck · more

Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver · view

Wei, Xueliang · more

CCHL: Compression-Consolidation Hardware Logging for Efficient Failure-Atomic Persistent Memory Updates · view

Wen, Mei · more

Towards High-Efficiency Data Centers via Job-Aware Network Scheduling · view

Williams-Young, David B. · more

Parallel Shift-Invert Spectrum Slicing on Distributed Architectures with GPU Accelerators · view

Wu, Chunghsuan · more

OPS: Optimized Shuffle Management System for Apache Spark · view

Wu, Guowei · more

Cooperative Game for Multiple Chargers with Dynamic Network Topology · view

Wu, Hao · more

EPMA: Efficient Partial Message Access in IoT Era · view
Memory-Centric Communication Mechanism for Real-time Autonomous Navigation Applications · view

Wu, Jie · more

Optimizing Flow Bandwidth Consumption with Traffic-diminishing Middlebox Placement · view

Wu, Ruofan · more

CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs · view

Wu, Weigang · more

Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning · view

Wu, Xiaorui · more

Saec: Similarity-Aware Embedding Compression in Recommendation Systems · view
Jeor: Accelerate Linear Algebra Operation in SSDs · view

Return to Top

X

Xia, Wen · more

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity · view

Xiao, Danyang · more

Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning · view

Xiao, Nong · more

FEEL: A Federated Edge Learning System for Efficient and Privacy-Preserving Mobile Healthcare · view

Xing, Huanlai · more

Selective Coflow Completion for Time-sensitive Distributed Applications with Poco · view

Xu, Fei · more

E-LAS: Design and Analysis of Completion-Time Agnostic Scheduling for Distributed Deep Learning Cluster · view

Xu, Hong · more

Saec: Similarity-Aware Embedding Compression in Recommendation Systems · view
Jeor: Accelerate Linear Algebra Operation in SSDs · view
OPS: Optimized Shuffle Management System for Apache Spark · view

Xu, Jie · more

A Reinforcement Learning Based System for Minimizing Cloud Storage Service Cost · view

Xu, Kai · more

SWMapper: Scalable Read Mapper on SunWay TaihuLight · view

Xu, Wenzheng · more

Reliability Augmentation of Requests with Service Function Chain Requirements in Mobile Edge-Cloud Networks · view

Return to Top

Y

Y. Zomaya, Albert · more

Federated Learning with Proximal Stochastic Variance Reduced Gradient Algorithms · view

Yamamoto, Naoya · more

Huffman Coding with Gap Arrays for GPU Acceleration · view

Yamazaki, Ichitaro · more

Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architectures · view

Yan, Shengen · more

DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training · view

Yan, Yulong · more

PS: Periodic Strategy for the 40-100Gbps Energy Efficient Ethernet · view

Yan, Zijie · more

Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning · view

Yang, Baichen · more

DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training · view

Yang, Bin · more

OPS: Optimized Shuffle Management System for Apache Spark · view

Yang, Chao · more

Parallel Shift-Invert Spectrum Slicing on Distributed Architectures with GPU Accelerators · view

Yang, Hailong · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Yang, Puyuan · more

SeRW: Adaptively Separating Read and Write upon SSDs of Hybrid Storage Server in Clouds · view

Yang, Yong · more

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds · view

Yang, Ziwei · more

Cooperative Game for Multiple Chargers with Dynamic Network Topology · view

Yao, Jie · more

SeRW: Adaptively Separating Read and Write upon SSDs of Hybrid Storage Server in Clouds · view

Yasudo, Ryota · more

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs · view

Yazane, Takashi · more

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs · view

Ye, Huang · more

Large-scale Simulations of Peridynamics on Sunway Taihulight Supercomputer · view

Ye, Liuqing · more

CCHL: Compression-Consolidation Hardware Logging for Efficient Failure-Atomic Persistent Memory Updates · view

Ye, Songgao · more

DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training · view

Yelick, Kathy · more

Genomic Analysis and Learning at Scale: Mapping Irregular Computations to Advanced Architectures · view

Yew, Pen · more

First Time Miss: Low Overhead Mitigation For Shared Memory Cache Side Channels · view
DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms · view

Yu, Ce · more

Balancing Fairness and Efficiency for Cache Sharing in Semi-external Memory System · view

Yu, Fengwei · more

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures · view

Yu, Hongfang · more

Selective Coflow Completion for Time-sensitive Distributed Applications with Poco · view

Yuan, Rui · more

OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Yuan, Xu · more

E-LAS: Design and Analysis of Completion-Time Agnostic Scheduling for Distributed Deep Learning Cluster · view

Return to Top

Z

Zahn, Felix · more

On Network Locality in MPI-Based HPC Applications · view

Zhai, Antonia · more

First Time Miss: Low Overhead Mitigation For Shared Memory Cache Side Channels · view

Zhai, Jidong · more

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs · view
Memory-Centric Communication Mechanism for Real-time Autonomous Navigation Applications · view

Zhan, Yufeng · more

SkyChain: A Deep Reinforcement Learning-Empowered Dynamic Blockchain Sharding System · view

Zhang, Chenyang · more

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs · view

Zhang, Chunyuan · more

Towards High-Efficiency Data Centers via Job-Aware Network Scheduling · view

Zhang, Feng · more

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs · view
CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs · view

Zhang, Hequan · more

DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training · view

Zhang, Honglin · more

Saec: Similarity-Aware Embedding Compression in Recommendation Systems · view

Zhang, Jian · more

Large-scale Simulations of Peridynamics on Sunway Taihulight Supercomputer · view

Zhang, Jianting · more

SkyChain: A Deep Reinforcement Learning-Empowered Dynamic Blockchain Sharding System · view

Zhang, Qi · more

Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN · view

Zhang, Sai Qian · more

Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN · view

Zhang, Tao · more

Polo: Receiver-Driven Congestion Control for Low Latency over Commodity Network Fabric · view

Zhang, Wei · more

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds · view

Zhang, Weizhe · more

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity · view

Zhang, Xueying · more

An Online Learning-Based Task Offloading Framework for 5G Small Cell Networks · view

Zhang, Zheng · more

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity · view

Zhao, Laiping · more

XShot: Light-weight Link Failure Localization using Crossed Probing Cycles in SDN · view

Zhao, Ziyi · more

DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms · view

Zheng, Kevin · more

A Reinforcement Learning Based System for Minimizing Cloud Storage Service Cost · view

Zheng, Ningxin · more

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds · view

Zheng, Wenli · more

OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment · view

Zhou, Amelie Chi · more

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs · view

Zhou, Bing B. · more

Federated Learning with Proximal Stochastic Variance Reduced Gradient Algorithms · view

Zhou, Jieying · more

Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning · view

Zhou, Ruiting · more

An Online Learning-Based Task Offloading Framework for 5G Small Cell Networks · view

Zhou, Wen · more

An Efficient Wear-level Architecture using Self-adaptive Wear Leveling · view

Zhou, Zhi · more

An Online Learning-Based Task Offloading Framework for 5G Small Cell Networks · view

Zhuang, Zirui · more

DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention · view

Zou, Pengfei · more

Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines · view

Zou, Xiangyu · more

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity · view

Zuo, Pengfei · more

An Efficient Wear-level Architecture using Self-adaptive Wear Leveling · view

Return to Top