
National Natural Science Foundation of China (60903044)

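Works: 3, citations: 2, H-index: 1
Funding sources: National Natural Science Foundation of China; National High Technology Research and Development Program of China
Related fields: Automation and Computer Technology; Science; Power Engineering and Engineering Thermophysics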

Document type

  • Journal articles (3)
  • Conference papers (1)

Field

  • Automation and Computer... (3)
  • Power Engineering and Eng... (1)
  • Science (1)

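Topic

  • Bandwidth (1)
  • Multithreading (1)
  • Multithreaded processing (1)
  • Multithreaded processor (1)
  • Heterogeneous (1)
  • Heterogeneous system (1)
  • Data loading (1)
  • Graphics processing unit (1)
  • Energy (1)
  • Energy saving (1)
  • Fine-grained (1)
  • Thread (1)
  • Memory (1)
  • Memory operation (1)
  • Energy conservation (1)
  • Energy-saving mode (1)
  • Load (1)
  • NVIDIA (1)
  • OFF (1)
  • ACCELE... (1)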

Institution

  • 国防科学技术... (National University of Defense...) (1)

Author

  • Qianbing Zheng (1)
  • Caixia Sun (1)
  • Yuan Yin (1)
  • Yongwen Wang (1)
  • Qiang Dou (1)

Venue

  • Journa... (2)
  • Tsingh... (1)

Year

  • 2013 (1)
  • 2012 (2)
  • 2011 (1)
3 records; items 1-4 shown below
Energy optimization of representative barrier algorithms
2012
Excessive energy consumption is widely recognized as a critical problem in large-scale parallel computing systems. A LogP-based energy-saving model and a frequency-scaling method were proposed to reduce energy consumption analytically and systematically for two other representative barrier algorithms: the tournament barrier and the central counter barrier. Energy optimization methods for these two barrier algorithms were then implemented on a parallel computing platform. The experimental results validate the effectiveness of these methods: energy savings of 67.12% and 70.95% are obtained for the tournament barrier and the central counter barrier, respectively, on platforms with 2048 processes, at a performance loss of 1.55% to 8.80%. Furthermore, the LogP-based analytical energy-saving model for these two barrier algorithms is highly accurate: the predicted energy savings are within 9.67% of the results obtained by simulation.
Juan Chen, Yong Dong
Keywords: energy saving; parallel computing system; parallel computing platform; energy-saving mode; LogP
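The abstract names the tournament barrier without describing it. As a reminder of its structure, the sketch below computes a static pairing for the arrival phase, assuming the common textbook scheme in which, in round k, the rank divisible by 2^(k+1) wins and waits for the rank 2^k above it. It also counts how many rounds each rank plays before it only waits for the release; that idle time is the kind of slack a frequency-scaling approach could target, but this is an illustration under our own assumptions, not the paper's LogP-based model. The function names are hypothetical.

```python
def tournament_schedule(p):
    """Static (winner, loser) pairs for each round of the arrival phase.

    Assumed textbook scheme: in round k, rank i with i % 2**(k+1) == 0 waits
    for rank i + 2**k; if that rank does not exist (p not a power of two),
    rank i gets a bye for that round.
    """
    if p < 1:
        raise ValueError("need at least one process")
    rounds = []
    for k in range((p - 1).bit_length()):  # ceil(log2 p) rounds, exact integer math
        step = 1 << k
        rounds.append([(i, i + step) for i in range(0, p, 2 * step) if i + step < p])
    return rounds


def arrival_rounds(p):
    """Rounds each rank takes part in before it only waits for the release.

    Rank 0 (the champion) plays every round; every other rank stops after the
    round in which it signals its winner.
    """
    schedule = tournament_schedule(p)
    played = [len(schedule)] * p
    for k, pairs in enumerate(schedule):
        for _, loser in pairs:
            played[loser] = k + 1
    return played
```

For 2048 processes, as on the platform in the abstract, this scheme needs 11 arrival rounds, and the 1024 odd-numbered ranks finish their part after the first round.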
Fast Parallel Cutoff Pair Interactions for Molecular Dynamics on Heterogeneous Systems
2012
Heterogeneous systems with both Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are frequently used to accelerate short-range Molecular Dynamics (MD) simulations. The most time-consuming task in short-range MD simulations is computing particle-to-particle interactions, which drop to zero beyond a cutoff distance. To minimize the number of distance checks, previous works tiled the interactions using the particles' spatial attributes, which increases memory accesses and GPU computation and hence decreases performance. Other studies ignore the spatial attributes and construct an all-versus-all interaction matrix, which scales poorly. This paper presents an improved algorithm. It first bins particles into voxels according to their spatial attributes, then tiles the all-versus-all matrix into voxel-versus-voxel sub-matrices; only the sub-matrices between neighboring voxels are computed on the GPU. The algorithm therefore reduces distance-check operations while limiting additional memory accesses and GPU computation. The paper also adopts a multi-level programming model to implement the algorithm on multiple nodes of Tianhe-1A. By employing (1) a patch design to exploit parallelism across the simulation domain, (2) a communication-overlapping method to overlap communication between CPUs and GPUs, and (3) a dynamic workload-balancing method to adjust workloads among compute nodes, the implementation achieves a speedup of 4.16x on one NVIDIA Tesla M2050 GPU compared with a 2.93 GHz six-core Intel Xeon X5670 CPU. In addition, it runs 2.41x faster on 256 compute nodes of Tianhe-1A (each with two CPUs and one GPU) than on 256 nodes with the GPUs excluded.
Qiang Wu, Canqun Yang, Tao Tang, Kai Lu
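The binning-and-tiling idea in this abstract is essentially a cell list. The CPU-only sketch below illustrates it under stated assumptions: voxel edge equal to the cutoff, open (non-periodic) boundaries, and hypothetical function names. It leaves out everything GPU-specific (the patch design, CPU/GPU communication overlap, and load balancing) and simply checks that visiting only neighboring voxel blocks finds the same pairs as the all-versus-all matrix.

```python
import itertools
import math
from collections import defaultdict


def dist2(p, q):
    """Squared Euclidean distance between two 3-D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cutoff_pairs_all(positions, cutoff):
    """All-versus-all reference: test every pair (O(N^2) distance checks)."""
    rc2 = cutoff * cutoff
    n = len(positions)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist2(positions[i], positions[j]) < rc2}


def bin_particles(positions, voxel_size):
    """Map each voxel index (ix, iy, iz) to the ids of the particles inside it."""
    voxels = defaultdict(list)
    for pid, p in enumerate(positions):
        voxels[tuple(math.floor(c / voxel_size) for c in p)].append(pid)
    return voxels


def cutoff_pairs_voxel(positions, cutoff):
    """Voxel-versus-voxel version: only blocks between neighboring voxels are examined.

    With voxel edge == cutoff, two particles closer than the cutoff can only lie
    in the same voxel or in one of its 26 neighbors (no periodic boundaries here).
    """
    voxels = bin_particles(positions, cutoff)
    rc2 = cutoff * cutoff
    pairs = set()
    for (ix, iy, iz), members in voxels.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            # .get does not insert keys, so iterating voxels stays safe.
            neighbors = voxels.get((ix + dx, iy + dy, iz + dz), [])
            # One small dense sub-matrix: members x neighbors.
            for i in members:
                for j in neighbors:
                    if i < j and dist2(positions[i], positions[j]) < rc2:
                        pairs.add((i, j))
    return pairs
```

Each pair of neighboring voxels is visited from both sides, so the `i < j` test plus the set removes duplicates; a GPU version would instead assign each voxel-versus-voxel block to a thread block.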
Fast weighting method for plasma PIC simulation on GPU-accelerated heterogeneous systems (citations: 2)
2013
The particle-in-cell (PIC) method has benefited greatly from GPU-accelerated heterogeneous systems. However, the performance of PIC on the GPU (graphics processing unit) is constrained by the interpolation operations in the weighting process. To address this problem, a fast weighting method for PIC simulation on GPU-accelerated systems was proposed that avoids atomic memory operations during the weighting process. The method exploits the GPU's thread synchronization mechanism and a suitable partitioning of the problem space. In addition, software-managed shared memory on the GPU is used to buffer intermediate data. The experimental results show that the method achieves speedups of up to 3.5x over previous works and runs 20.08x faster on one NVIDIA Tesla M2090 GPU than on a single core of an Intel Xeon X5670 CPU.
Canqun Yang, Qiang Wu, Huili Hu, Zhicai Shi, Juan Chen, Tao Tang
Keywords: heterogeneous system; GPU; graphics processing unit; NVIDIA; memory operation
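The abstract does not spell out how the problem space is divided. One conflict-free approach, shown here as an assumed stand-in rather than the authors' method, is tile-based charge deposition for 1-D linear (cloud-in-cell) weighting on a periodic grid. Particles are grouped by the tile that owns their cell, each tile accumulates into a private buffer (playing the role of GPU shared memory), and the buffers are merged in two passes in which every grid cell has exactly one writer, so no atomic additions are needed. Plain Python; the names and the 1-D setting are hypothetical.

```python
import math


def deposit_scatter(xs, q, n_cells, dx):
    """Reference CIC weighting: each particle scatters into two grid cells.

    On a GPU, many threads adding into the same cell here is what needs atomics.
    """
    rho = [0.0] * n_cells
    for x in xs:
        s = x / dx
        cell = math.floor(s)
        w = s - cell
        cell %= n_cells  # periodic grid
        rho[cell] += q * (1.0 - w)
        rho[(cell + 1) % n_cells] += q * w
    return rho


def deposit_tiled(xs, q, n_cells, dx, tile):
    """Conflict-free variant: per-tile private buffers, then a two-pass merge."""
    if n_cells % tile:
        raise ValueError("n_cells must be a multiple of tile")
    n_tiles = n_cells // tile
    buckets = [[] for _ in range(n_tiles)]
    for x in xs:  # group particles by the tile that owns their cell
        buckets[(math.floor(x / dx) % n_cells) // tile].append(x)

    local = []
    for t, bucket in enumerate(buckets):
        buf = [0.0] * (tile + 1)  # owned cells plus one halo cell on the right
        for x in bucket:
            s = x / dx
            cell = math.floor(s)
            w = s - cell
            j = cell % n_cells - t * tile  # index inside this tile
            buf[j] += q * (1.0 - w)
            buf[j + 1] += q * w
        local.append(buf)

    rho = [0.0] * n_cells
    for t, buf in enumerate(local):  # pass 1: each tile writes only its own cells
        rho[t * tile:(t + 1) * tile] = buf[:tile]
    for t, buf in enumerate(local):  # pass 2: each halo targets a distinct cell
        rho[((t + 1) * tile) % n_cells] += buf[tile]
    return rho
```

The two results agree up to floating-point summation order, and the total deposited charge equals `q` times the number of particles.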
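Design and implementation of speculative data loading in fine-grained multithreaded processors
Fine-grained multithreading is a typical technique for exploiting thread-level parallelism; it achieves high-throughput execution by switching threads every cycle. A speculative data-loading mechanism for fine-grained multithreaded processors was designed and implemented. The mechanism predicts that a LOAD operation will hit in the data cache and, instead of switching threads immediately, continues executing...
Yongwen Wang, Qianbing Zheng, Yuan Yin, Caixia Sun, Qiang Dou
Keywords: multithreading; bandwidth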
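The abstract states only that the mechanism predicts a LOAD will hit in the data cache; the text is truncated and does not say how the prediction is made. Purely as an assumed illustration, the sketch below uses a table of 2-bit saturating counters indexed by the load's PC, a common predictor structure borrowed from branch prediction, and applies the stated policy: on a predicted hit, the thread is not switched out immediately. The class and function names, the table size, and the 4-byte instruction alignment are all hypothetical, and misprediction recovery is not modeled.

```python
class LoadHitPredictor:
    """Hypothetical per-PC hit/miss predictor built from 2-bit saturating counters.

    Counter values 0-1 predict a miss; values 2-3 predict a hit.
    """

    def __init__(self, entries=64, initial=2):
        if entries < 1 or not 0 <= initial <= 3:
            raise ValueError("bad predictor configuration")
        self.entries = entries
        self.table = [initial] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries  # assumes 4-byte aligned instructions

    def predict_hit(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, hit):
        """Train with the actual cache outcome once the load resolves."""
        i = self._index(pc)
        if hit:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def switch_on_load(predictor, pc):
    """Policy from the abstract: on a predicted hit, keep executing this thread
    instead of switching immediately; otherwise give up the issue slot."""
    return not predictor.predict_hit(pc)
```

The two-bit hysteresis means a single unexpected miss on a load that usually hits does not flip the prediction, which limits how often the thread is switched out unnecessarily.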