A problem with parallel computing using MPICH2 on Linux

Posted: 2010-04-21 17:55:44  Source: 红联  Author: 观音奴
Background: the machine is an IBM blade system with 6 nodes, 2 processors per node, and 4 cores per processor.
It has 12 GB of memory. The operating system is Red Hat Enterprise Linux 5 (64-bit).
The installed software is the latest Intel Fortran (l_cprof_p_11.1.059_intel64.tgz, 64-bit)
and MPICH2 (mpich2-1.0.tar.gz, 64-bit).
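Almost all arrays in the program are dynamically allocated; only very small arrays are not. When the same program computes 2 grid blocks
(one parameter in the program set to 2, then mpiexec -n 3 ./debugger) it gives correct results, but when it computes
38 grid blocks (one parameter in the program set to 2 [sic], then mpiexec -n 39 ./debugger) it fails with the following errors:
(The root process only collects the computed data and does not take part in the computation, so this program needs one more process than the number of grid blocks.)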
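The process count described above can be sketched in shell. This is only an illustration: the variable names are made up here, and the core count assumes the hardware description means 6 nodes x 2 processors x 4 cores.

```shell
#!/bin/sh
# Hypothetical variable names; values are taken from the post.
NBLOCKS=38                     # number of grid blocks to compute
NPROCS=$((NBLOCKS + 1))        # one extra rank: the root only collects results

# Assumed reading of the hardware: 6 nodes, 2 processors each, 4 cores each.
NODES=6
CPUS_PER_NODE=2
CORES_PER_CPU=4
TOTAL_CORES=$((NODES * CPUS_PER_NODE * CORES_PER_CPU))

echo "ranks needed: $NPROCS, cores available: $TOTAL_CORES"
if [ "$NPROCS" -gt "$TOTAL_CORES" ]; then
    echo "warning: more ranks than cores"
fi

# Launch only when mpiexec and the binary are actually present.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./debugger ]; then
    mpiexec -n "$NPROCS" ./debugger
fi
```

With these numbers 39 ranks fit on 48 cores, so oversubscription alone would not be expected to explain the failure.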
(To rule out stack overflow, insufficient memory, and similar causes, the following limits were already set:
Data segment size: ulimit -d unlimited
Maximum memory size: ulimit -m unlimited
Stack size: ulimit -s unlimited
CPU time: ulimit -t unlimited
Virtual memory: ulimit -v unlimited)
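One caveat worth checking: limits set with ulimit apply to the shell where they are typed and to its children. If the ranks are started by a daemon on each node (assumed here to be MPICH2's mpd process manager), they may not inherit those limits. A minimal sketch for seeing what each rank actually gets, using a hypothetical helper script named show_limits.sh:

```shell
#!/bin/sh
# show_limits.sh (hypothetical name): print the resource limits that
# this process really runs with, tagged by host name.
show_limits() {
    echo "host=$(uname -n) data=$(ulimit -d) stack=$(ulimit -s) cpu=$(ulimit -t) vmem=$(ulimit -v)"
}

show_limits
```

Running it through the launcher, for example mpiexec -n 39 ./show_limits.sh, prints one line per rank. Any rank whose stack is not reported as unlimited is running with different limits than the login shell.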
[test@mnode2 MrLuzhiliang]$ mpif90 -o debugger MPIscch.f90
[test@mnode2 MrLuzhiliang]$ mpiexec -n 39 ./debugger
proccess 1 :now the loop is starting......
proccess 2 :now the loop is starting......
proccess 3 :now the loop is starting......
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
debugger 000000000044B69F Unknown Unknown Unknown
debugger 0000000000414827 Unknown Unknown Unknown
debugger 0000000000403B0C Unknown Unknown Unknown
libc.so.6 000000355761D974 Unknown Unknown Unknown
debugger 0000000000403A19 Unknown Unknown Unknown
aborting job:
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(1058): MPI_Allreduce(sbuf=0x7fff4a23f234, rbuf=0x7fff4a23f238, count=1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Allreduce(545):
MPIC_Recv(98):
MPIC_Wait(308):
MPIDI_CH3_Progress_wait(207): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(492):
connection_recv_fail(1728):
MPIDU_Socki_handle_read(590): connection closed by peer (set=0,sock=3)
aborting job:
Fatal error in MPI_Sendrecv: Internal MPI error!, error stack:
MPI_Sendrecv(207): MPI_Sendrecv(sbuf=0xaeff570, scount=1152, dtype=0x4c000430, dest=9, stag=10, rbuf=0xaf01ea0, rcount=1152, dtype=0x4c000430, src=9, rtag=10, MPI_COMM_WORLD, status=0x7cd300) failed
(unknown)(): Internal MPI error!
rank 26 in job 10 mnode2_54617 caused collective abort of all ranks
exit status of rank 26: return code 13
rank 3 in job 10 mnode2_54617 caused collective abort of all ranks
exit status of rank 3: return code 174
rank 0 in job 10 mnode2_54617 caused collective abort of all ranks
exit status of rank 0: return code 13
Comments

1 comment in total

  1. deepwhite posted on 2010-04-22 08:55:07:

    SIGSEGV is the signal sent to a process when it makes an invalid memory reference, or segmentation fault ....
    http://en.wikipedia.org/wiki/SIGSEGV
    Maybe check whether something in the program is improperly designed?
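The traceback in the post shows Unknown for routine, line, and source, which usually means the binary was built without debug information. One possible way to follow the advice above is to rebuild with debugging and runtime checks enabled. The sketch below assumes mpif90 wraps Intel Fortran, as the post describes; -g, -traceback, and -check bounds are Intel Fortran options, and other compilers use different flags.

```shell
#!/bin/sh
# Assumption: mpif90 passes these options through to Intel Fortran (ifort).
#   -g             include debug symbols
#   -O0            disable optimization so line numbers stay accurate
#   -traceback     report routine, line, and source file on a fatal error
#   -check bounds  detect out-of-range array subscripts at run time
DEBUG_FLAGS="-g -O0 -traceback -check bounds"
BUILD_CMD="mpif90 $DEBUG_FLAGS -o debugger MPIscch.f90"

echo "build command: $BUILD_CMD"

# Build and rerun only when the compiler wrapper and source are present.
# $BUILD_CMD is left unquoted on purpose so the shell splits it into words.
if command -v mpif90 >/dev/null 2>&1 && [ -f MPIscch.f90 ]; then
    $BUILD_CMD && mpiexec -n 39 ./debugger
fi
```

If the segmentation fault comes from an array index that depends on the number of blocks or ranks, a bounds-checked build may stop at the offending line instead of crashing with an unresolved address.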