$ ceph -s
  cluster:
    id:     7025ab16-5810-4382-9318-1bd4a704ef48
    health: HEALTH_WARN
            1 daemons have recently crashed

  services:
    mon: 2 daemons, quorum mgmt,mon (age 3m)
    mgr: mgmt(active, since 47h)
    mds:  1 up:standby
    osd: 9 osds: 9 up (since 5m), 9 in (since 47h)

  data:
    pools:   1 pools, 128 pgs
    objects: 4 objects, 35 B
    usage:   9.8 GiB used, 80 GiB / 90 GiB avail
    pgs:     128 active+clean
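This warning comes from the RECENT_CRASH health check. Before listing the crashes, `ceph health detail` names the failing check and the crashed daemon; the output looks roughly like the following (exact wording varies by release):

$ ceph health detail
HEALTH_WARN 1 daemons have recently crashed
RECENT_CRASH 1 daemons have recently crashed
    mon.mgmt crashed on host mgmt at 2021-02-20 11:41:52.234574Z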
With the crash warning showing, you can look up the ID of the crashed daemon as follows.
$ ceph crash ls
ID                                                                ENTITY    NEW
2021-02-20_11:41:52.234574Z_fac113ad-5fa2-40fd-bb00-a0410e0472dc  mon.mgmt   *
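Note that `ceph crash ls` lists archived crashes as well; to see only the crashes that have not been acknowledged yet (the ones flagged with * in the NEW column), there is also:

$ ceph crash ls-new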
$ ceph crash info 2021-02-20_11:41:52.234574Z_fac113ad-5fa2-40fd-bb00-a0410e0472dc
{
    "os_version_id": "7",
    "assert_condition": "(*__errno_location ()) == 4",
    "utsname_release": "3.10.0-1127.10.1.el7.x86_64",
    "os_name": "CentOS Linux",
    "entity_name": "mon.mgmt",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/common/fork_function.h",
    "timestamp": "2021-02-20 11:41:52.234574Z",
    "process_name": "ceph-mon",
    "utsname_machine": "x86_64",
    "assert_line": 34,
    "utsname_sysname": "Linux",
    "os_version": "7 (Core)",
    "os_id": "centos",
    "assert_thread_name": "ms_dispatch",
    "utsname_version": "#1 SMP Wed Jun 3 14:28:03 UTC 2020",
    "backtrace": [
        "(()+0xf630) [0x7fbbf4045630]",
        "(gsignal()+0x37) [0x7fbbf2e243d7]",
        "(abort()+0x148) [0x7fbbf2e25ac8]",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7fbbf7292b76]",
        "(()+0x25ccef) [0x7fbbf7292cef]",
        "(CrushTester::test_with_fork(int)+0x799) [0x7fbbf77fd839]",
        "(OSDMonitor::prepare_new_pool(std::string&, int, std::string const&, unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, float, std::string const&, unsigned int, unsigned long, OSDMonitor::FastReadType, std::ostream*)+0x460) [0x5568226a9aa0]",
        "(OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&)+0x1919b) [0x5568226c998b]",
        "(OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x10d) [0x5568226d217d]",
        "(OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x2a6) [0x5568226d5b26]",
        "(PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x66d) [0x55682266325d]",
        "(Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x23ab) [0x55682257c98b]",
        "(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x805) [0x556822581ca5]",
        "(Monitor::_ms_dispatch(Message*)+0xca0) [0x5568225833f0]",
        "(Monitor::ms_dispatch(Message*)+0x26) [0x5568225b0736]",
        "(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x5568225ad116]",
        "(DispatchQueue::entry()+0x1699) [0x7fbbf74b50e9]",
        "(DispatchQueue::DispatchThread::entry()+0xd) [0x7fbbf756277d]",
        "(()+0x7ea5) [0x7fbbf403dea5]",
        "(clone()+0x6d) [0x7fbbf2eec9fd]"
    ],
    "utsname_hostname": "mgmt",
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/common/fork_function.h: In function 'int fork_function(int, std::ostream&, std::function<signed char()>)' thread 7fbbe7ff0700 time 2021-02-20 20:41:52.224795\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/common/fork_function.h: 34: FAILED ceph_assert((*__errno_location ()) == 4)\n",
    "crash_id": "2021-02-20_11:41:52.234574Z_fac113ad-5fa2-40fd-bb00-a0410e0472dc",
    "assert_func": "int fork_function(int, std::ostream&, std::function<signed char()>)",
    "ceph_version": "14.2.16"
}
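A couple of fields in this report are worth decoding: errno 4 is EINTR, so the failed assert ceph_assert((*__errno_location ()) == 4) in fork_function.h means a forked helper process exited with an error the code did not expect. The backtrace shows it fired under CrushTester::test_with_fork() inside OSDMonitor::prepare_new_pool(), i.e. while the monitor was validating the CRUSH map for a pool-create command. For a quick summary of how many crashes are recorded overall, there is also:

$ ceph crash stat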
[ Solution ]
Archiving the crashed ID clears the warning.
** I don't know exactly why the crash happened, but since it showed up after I removed an existing OSD and added it back in, it looks like data left over from the old OSD caused the crash (a cleanup sketch for that case follows after the status output below).
$ ceph crash archive 2021-02-20_11:41:52.234574Z_fac113ad-5fa2-40fd-bb00-a0410e0472dc
$ ceph health
HEALTH_OK
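If several crash IDs had piled up, they can all be acknowledged in one go; the warning also ages out on its own once a crash is older than the mgr/crash/warn_recent_interval option (two weeks by default):

$ ceph crash archive-all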
$ ceph -s
  cluster:
    id:     7025ab16-5810-4382-9318-1bd4a704ef48
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum mgmt,mon (age 6m)
    mgr: mgmt(active, since 47h)
    mds:  1 up:standby
    osd: 9 osds: 9 up (since 8m), 9 in (since 47h)

  data:
    pools:   1 pools, 128 pgs
    objects: 4 objects, 35 B
    usage:   9.8 GiB used, 80 GiB / 90 GiB avail
    pgs:     128 active+clean
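If the suspected cause above is right, wiping the old OSD completely before re-adding it should avoid the crash. A minimal sketch, assuming a hypothetical osd.3 backed by /dev/sdb (substitute your own OSD ID and device):

$ ceph osd out osd.3
$ systemctl stop ceph-osd@3                 # run on the OSD host
$ ceph osd purge osd.3 --yes-i-really-mean-it
$ ceph-volume lvm zap /dev/sdb --destroy    # wipes LVM metadata and leftover data on the disk

After this the device can be handed back to ceph-volume (or your deployment tool) as if it were brand new.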