Software-RAID HOWTO: Error Recovery


4. Error Recovery

Q: I have a RAID-1 (mirroring) setup, and the power went out while the disks were active. What should I do?
A: The redundancy of RAID levels is designed to protect against a disk failure, not against a power failure. There are several ways to recover from this situation.
- Method (1): Use the raid tools. These can be used to sync the raid arrays. They do not fix file-system damage; after the raid arrays are synced, the file system still has to be fixed with fsck. Raid arrays can be checked with ckraid /etc/raid1.conf (for RAID-1; for other levels use /etc/raid5.conf, etc.).
  Calling ckraid /etc/raid1.conf --fix will pick one of the disks in the array (usually the first), use it as the master copy, and copy its blocks to the others in the mirror.
  To designate which of the disks should be used as the master, use the --force-source flag: for example, ckraid /etc/raid1.conf --fix --force-source /dev/hdc3
  The ckraid command can be safely run without the --fix option to verify the inactive RAID array without making any changes. When you are comfortable with the proposed changes, supply the --fix option.
- Method (2): Paranoid, time-consuming, and not much better than the first way. Let's assume a two-disk RAID-1 array, consisting of partitions /dev/hda3 and /dev/hdc3. You can try the following:
  1. fsck /dev/hda3
  2. fsck /dev/hdc3
  3. Decide which of the two partitions had fewer errors, or was more easily recovered, or recovered the data that you wanted. Pick one, either one, to be your new ``master'' copy. Say you picked /dev/hdc3.
  4. dd if=/dev/hdc3 of=/dev/hda3
  5. mkraid raid1.conf -f --only-superblock
  Instead of the last two steps, you can run ckraid /etc/raid1.conf --fix --force-source /dev/hdc3, which should be a bit faster.
- Method (3): Lazy man's version of the above. If you don't want to wait for long fsck runs to complete, it is perfectly fine to skip the first three steps above and move directly to the last two steps. Just be sure to run fsck /dev/md0 after you are done. Method (3) is actually just method (1) in disguise.
In any case, the above steps will only sync up the raid arrays. The file system probably needs fixing as well: for this, fsck needs to be run on the active, unmounted md device.
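
The sequence described above can be sketched as a dry-run helper. This is only a minimal sketch and not part of the raid tools: the function name raid1_recovery_plan is hypothetical, and it only prints the commands already mentioned in this answer (ckraid with and without --fix, --force-source, and fsck on the md device) so they can be reviewed before anything is run. How the array is activated between the ckraid and fsck steps depends on the tool version, so that step is left as a comment.

```shell
# Hypothetical helper: print (do not run) the RAID-1 recovery commands.
# $1 = raid config file, $2 = md device, $3 = optional master partition
raid1_recovery_plan() {
    conf=$1
    md=$2
    source_part=${3:-}

    # Step 1: verification pass only; without --fix nothing is changed.
    echo "ckraid $conf"

    # Step 2: the actual repair, optionally naming the master copy.
    if [ -n "$source_part" ]; then
        echo "ckraid $conf --fix --force-source $source_part"
    else
        echo "ckraid $conf --fix"
    fi

    # Step 3: activate the array with your usual tools (not shown here),
    # then check the file system on the unmounted md device.
    echo "fsck $md"
}
```

Calling raid1_recovery_plan /etc/raid1.conf /dev/md0 /dev/hdc3 prints three lines; nothing touches the disks until you run those lines yourself.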

With a three-disk RAID-1 array, there are more possibilities, such as using two disks to ``vote'' a majority answer. Tools to automate this do not currently (September 97) exist.
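
The voting idea can be illustrated with a small sketch. No such tool ships with the raid tools, so everything here is hypothetical: majority_of_three takes one value per mirror (for example, a checksum you computed yourself over the same block range on each disk) and prints the value that at least two mirrors agree on.

```shell
# Hypothetical helper: given one value per mirror (for example a checksum
# of the same block range on each disk), print the value that at least
# two of the three mirrors agree on. Returns non-zero if all three differ.
majority_of_three() {
    if [ "$1" = "$2" ] || [ "$1" = "$3" ]; then
        echo "$1"
    elif [ "$2" = "$3" ]; then
        echo "$2"
    else
        return 1
    fi
}
```

Note that whole-partition checksums are not a suitable input, since the RAID superblocks on the component partitions are expected to differ; the comparison only makes sense over data blocks.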
Q: I have a RAID-4 or RAID-5 (parity) setup, and the power went out while the disks were active. What should I do?
A: The redundancy of RAID levels is designed to protect against a disk failure, not against a power failure.
Since the disks in a RAID-4 or RAID-5 array do not contain a file system that fsck can read, there are fewer repair options. You cannot use fsck to do preliminary checking and/or repair; you must use ckraid first.

The ckraid command can be safely run without the --fix option to verify the inactive RAID array without making any changes. When you are comfortable with the proposed changes, supply the --fix option.

If you wish, you can try designating one of the disks as a ``failed disk''. Do this with the --suggest-failed-disk-mask flag.
Only one bit should be set in the flag, because RAID-5 cannot recover from two failed disks. The mask is a binary bit mask:
```
    0x1 == first disk
    0x2 == second disk
    0x4 == third disk
    0x8 == fourth disk, etc.
            
```
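
The table above follows the pattern 1 << (n - 1) for the n-th disk. As a small sketch, the hypothetical helper below (failed_disk_mask is not part of ckraid) computes the mask value for a given 1-based disk position, which can help avoid arithmetic slips on larger arrays.

```shell
# Hypothetical helper: print the bit mask for the Nth disk (1-based),
# following the table above: disk 1 -> 0x1, disk 2 -> 0x2, disk 3 -> 0x4.
failed_disk_mask() {
    n=$1
    if [ "$n" -lt 1 ]; then
        echo "disk position must be 1 or greater" >&2
        return 1
    fi
    printf '0x%x\n' $((1 << ($n - 1)))
}
```

For example, failed_disk_mask 4 prints 0x8, matching the fourth row of the table.
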
Alternately, you can choose to modify the parity sectors by using the --suggest-fix-parity flag. This will recompute the parity from the other sectors.

The flags --suggest-failed-disk-mask and --suggest-fix-parity can be safely used for verification. No changes are made if the --fix flag is not specified, so you can experiment with different possible repair schemes.
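
To see why parity can be recomputed from the other sectors, and why only one failed disk can be recovered, recall that RAID-5 parity is the bitwise XOR of the data blocks in a stripe. The following is a toy sketch, not ckraid's implementation: small integers stand in for whole sectors, and xor_all is a hypothetical helper name.

```shell
# XOR all arguments together (small integers stand in for whole sectors).
xor_all() {
    acc=0
    for v in "$@"; do
        acc=$(( $acc ^ $v ))
    done
    echo "$acc"
}

# Parity over three data values: 0x0f ^ 0x33 ^ 0x55.
parity=$(xor_all 15 51 85)
# Rebuild the second value (51) from the other two plus the parity.
rebuilt=$(xor_all 15 85 "$parity")
echo "parity=$parity rebuilt=$rebuilt"
```

With two values missing from the same stripe, the single XOR equation has two unknowns, which is why only one bit may be set in the failed-disk mask.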
Q: My RAID-1 device, /dev/md0, consists of two hard drive partitions: /dev/hda3 and /dev/hdc3. Recently, the disk with /dev/hdc3 failed and was replaced with a new disk. My best friend, who doesn't understand RAID, said that the correct thing to do now is to ``dd if=/dev/hda3 of=/dev/hdc3''. I tried this, but things still don't work.
A: You should keep your best friend away from your computer. Fortunately, no serious damage has been done. You can recover from this by running:
mkraid raid1.conf -f --only-superblock
By using dd, two identical copies of the partition were created. This is almost correct, except that the RAID-1 kernel extension expects the RAID superblocks to be different. Thus, when you try to reactivate RAID, the software will notice the problem and deactivate one of the two partitions. By re-creating the superblock, you should have a fully usable system.
Q: My version of mkraid doesn't support the --only-superblock flag. What should I do?
A: The newer tools drop support for this flag, replacing it with the --force-resync flag. It has been reported that the following sequence appears to work with the latest tools and software:
1. umount /web (wherever /dev/md0 was mounted)
2. raidstop /dev/md0
3. mkraid /dev/md0 --force-resync --really-force
4. raidstart /dev/md0
After doing this, cat /proc/mdstat should report a resync in progress, and it should be possible to mount /dev/md0 at this point.
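
Checking for the resync can be scripted. The sketch below is an assumption-laden example: the layout of /proc/mdstat varies between versions, so the hypothetical resync_in_progress helper only looks for the word "resync" in whatever text it is given, rather than parsing the file.

```shell
# Hypothetical helper: succeed if the text on stdin mentions a resync.
# The exact /proc/mdstat layout varies between versions, so this only
# looks for the word "resync" and is an assumption, not a parser.
resync_in_progress() {
    grep -q 'resync'
}
```

A possible use would be: resync_in_progress < /proc/mdstat && echo "resync still running".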
Q: My RAID-1 device, /dev/md0, consists of two hard drive partitions: /dev/hda3 and /dev/hdc3. My best (girl?)friend, who doesn't understand RAID, ran fsck on /dev/hda3 while I wasn't looking, and now the RAID won't work. What should I do?
A: You should re-examine your concept of ``best friend''. In general, fsck should never be run on the individual partitions that compose a RAID array. Assuming that neither of the partitions was heavily damaged, no data loss has occurred, and the RAID-1 device can be recovered as follows:
1. make a backup of the file system on /dev/hda3
2. dd if=/dev/hda3 of=/dev/hdc3
3. mkraid raid1.conf -f --only-superblock
This should leave you with a working disk mirror.
Q: Why does the above recovery procedure work?
A: Because each of the component partitions in a RAID-1 mirror is a perfectly valid copy of the file system. In a pinch, mirroring can be disabled, and one of the partitions can be mounted and safely run as an ordinary, non-RAID file system. When you are ready to restart using RAID-1, unmount the partition and follow the above instructions to restore the mirror. Note that the above works ONLY for RAID-1, and not for any of the other levels.

It may make you feel more comfortable to reverse the direction of the copy above: copy from the disk that was untouched to the one that was touched. Just be sure to fsck the final md device.
Q: I am confused by the above questions. Is it safe to run fsck /dev/md0?
A: Yes, it is safe to run fsck on the md devices. In fact, this is the only safe place to run fsck.
Q: If a disk is slowly failing, will it be obvious which one it is? This kind of confusion could lead a sysadmin to make dangerous decisions.
A: Once a disk fails, an error code will be returned from the low-level driver to the RAID driver. The RAID driver will mark it as ``bad'' in the RAID superblocks of the ``good'' disks (so we will later know which mirrors are good and which aren't), and continue RAID operation on the remaining operational mirrors.

This, of course, assumes that the disk and the low-level driver can detect a read/write error and will not, for example, silently corrupt data. This is true of current drives (error detection schemes are used internally), and is the basis of RAID operation.
Q: What is hot-repair?
A: Work is underway to complete ``hot reconstruction''. With this feature, one can add several ``spare'' disks to the RAID set (be it level 1 or 4/5), and once a disk fails, it will be reconstructed on one of the spare disks at run time, without ever needing to shut down the array.

However, to use this feature, the spare disk must have been declared at boot time, or it must be hot-added, which requires the use of special cabinets and connectors that allow a disk to be added while the electrical power is on.

As of October 97, there is a beta version of MD that allows:
- RAID 1 and 5 reconstruction on spare drives
- RAID-5 parity reconstruction after an unclean shutdown
- a spare disk to be hot-added to an already running RAID 1 or 4/5 array
Automatic reconstruction is currently (Dec 97) disabled by default, due to the preliminary nature of this work. It can be enabled by changing the value of SUPPORT_RECONSTRUCTION in include/linux/md.h.

If spare drives were configured into the array when it was created and kernel-based reconstruction is enabled, the spare drive will already contain a RAID superblock (written by mkraid), and the kernel will reconstruct its contents automatically (without needing the usual mdstop, replace drive, ckraid, mdrun steps).

If you are not running automatic reconstruction, and have not configured a hot-spare disk, the procedure described by Gadi Oxman <gadio@netvision.net.il> is recommended:
- Currently, once the first disk is removed, the RAID set will be running in degraded mode. To restore full operation mode, you need to:
  - stop the array (mdstop /dev/md0)
  - replace the failed drive
  - run ckraid raid.conf to reconstruct its contents
  - run the array again (mdadd, mdrun).
  At this point, the array will be running with all the drives, and again protects against a failure of a single drive.
Currently, it is not possible to assign a single hot-spare disk to several arrays. Each array requires its own hot-spare.
Q: I would like an audible alarm for ``you schmuck, one disk in the mirror is down'', so that the novice sysadmin knows that there is a problem.
A: The kernel logs the event with a ``KERN_ALERT'' priority in syslog. There are several software packages that will monitor the syslog files and automatically beep the PC speaker, call a pager, send e-mail, etc.
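
If no such package is at hand, the idea can be approximated with a few lines of shell. This is a minimal sketch under stated assumptions: md_alert_filter is a hypothetical name, and the text to match is supplied by the caller, because the exact wording of the kernel's message is not given here.

```shell
# Hypothetical filter: read log lines on stdin and print a bell character
# plus the line for every line containing the given text.
# $1 = substring to look for (an assumption; adjust to what your kernel logs)
md_alert_filter() {
    pattern=$1
    while IFS= read -r line; do
        case $line in
            *"$pattern"*) printf '\aALERT: %s\n' "$line" ;;
        esac
    done
}
```

It would typically be fed from a followed log file, for example tail -f on whichever file your syslog configuration writes kernel messages to; the bell character makes most terminals beep.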
Q: How do I run RAID-5 in degraded mode (with one disk failed, and not yet replaced)?
A: Gadi Oxman <gadio@netvision.net.il> writes: Normally, to run a RAID-5 set of n drives you have to:
mdadd /dev/md0 /dev/disk1 ... /dev/disk(n)
mdrun -p5 /dev/md0
Even if one of the disks has failed, you still have to mdadd it as you would in a normal setup. (?? try using /dev/null in place of the failed disk ??? watch out) (Translator's note: according to the translator, the author states earlier in this document that this approach has not been tried.)
The array will then be active in degraded mode with (n - 1) drives. If ``mdrun'' fails, the kernel has noticed an error (for example, several faulty drives, or an unclean shutdown). Use ``dmesg'' to display the kernel error messages from ``mdrun''. If the raid-5 set is corrupted due to a power loss, rather than a disk crash, one can try to recover by creating a new RAID superblock:
mkraid -f --only-superblock raid5.conf
A RAID array doesn't provide protection against a power failure or a kernel crash, and can't guarantee correct recovery. Rebuilding the superblock will simply cause the system to ignore the condition by marking all the drives as ``OK'', as if nothing happened.
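
The two start-up commands above can be generated from a device list, which helps keep the component order consistent between runs. This is a dry-run sketch only: raid5_start_plan is a hypothetical helper that prints the mdadd and mdrun -p5 lines shown above and runs nothing.

```shell
# Hypothetical helper: print (do not run) the commands to start a RAID-5
# set. $1 = md device, remaining arguments = component devices in order.
raid5_start_plan() {
    md=$1
    shift
    echo "mdadd $md $*"
    echo "mdrun -p5 $md"
}
```

For example, raid5_start_plan /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 prints the two lines for a three-disk set; the device names here are placeholders.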
Q: How does RAID-5 work when a disk fails?
A: The typical operating scenario is as follows:
- A RAID-5 array is active.
- One drive fails while the array is active.
- The drive firmware and the low-level Linux disk/controller drivers detect the failure and report an error code to the MD driver.
- The MD driver continues to provide an error-free /dev/md0 device to the higher levels of the kernel (with a performance degradation) by using the remaining operational drives.
- The sysadmin can umount /dev/md0 and mdstop /dev/md0 as usual.
- If the failed drive is not replaced, the sysadmin can still start the array in degraded mode as usual, by running mdadd and mdrun.
Q:
A:
Q: Why is there no question 13?
A: If you are concerned about RAID, High Availability, and UPS, then it's probably a good idea to be superstitious as well. It can't hurt, can it?
Q: I just replaced a failed disk in a RAID-5 array. After rebuilding the array, fsck reports many errors. Is this normal?
A: No. And, unless you ran fsck in "verify only; do not update" mode, it's quite possible that you have corrupted your data. Unfortunately, a not-uncommon scenario is accidentally changing the disk order in a RAID-5 array after replacing a hard drive. Although the RAID superblock stores the proper order, not all tools use this information. In particular, the current version of ckraid will use the information specified with the -f flag (typically, the file /etc/raid5.conf) instead of the data in the superblock. If the specified order is incorrect, then the replaced disk will be reconstructed incorrectly. The symptom of this kind of mistake seems to be heavy and numerous fsck errors.
And, in case you are wondering, yes, someone lost all of their data by making this mistake. Making a tape backup of all data before reconfiguring a RAID array is strongly recommended.
Q: The QuickStart says that mdstop is just there to make sure that the disks are synced. Is this really necessary? Isn't unmounting the file systems enough?
A: The command mdstop /dev/md0 will:
- mark it ``clean''. This allows us to detect unclean shutdowns, for example due to a power failure or a kernel crash.
- sync the array. This is less important after unmounting a filesystem, but is important if /dev/md0 is accessed directly rather than through a filesystem (for example, by e2fsck).
