Difference between revisions of "Manual Maintenance"

From DFN Wiki

Revision as of 17:54, 23 August 2018

Connecting to your camera

If you are ever required to perform any manual maintenance, first connect to your camera.

If you are unable to connect to the camera locally via the network (Ethernet or WiFi), or remotely over the VPN, you can log in locally using an HDMI or VGA monitor and a USB keyboard.

Direct Connection

  1. Before turning on the system, connect a keyboard using USB and a monitor using either HDMI or VGA *insert figure of where to connect*
  2. Log in as root

Checking removable hard drives

General notes:

  • Before deploying cameras, the drives should be empty: delete any data recorded, e.g. during testing in the lab, from all drives including /data0.
  • It is good practice to check how full the drives are before replacing them when servicing the observatory. If it has been running for several months, at least /data1 should be full. If it is not, the observatory was not working and needs to be serviced. You can also decide to replace only the drives that actually have data on them and leave the empty ones in the box.

Now let's start - first switch the drives on and mount them:

$ python /opt/dfn-software/enable_ext-hd.py

In the case of DFNEXT observatories, wait at least 20 seconds for the drives to spin up, then mount the drives:

$ mount /data1
$ mount /data2
$ mount /data3

In the case of DFNSMALL observatories, wait at least 40 seconds for the system to recognize the USB enclosure and spin up the drives, then mount them:

$ mount -a

The next step is to list the drives - we are interested only in the data partitions:

$ df -h 

In the case of a DFNEXT observatory with three 6TB removable drives installed and running for several weeks, you will get a listing like this:

Filesystem      Size  Used Avail Use% Mounted on
...
/dev/sda3       390G   55G  316G  15% /data0
/dev/sdb1       5.5T  1.1T  4.2T  21% /data1    ..... This drive is 21% full
/dev/sdd1       5.5T   58M  5.2T   1% /data2    ..... This drive is empty
/dev/sdc1       5.5T   89M  5.2T   1% /data3    ..... This drive is empty

In the case of a DFNSMALL observatory with two 8TB removable drives installed and running for more than half a year, now pretty much full of data, you will see:

Filesystem      Size  Used Avail Use% Mounted on
...
/dev/sda5       406G   59G  327G  16% /data0
/dev/sdc1       7.3T  6.8T   90G  99% /data1    ..... This drive is full
/dev/sdb1       7.3T  6.5T  367G  95% /data2    ..... This drive is nearly full

Note: the /data0 partition is on the system SSD drive. It is available all the time and contains the recent data (last 1-2 nights) and logs.
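As a rough illustration of reading the listings above, the following Python sketch extracts the Use% figure for each /dataN partition from `df -h` output. The function name, the embedded sample text and the 1% threshold for "has data" are assumptions made for this example, not part of the DFN software.

```python
def data_partition_usage(df_output):
    """Return {mount_point: percent_used} for /dataN partitions in `df -h` output."""
    usage = {}
    for line in df_output.splitlines():
        fields = line.split()
        # A df row has at least 6 columns: device, size, used, avail, use%, mount point.
        if len(fields) < 6:
            continue
        percent, mount_point = fields[4], fields[5]
        if mount_point.startswith("/data") and percent.endswith("%"):
            usage[mount_point] = int(percent.rstrip("%"))
    return usage


# Sample text copied from the DFNEXT listing on this page.
sample = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       390G   55G  316G  15% /data0
/dev/sdb1       5.5T  1.1T  4.2T  21% /data1
/dev/sdd1       5.5T   58M  5.2T   1% /data2
/dev/sdc1       5.5T   89M  5.2T   1% /data3
"""

usage = data_partition_usage(sample)
# Removable drives (everything except /data0) holding more than a trace of data.
# The 1% cut-off is an assumption chosen for this example.
to_replace = sorted(m for m, pct in usage.items() if m != "/data0" and pct > 1)
print(to_replace)  # ['/data1']
```

On the observatory itself, the same function could be fed the text returned by running `df -h`, for instance through `subprocess.run`.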

When done with the drives check, unmount them, tell the OS to forget about the SATA devices (it's SATA hot-swap) and finally switch them off:

In case of DFNEXT observatory:

$ python /opt/dfn-software/disable_ext-hd.py

Note: this command internally also runs these commands

$ umount /data1 /data2 /data3
$ echo 1 > /sys/block/sdb/device/delete
$ echo 1 > /sys/block/sdc/device/delete
$ echo 1 > /sys/block/sdd/device/delete

so there is no need to run them individually under normal conditions.

In case of DFNSMALL observatory:

$ umount /data1 /data2
$ python /opt/dfn-software/disable_ext-hd.py

Installing new HDDs

1. Make sure the hard drives are powered off

Also consider what time it is: a daily task that moves data from the /data0 partition to the removable drives is scheduled in crontab for 10:55 local time.

2. Physically replace the drives

Remember to label the drives taken out of the observatory: put a sticker on each drive noting the observatory type, number, site name and replacement date.

3. Format the new drives after replacing

DFNEXT observatory

Power on the enclosure with hard drives and start the formatting script:

$ python /opt/dfn-software/enable_ext-hd.py

Wait 20 seconds, then probe the observatory type and the HDD connection type:

$ cd /root/bin
$ ./dfn_setup_data_hdds.sh -p

This prints:

Probe result: DFNEXT SATA /dev/sdb data2 /dev/sdc data1 /dev/sdd data3
Suggested command to format all drives: /root/bin/dfn_setup_data_hdds.sh /dev/sdb data1 /dev/sdc data2 /dev/sdd data3

To format all three drives, execute the suggested command

$ ./dfn_setup_data_hdds.sh /dev/sdb data1 /dev/sdc data2 /dev/sdd data3

Note: The formatting procedure includes SMART selftest of all the drives.

Tell the OS to forget about the SATA devices (it's SATA hot-swap) and power them off.

$ python /opt/dfn-software/disable_ext-hd.py

Note: this command internally also runs these commands

$ umount /data1 /data2 /data3
$ echo 1 > /sys/block/sdb/device/delete
$ echo 1 > /sys/block/sdc/device/delete
$ echo 1 > /sys/block/sdd/device/delete

so there is no need to run them individually under normal conditions.

DFNSMALL observatory

Power on the enclosure with hard drives and start the formatting script:

$ python /opt/dfn-software/enable_ext-hd.py 

Wait 40 seconds, then run the formatting script:

$ cd /root/bin
$ ./setup_usb_hdds_jmicron.sh

When prompted for the various settings, answer as follows:

- prompt gparted --> N

- prompt "Create partition /dev/sdb1, Format /dev/sdb1 as ext4" --> Y

- prompt "Create partition /dev/sdc1, Format /dev/sdc1 as ext4" --> Y

- wait for the quick SMART self-test to finish, then check the result for the 1st drive, particularly the following lines:

...
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
...
# 1  Short offline       Completed without error       00%       586         -
... 

- press Enter, then check the result for the 2nd drive the same way

- At the end, check that the freshly formatted drives are mounted and that the expected capacities are listed

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       5.5T   58M  5.2T   1% /data2    ..... this is 6TB drive
/dev/sdc1       3.6T   68M  3.4T   1% /data1    ..... this is 4TB drive
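The SMART lines shown in the checklist above can also be checked programmatically. The Python sketch below assumes the usual smartctl attribute-table layout (raw value in the tenth column) and treats any non-zero raw value for the three listed attributes as worth a closer look. The helper names and that rule of thumb are assumptions for illustration, not DFN policy.

```python
WATCHED = ("Reallocated_Event_Count", "Current_Pending_Sector", "Offline_Uncorrectable")


def smart_problems(report):
    """Return {attribute: raw_value} for watched attributes whose raw value is non-zero."""
    problems = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] in WATCHED:
            raw = int(fields[9])
            if raw != 0:
                problems[fields[1]] = raw
    return problems


def short_test_passed(report):
    """True if the most recent self-test log entry (line '# 1') completed without error."""
    for line in report.splitlines():
        if line.lstrip().startswith("# 1 "):
            return "Completed without error" in line
    return False


# Sample text copied from the SMART output shown on this page.
sample = """\
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
# 1  Short offline       Completed without error       00%       586         -
"""

print(smart_problems(sample))     # {} means nothing to flag
print(short_test_passed(sample))  # True
```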

And finally power off the drives:

$ python /opt/dfn-software/disable_ext-hd.py 

Transfer data from data0 to removable HDs manually

$ nohup /usr/local/bin/move_data_files.sh &

This stores information about the move in the nohup.out file in the present working directory. The data-moving task will continue even if you disconnect from the camera.

Check CF card

To check if there are images on the camera CF card:

$ python /opt/dfn-software/enable_camera.py
$ gphoto2 -L -R    # this will list all files on camera

To format the CF card, see the dedicated instruction page here.

Capture control test

1. Run the test

$ /opt/dfn-software/int_control_test.sh

This will take ~10 mins. You should start to hear shutter clicks as test photos are taken.

To monitor the interval test as it runs (in another terminal):

$ tail -f /data0/latest/*interval.txt

2. Check that the interval control test successfully took pictures

Check that there are ~10 pictures, taken at the time the test was run, in the previous images folder:

$ cd /data0/latest_prev
$ ls
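If you prefer not to count by eye, a small Python sketch like the following can count the raw images in the folder. It assumes the test pictures are the .NEF files mentioned elsewhere on this page; the function names are made up for this example.

```python
import os


def count_nef(filenames):
    """Count names ending in .NEF (case-insensitive) in a list of file names."""
    return sum(1 for name in filenames if name.upper().endswith(".NEF"))


def count_nef_in(folder="/data0/latest_prev"):
    """Count .NEF files directly inside `folder` (default: the previous images folder)."""
    return count_nef(os.listdir(folder))


# On the observatory: print(count_nef_in())
# A count near 10 matches the expectation stated above.
```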

Check Shutter Count

$ cd /data0/latest_prev    
or a folder with the format:
$ cd /data0/DFNXXXNN/YYYY/MM/YYYY-MM-DD_DFNXXXNN_1XXXXXXXXX

$ exiv2 -p a **image**.NEF | grep hutter
or
$ grep hutter *interval.txt

Checking GPS

First check the GPS lock and the position/time information acquired by the GPS and reported to the PC by running the command:

$ cgps

See the upper part of the screen for position and time. In the bottom part of the screen, the NMEA messages from the GPS should be scrolling.

Press [Q] to exit cgps.

Note: No lock means no reception. As long as there is text scrolling at the bottom of the screen, the GPS is communicating with the observatory PC.

The second thing to check is that the capture control SW gets the location and time when it runs with the GPS antenna connected and with good signal reception. Inspect the *interval.txt logs, produced either by the capture control test or by regular overnight operation. With nominal GPS functionality, the ntp NMEA/PPS time correction should be active:

INFO, interval_control_lin, ntp, +SHM(0),.NMEA.,0,l,9,16,377,0.000,-11.851,3.472
INFO, interval_control_lin, ntp, *SHM(1),.PPS.,0,l,9,16,377,0.000,0.022,0.008

and coordinates should be passed from the GPS receiver (the last number, '1', means the GPS has lock):

INFO, interval_control_lin, GPS_lonlat, 135.274305, -30.857625, 156.26, 1
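The log lines above can be scanned with a short script. The Python sketch below is based only on the three sample lines shown here: it assumes the comma-separated layout seen in them, and uses the standard ntpq convention that a leading `*` marks the currently selected time source. The function name and the returned dictionary keys are hypothetical.

```python
def gps_status_from_log(log_text):
    """Summarise the GPS-related lines of an *interval.txt log."""
    status = {"pps_selected": False, "gps_lock": False, "lonlat": None}
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 5:
            continue
        if fields[2] == "ntp":
            # '*' in front of the source name means ntp is synchronising to it.
            if fields[3].startswith("*") and fields[4] == ".PPS.":
                status["pps_selected"] = True
        elif fields[2] == "GPS_lonlat" and len(fields) >= 7:
            status["lonlat"] = (float(fields[3]), float(fields[4]))
            # The last number is 1 when the GPS has lock.
            status["gps_lock"] = fields[6] == "1"
    return status


# Sample lines copied from this page.
sample = """\
INFO, interval_control_lin, ntp, +SHM(0),.NMEA.,0,l,9,16,377,0.000,-11.851,3.472
INFO, interval_control_lin, ntp, *SHM(1),.PPS.,0,l,9,16,377,0.000,0.022,0.008
INFO, interval_control_lin, GPS_lonlat, 135.274305, -30.857625, 156.26, 1
"""

status = gps_status_from_log(sample)
print(status)
```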

There is also a Python script to query the Leostick/Arduino microcontroller for the GPS status:

$ python /opt/dfn-software/leostick_get_status.py -g
GPGGA,081358.000,3140.0427,S,11639.9456,E,1,16,0.6,195.02,M,-23.9,M,,

The above GPS sentence is an example of a receiver having lock (the fix-quality field, "1", directly after "E"), with coordinates 31 deg 40.0427 min S, 116 deg 39.9456 min E and elevation 195.02 m. The sentence below shows the situation without lock (no reception), where the same field is "0":

$ python /opt/dfn-software/leostick_get_status.py -g
GPGGA,045843.000,3102.8703,S,11550.3033,E,0,00,99.0,202.58,M,-27.9,M,,
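To make the field positions in these GPGGA sentences concrete, here is a minimal Python sketch that decodes the two examples. It follows the standard NMEA GGA layout (latitude as ddmm.mmmm, longitude as dddmm.mmmm, fix quality in the seventh field); the function name and the returned keys are assumptions for this example, and checksum handling is left out.

```python
def parse_gpgga(sentence):
    """Decode lock status, position and altitude from a GPGGA sentence."""
    fields = sentence.strip().split(",")
    if fields[0].lstrip("$") != "GPGGA" or len(fields) < 10:
        raise ValueError("not a GPGGA sentence")

    def to_degrees(value, hemisphere, degree_digits):
        # NMEA packs degrees and decimal minutes together, e.g. 3140.0427 = 31 deg 40.0427 min.
        degrees = int(value[:degree_digits])
        minutes = float(value[degree_digits:])
        decimal = degrees + minutes / 60.0
        return -decimal if hemisphere in ("S", "W") else decimal

    return {
        "has_lock": fields[6] != "0",  # fix quality: 0 means no fix
        "satellites": int(fields[7]),
        "latitude": to_degrees(fields[2], fields[3], 2),
        "longitude": to_degrees(fields[4], fields[5], 3),
        "altitude_m": float(fields[9]),
    }


locked = parse_gpgga("GPGGA,081358.000,3140.0427,S,11639.9456,E,1,16,0.6,195.02,M,-23.9,M,,")
unlocked = parse_gpgga("GPGGA,045843.000,3102.8703,S,11550.3033,E,0,00,99.0,202.58,M,-27.9,M,,")
print(locked["has_lock"], round(locked["latitude"], 4), round(locked["longitude"], 4))
print(unlocked["has_lock"], unlocked["satellites"])
```

Note that without lock the receiver still reports coordinates, so the fix-quality field is what should be checked before trusting the position.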

Software Updates

Automated software updates

As long as the observatory is connected to the Internet, the software that controls the observatory updates itself daily from a dedicated DFN server. There are two update attempts in the afternoon, local time: ~40 minutes before the daily reboot and ~20 minutes after it. The default times are 3:30 PM and 4:30 PM for the SW update and 4:10 PM for the reboot.
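The default schedule above is simple clock arithmetic around the reboot time. A quick check in Python (the calendar date is arbitrary and only present because datetime needs one):

```python
from datetime import datetime, timedelta

# Default daily reboot at 4:10 PM local time; the date itself is a placeholder.
reboot = datetime(2018, 8, 23, 16, 10)

first_update = reboot - timedelta(minutes=40)   # attempt before the reboot
second_update = reboot + timedelta(minutes=20)  # attempt after the reboot

print(first_update.strftime("%H:%M"), second_update.strftime("%H:%M"))  # 15:30 16:30
```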

Manual software update over Internet

It can be handy to run the network SW update manually, for example when testing or deploying a new observatory that has been off-line for some time, in transport, or stored as a spare. In this case, log in to the observatory and execute the command:

$ dfn_down_install_sw_from_server.sh 

Manual software update using local copy

For a remote site without an Internet connection, the only way to update the observatory software is to bring a copy of the software, e.g. on a laptop, and do the update locally.

The Australian DFN team members can find the latest stable software in the internal DFN repo in operation/SW/dfnsmall/stable (DFNSMALL type of observatory) or operation/SW/dfnext/stable (DFNEXT type). External collaborators will be provided with a copy of the software on request.

Assuming the servicing person has a Linux environment on their laptop, the first step is a dry run to check that there are no typos in the command:

$ rsync -nrv opt usr root@172.16.1.101:/

If there are no errors or anomalies, just a list of files that would be copied, you can run the real update:

$ rsync -rv opt usr root@172.16.1.101:/

Note: The above IP address is for a local wired connection (over an Ethernet cable); for a WiFi connection use IP 172.16.0.101.

Useful Commands

$ df -h                 # checks if hard drives are mounted - list of mounted disk devices with disk usage/free information
$ lsblk                 # this command lists all hard disk devices in the system and where (if) they are mounted
$ cgps                  # gives gps coordinates in a table if sat lock and monitors communication GPS->PC. Press [Q] to exit.
$ ntpq -p               # check NTP time correction status
$ watch df              # monitors df changes, good for checking data transfers
$ du -hs * | grep G     # shows entries in the current folder whose size is reported in GB
$ crontab -l            # shows the scheduled tasks.... good for finding commands you want to manually run now