Manual Maintenance

From DFN Wiki
Revision as of 13:11, 27 April 2020 by Martin Cupak (talk | contribs) (add sections Configure timeozne and Reset/Reboot)

Connecting to your camera system

If you are ever required to perform any manual maintenance, first connect to your camera.

If you are unable to connect to the camera system locally via the network (Ethernet or WiFi), or remotely over the VPN, you can log in locally using an HDMI or VGA monitor and a USB keyboard.

Direct Connection

  1. Before turning on the system, connect a USB keyboard and a monitor (HDMI or VGA).
  2. Log in as root.

Checking removable hard drives

General notes:

  • Before deploying cameras, the drives should be empty: delete any data recorded, e.g. during testing in the lab, from all drives including /data0.
  • It is good practice to check how full the drives are before replacing them when servicing the observatory. If the observatory has been running for several months, at least /data1 should be full; if it is not, the observatory was not working and needs to be serviced. You can also decide to replace only the drives that actually have data on them and leave the empty ones in the box.

Now let's start - first switch the drives on and mount them:

$ python /opt/dfn-software/enable_ext-hd.py

In case of DFNEXT observatories, wait at least 20 seconds for the drives to spin up, then mount the drives:

$ mount /data1
$ mount /data2
$ mount /data3

In case of DFNSMALL observatories, wait at least 40 seconds for the system to recognize the USB enclosure and spin up the drives, then mount them:

$ mount -a
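The power-on, wait and mount steps above can be combined into one shell function. This is a minimal sketch, assuming bash and the script path and wait times given above; the function name mount_data_drives is hypothetical and not part of the DFN software.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the DFN software): power on the removable
# drives, wait for them to spin up, then mount them as described above.
mount_data_drives() {
  local model="$1" wait_s
  case "$model" in
    DFNEXT)   wait_s=20 ;;  # SATA drives: at least 20 s to spin up
    DFNSMALL) wait_s=40 ;;  # USB enclosure: at least 40 s to be recognized
    *) echo "usage: mount_data_drives DFNEXT|DFNSMALL" >&2; return 2 ;;
  esac
  python /opt/dfn-software/enable_ext-hd.py || return 1
  sleep "$wait_s"
  if [ "$model" = DFNEXT ]; then
    mount /data1 && mount /data2 && mount /data3
  else
    mount -a
  fi
}
```

Usage: `mount_data_drives DFNEXT` or `mount_data_drives DFNSMALL`. A fixed sleep mirrors the instructions; the wait times are minimums, so increase them if a mount fails.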

The next step is to list the drives - we are interested only in the data partitions:

$ df -h 

In case of a DFNEXT observatory with three 6TB removable drives installed and running for several weeks, you will get a listing like this:

Filesystem      Size  Used Avail Use% Mounted on
...
/dev/sda3       390G   55G  316G  15% /data0
/dev/sdb1       5.5T  1.1T  4.2T  21% /data1    ..... This drive is 21% full
/dev/sdd1       5.5T   58M  5.2T   1% /data2    ..... This drive is empty
/dev/sdc1       5.5T   89M  5.2T   1% /data3    ..... This drive is empty

In case of a DFNSMALL observatory with two 8TB removable drives installed and running for more than half a year, now nearly full of data, you will see:

Filesystem      Size  Used Avail Use% Mounted on
...
/dev/sda5       406G   59G  327G  16% /data0
/dev/sdc1       7.3T  6.8T   90G  99% /data1    ..... This drive is full
/dev/sdb1       7.3T  6.5T  367G  95% /data2    ..... This drive is nearly full
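In the listings above a 6TB drive shows as 5.5T and an 8TB drive as 7.3T. Drive vendors count 1 TB = 10^12 bytes, while `df -h` reports in powers of 1024 (1T = 2^40 bytes). The small helper below (hypothetical, for illustration only) does that conversion. The further gap between Size and Avail on an empty drive (5.5T vs 5.2T) is most likely the ext4 reserved-block area, which is 5% by default.

```shell
#!/usr/bin/env bash
# Convert a vendor capacity in decimal terabytes (10^12 bytes) to the binary
# tebibytes (2^40 bytes) that `df -h` displays, rounded to one decimal place.
tb_to_tib() {
  awk -v tb="$1" 'BEGIN { printf "%.1f\n", tb * 1e12 / 2^40 }'
}

for tb in 4 6 8; do
  echo "${tb}TB drive -> $(tb_to_tib "$tb")T in df -h"
done
```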

Note: the /data0 partition is on the system SSD drive, this one is available all the time and contains the recent data (last 1-2 nights) and logs.
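To label the data partitions automatically instead of reading the Use% column by eye, the `df` output can be filtered with awk. This is a minimal sketch; the function name report_data_drives and the thresholds (99% full, 90% nearly full, 1% or less empty) are assumptions chosen to match the annotations in the listings above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise the /dataN partitions from `df -P` (or `df -h`)
# output on stdin. Column 5 is Use%, column 6 is the mount point.
report_data_drives() {
  awk '$6 ~ /^\/data[0-9]+$/ {
    use = $5; sub(/%/, "", use); use += 0
    if (use >= 99)      state = "full"
    else if (use >= 90) state = "nearly full"
    else if (use <= 1)  state = "empty"
    else                state = "partly used"
    printf "%s %d%% %s\n", $6, use, state
  }'
}

# Usage on the observatory:  df -P | report_data_drives
```

The `-P` option keeps each filesystem on a single line, so the column positions stay fixed even for long device names.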

When done with the drives check, unmount them, tell the OS to forget about the SATA devices (it's SATA hot-swap) and finally switch them off:

In case of DFNEXT observatory:

$ python /opt/dfn-software/disable_ext-hd.py

Note: internally, this script also runs the following commands,

$ umount /data1 /data2 /data3
$ echo 1 > /sys/block/sdb/device/delete
$ echo 1 > /sys/block/sdc/device/delete
$ echo 1 > /sys/block/sdd/device/delete

so under nominal conditions there is no need to run them individually.
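The "forget the SATA devices" step above is a loop of writes to each disk's sysfs `delete` node. The sketch below only illustrates that step; disable_ext-hd.py already does this, and the function name forget_sata_disks and the SYSFS_ROOT override (used only for dry-testing) are hypothetical. Unmount the drives before calling it.

```shell
#!/usr/bin/env bash
# Sketch of the "forget the SATA devices" step for DFNEXT (illustration only;
# the real work is done by disable_ext-hd.py). Unmount /data1../data3 first.
# SYSFS_ROOT is a hypothetical override for dry-testing; leave it unset on the observatory.
forget_sata_disks() {
  local root="${SYSFS_ROOT:-/sys}" dev node rc=0
  for dev in "$@"; do
    node="$root/block/$dev/device/delete"
    if [ -w "$node" ]; then
      echo 1 > "$node"   # the kernel detaches the disk so it can be hot-swapped
    else
      echo "skipping $dev: $node not found or not writable" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage: umount /data1 /data2 /data3 && forget_sata_disks sdb sdc sdd
```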

In case of a DFNSMALL observatory:

$ umount /data1 /data2
$ python /opt/dfn-software/disable_ext-hd.py

Installing new HDDs

1. Make sure the hard drives are powered off

Also consider the time of day: a daily task that moves data from the /data0 partition to the removable drives is scheduled in crontab for 10:55 local time, so avoid replacing the drives around that time.

2. Physically replace the drives

Remember to label the drives taken out of the observatory: put a sticker on each drive noting the observatory type, number, site name and replacement date.

3. Format the new drives after replacing

DFNEXT observatory

Power on the enclosure with the hard drives:

$ python /opt/dfn-software/enable_ext-hd.py

Wait 20 seconds, then probe the observatory type and the HDD connection type:

$ cd /root/bin
$ ./dfn_setup_data_hdds.sh -p

This prints, for example:

Probe result: DFNEXT SATA /dev/sdb data2 /dev/sdc data1 /dev/sdd data3
Suggested command to format all drives: /root/bin/dfn_setup_data_hdds.sh /dev/sdb data1 /dev/sdc data2 /dev/sdd data3

To format all three drives, execute the suggested command:

$ ./dfn_setup_data_hdds.sh /dev/sdb data1 /dev/sdc data2 /dev/sdd data3

Note: The formatting procedure includes SMART selftest of all the drives.

Tell the OS to forget about the SATA devices (it's SATA hot-swap) and power them off.

$ python /opt/dfn-software/disable_ext-hd.py

Note: internally, this script also runs the following commands,

$ umount /data1 /data2 /data3
$ echo 1 > /sys/block/sdb/device/delete
$ echo 1 > /sys/block/sdc/device/delete
$ echo 1 > /sys/block/sdd/device/delete

so under nominal conditions there is no need to run them individually.

DFNSMALL observatory

Power on the enclosure with the hard drives:

$ python /opt/dfn-software/enable_ext-hd.py 

Wait 40 seconds, then start the formatting script:

$ cd /root/bin
$ ./setup_usb_hdds_jmicron.sh

Answer the prompts as follows:

- prompt gparted --> N

- prompt "Create partition /dev/sdb1, Format /dev/sdb1 as ext4" --> Y

- prompt "Create partition /dev/sdc1, Format /dev/sdc1 as ext4" --> Y

- Wait for the short SMART self-test to finish, then check the result for the 1st drive, particularly the following lines:

...
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
...
# 1  Short offline       Completed without error       00%       586         -
... 

- Press Enter, then check the result for the 2nd drive in the same way.

- At the end, check that the freshly formatted drives are mounted and that the expected capacities are listed:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       5.5T   58M  5.2T   1% /data2    ..... this is 6TB drive
/dev/sdc1       3.6T   68M  3.4T   1% /data1    ..... this is 4TB drive
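The SMART check from the prompts above can also be repeated later on any drive. This is a minimal sketch, assuming smartmontools is installed and the output of `smartctl -a` is piped in (drives behind a USB bridge may need an additional `-d` device-type option). The function name smart_check is hypothetical; it checks only the three attributes and the latest self-test entry shown above, and treats any non-zero raw value as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -a` output on stdin and fail if attributes
# 196-198 have a non-zero RAW_VALUE (10th column) or the latest self-test failed.
smart_check() {
  awk '
    $1 ~ /^(196|197|198)$/ && $10 != "0" {
      printf "FAIL %s %s raw=%s\n", $1, $2, $10; bad = 1
    }
    /^# 1 / {
      seen = 1
      if ($0 !~ /Completed without error/) { print "FAIL latest self-test: " $0; bad = 1 }
    }
    END {
      if (!seen) { print "WARN no self-test log entry found"; bad = 1 }
      if (!bad) print "OK"
      exit bad
    }'
}

# Usage: smartctl -a /dev/sdb | smart_check
```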

Finally, power off the drives:

$ python /opt/dfn-software/disable_ext-hd.py 

Transfer data from data0 to removable HDs manually

$ nohup /usr/local/bin/move_data_files.sh &

This writes the script's output to the file nohup.out in the current working directory. The data move continues even if you disconnect from the camera.

Check CF card

To check if there are images on the camera CF card:

$ python /opt/dfn-software/enable_camera.py
$ gphoto2 -L -R    # this will list all files on camera

To format the CF card, see the dedicated instruction page here.

Capture control test

1. Run the test

$ /opt/dfn-software/int_control_test.sh

This will take ~10 mins. You should start to hear shutter clicks as test photos are taken.

To monitor the interval test as it runs (in another terminal):

$ tail -f /data0/latest/*interval.txt

2. Check that the interval control test took pictures

Check that there are ~10 pictures, taken at the time the test was run, in the previous images folder:

$ cd /data0/latest_prev
$ ls
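Instead of counting the `ls` output by hand, the images can be counted with `find`. This is a minimal sketch, assuming the test images are NEF files (as in the shutter count section); the function name count_test_images and the default minimum of 8 images are assumptions, since the text only says "~10 pictures".

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NEF images in a folder (default /data0/latest_prev),
# optionally only those modified in the last N minutes, and compare with a minimum.
count_test_images() {
  local dir="${1:-/data0/latest_prev}" min="${2:-8}" mins="$3" n
  if [ -n "$mins" ]; then
    n=$(find "$dir" -maxdepth 1 -type f -iname '*.NEF' -mmin "-$mins" | wc -l)
  else
    n=$(find "$dir" -maxdepth 1 -type f -iname '*.NEF' | wc -l)
  fi
  n=$((n))   # strip any padding added by wc
  echo "$n NEF images in $dir"
  [ "$n" -ge "$min" ]
}

# Usage: count_test_images /data0/latest_prev 8 30   # images from the last 30 minutes
```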

Configure timezone

This command starts a text-based UI for setting the timezone:

$ dpkg-reconfigure tzdata

Note: it is not recommended to change the timezone at night, while the exposure capture control software is running. It is recommended to restart the OS of the camera system's embedded PC after changing the timezone; a warm reboot is sufficient.

Reset/Reboot the camera system

This command initiates warm reset:

$ reboot

Note: on DFNKIT camera systems, this is the only available reboot option. A full hardware (cold) reset, including the microcontroller and DSLR, requires power cycling, ideally switching the power off when a beep sounds.

This command initiates cold reset in case of DFNSMALL camera systems (including the microcontroller, DSLR and video camera):

$ hard_reset.sh

This command initiates cold reset in case of DFNEXT camera systems (including the microcontroller, DSLR and video camera):

$ poweroff

Check Shutter Count

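$ cd /data0/latest_prev
or a folder of the form:
$ cd /data0/DFNXXXNN/YYYY/MM/YYYY-MM-DD_DFNXXXNN_1XXXXXXXXX

Then read the shutter count from an image or from the interval log ("hutter" matches both "Shutter" and "shutter"):

$ exiv2 -p a IMAGE.NEF | grep hutter
or
$ grep hutter *interval.txt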
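To print only the number, the `exiv2 -p a` output can be filtered for a single key. This is a minimal sketch, assuming a Nikon DSLR whose maker note exposes the key Exif.Nikon3.ShutterCount with the value in the last column; the function name shutter_count is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the shutter count out of `exiv2 -p a` output on stdin.
# Assumes the Nikon maker-note key Exif.Nikon3.ShutterCount; exits 1 if it is absent.
shutter_count() {
  awk '$1 == "Exif.Nikon3.ShutterCount" { print $NF; found = 1 } END { exit !found }'
}

# Usage: exiv2 -p a image.NEF | shutter_count
```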

Checking GPS

First check GPS lock and position/time information acquired by the GPS and reported to the PC - run command:

$ cgps

See the upper part of the screen for position and time. The NMEA messages from the GPS should be scrolling in the bottom part of the screen.

Press [Q] to exit cgps.

Note: no lock means no satellite reception. As long as text is scrolling at the bottom of the screen, the GPS is communicating with the observatory PC.

The second thing to check is that the capture control software gets the location and time when it runs with the GPS antenna connected and good signal reception. Inspect the *interval.txt logs, produced either by the capture control test or by regular overnight operation. With nominal GPS functionality, the ntp NMEA/PPS time correction should be active:

INFO, interval_control_lin, ntp, +SHM(0),.NMEA.,0,l,9,16,377,0.000,-11.851,3.472
INFO, interval_control_lin, ntp, *SHM(1),.PPS.,0,l,9,16,377,0.000,0.022,0.008

and the coordinates should be passed on from the GPS receiver (the last number, '1', means the GPS has lock):

INFO, interval_control_lin, GPS_lonlat, 135.274305, -30.857625, 156.26, 1
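Both checks can be run against a log file in one go. This is a minimal sketch, assuming the log line formats shown above; the function name check_gps_log is hypothetical. In `ntpq -p` style output a leading '*' marks the selected system peer and '+' a candidate, so PPS being selected shows up as `*SHM(1),.PPS.`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check an *interval.txt log for the two signs of good GPS
# described above: the PPS source selected by ntp ('*' = system peer) and a
# GPS_lonlat line whose last field is 1 (lock). Returns 1 if either is missing.
check_gps_log() {
  local log="$1" ok=0 lock
  if grep -q 'ntp, \*SHM([0-9]*),\.PPS\.' "$log"; then
    echo "PPS: selected"
  else
    echo "PPS: not selected"; ok=1
  fi
  # Fields are separated by a comma plus optional spaces; keep the last GPS_lonlat flag.
  lock=$(awk -F', *' '$3 == "GPS_lonlat" { v = $NF } END { print v }' "$log")
  if [ "$lock" = "1" ]; then
    echo "GPS: lock"
  else
    echo "GPS: no lock (last GPS_lonlat flag: ${lock:-none})"; ok=1
  fi
  return "$ok"
}

# Usage: for f in /data0/latest/*interval.txt; do check_gps_log "$f"; done
```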

There is also a Python script to query the Leostick/Arduino microcontroller for the GPS status:

$ python /opt/dfn-software/leostick_get_status.py -g
GPGGA,081358.000,3140.0427,S,11639.9456,E,1,16,0.6,195.02,M,-23.9,M,,

The sentence above is an example of a receiver with lock (the fix-quality field "1" right after "E"), at coordinates 31 deg 40.0427 min S, 116 deg 39.9456 min E, elevation 195.02 m. The sentence below shows the situation without lock (fix quality "0", no reception):

$ python /opt/dfn-software/leostick_get_status.py -g
GPGGA,045843.000,3102.8703,S,11550.3033,E,0,00,99.0,202.58,M,-27.9,M,,
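The GGA fields can be decoded with awk. This is a minimal sketch; the function name gga_summary is hypothetical. It follows the standard NMEA GGA layout: field 7 is the fix quality (0 = no fix, 1 = GPS fix), field 8 the number of satellites, field 10 the altitude, and latitude/longitude are given as ddmm.mmmm / dddmm.mmmm, so degrees = dd + mm.mmmm / 60, negative for S and W.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise a GPGGA sentence (from leostick_get_status.py -g)
# read on stdin, converting ddmm.mmmm coordinates to signed decimal degrees.
gga_summary() {
  awk -F',' '
    function todeg(v, hemi, dlen,   d, m) {
      d = substr(v, 1, dlen) + 0      # whole degrees (2 digits lat, 3 digits lon)
      m = substr(v, dlen + 1) + 0     # decimal minutes
      return ((hemi == "S" || hemi == "W") ? -1 : 1) * (d + m / 60)
    }
    $1 ~ /GPGGA$/ {
      printf "fix=%s sats=%d lat=%.6f lon=%.6f alt=%s\n", $7, $8, todeg($3, $4, 2), todeg($5, $6, 3), $10
    }'
}

# Usage: python /opt/dfn-software/leostick_get_status.py -g | gga_summary
```

For the first example sentence this prints `fix=1 sats=16 lat=-31.667378 lon=116.665760 alt=195.02`.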

Software Updates

Automated software updates

As long as the observatory is connected to the Internet, the observatory control software updates itself daily from a dedicated DFN server. There are two update attempts in the afternoon (local time), ~40 minutes before and ~20 minutes after the daily reboot; by default the updates run at 3:30 PM and 4:30 PM and the reboot at 4:10 PM.

Manual software update over Internet

It can be handy to run the network software update manually, for example when testing or deploying an observatory that has been offline for some time, in transport, or stored as a spare. In that case, log in to the observatory and run:

$ dfn_down_install_sw_from_server.sh 

Manual software update using local copy

At a remote site without an Internet connection, the only way to update the observatory software is to bring a copy of the software (e.g. on a laptop) and do the update locally.

The Australian DFN team members can find the latest stable software in the internal DFN repo in operation/SW/dfnsmall/stable (DFNSMALL type of observatory) or operation/SW/dfnext/stable (DFNEXT type). External collaborators will be provided with a copy of the software on request.

Assuming the servicing person has a Linux environment on their laptop, the first step is a dry run, from the directory containing the opt and usr folders of the software copy, to check that the command has no typos:

$ rsync -nrv opt usr root@172.16.1.101:/

If the output shows no errors or anomalies, only the list of files that would be copied, run the real update:

$ rsync -rv opt usr root@172.16.1.101:/

Note: the IP address above is for a local wired connection (over an Ethernet cable); for a WiFi connection use 172.16.0.101.
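The dry-run-then-copy sequence can be wrapped so the real copy only runs after you confirm the dry-run output. This is a minimal sketch, assuming bash on the laptop and the two IP addresses given above; the function names dfn_host and dfn_push_sw are hypothetical and not part of the DFN software.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the DFN software). Run dfn_push_sw from the
# directory that contains the opt/ and usr/ folders of the stable software copy.
dfn_host() {
  case "$1" in
    wired) echo 172.16.1.101 ;;   # Ethernet cable
    wifi)  echo 172.16.0.101 ;;   # WiFi
    *) echo "usage: dfn_host wired|wifi" >&2; return 2 ;;
  esac
}

dfn_push_sw() {
  local host ans
  host=$(dfn_host "$1") || return 2
  [ -d opt ] && [ -d usr ] || { echo "opt/ and usr/ not found here" >&2; return 1; }
  rsync -nrv opt usr "root@$host:/" || return 1   # dry run: only lists files
  read -r -p "Dry run looks OK? Copy for real [y/N] " ans
  [ "$ans" = y ] || return 0
  rsync -rv opt usr "root@$host:/"
}

# Usage: dfn_push_sw wired
```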

Replacing embedded PC board

Commell LE-37D (DFNSMALL, DFNKIT)

When replacing the PC board, the system drive (mSATA SSD card) and, if fitted, the 3G/4G mobile network modem card are moved from the old PC to the new one.

Ethernet ports

The Ethernet networking will not work out of the box, because the udev subsystem automatically pairs the network interfaces (eth0, eth1) with unique MAC addresses for each board and remembers that setting. With the system drive in new PC board, udev recognizes the new network interfaces, but names them eth2 and eth3, which does not match with the network configuration.

To fix that, the udev configuration must be cleared: boot the PC with a screen (we mostly use cheap upcycled VGA panels in our lab, but HDMI should work as well) and a keyboard connected, run the script /root/bin/rm_udev.sh, and then reboot.

Note: if the system is booted first and the monitor connected later, the display will most likely not work. A keyboard is no problem, even if connected later, e.g. for a blind CTRL+ALT+DEL reboot.

Note: the udev network interface persistent config does not affect the mobile network 3G/4G modem card interface, however, the user needs to make sure the operator and modem type setting is correct for the current location.

BIOS configuration

If you are using a brand new, unconfigured PC board, if the CMOS battery on the new PC board has been replaced, or if the camera system behaves unexpectedly, make sure the BIOS configuration is correct.

Useful Commands

$ df -h                 # checks if hard drives are mounted - list of mounted disk devices with disk usage/free information
$ lsblk                 # this command lists all hard disk devices in the system and where (if) they are mounted
$ cgps                  # gives gps coordinates in a table if sat lock and monitors communication GPS->PC. Press [Q] to exit.
$ ntpq -p               # check NTP time correction status
$ watch df              # monitors df changes, good for checking data transfers
$ du -hs * | grep -E '^[0-9.]+[GT]'   # shows only folders of 1 GB or more
$ crontab -l            # shows the scheduled tasks.... good for finding commands you want to manually run now