A more recent and editorialized version of this post can be found here.
In the course of my work, I had to detect filters in FBC (Full Band Capture) downstream spectrum scans. These spectrum scans may contain impairments that need to be detected. A filter, in contrast, is usually implemented by the operator on purpose, simply to disable part of the spectrum. For example, there may be a known impairment in the neighborhood that occurs whenever the cable is used close to its full capacity. Filtering part of the spectrum decreases the maximum possible bandwidth but fixes the issue. So a filter is not exactly an impairment. Still, detecting it is useful, and since its shape is very pronounced, it is not too difficult to do. Let's take a look.
In the following, the algorithm for detecting filters is presented. Then, it is shown that a tanh is appropriate for representing a filter. Finally, an idea for future work is noted. The algorithm works as follows. First, a definition of tanh is needed. The hyperbolic tangent is not a trigonometric function, but the usual transformation rules still apply: it can be scaled and shifted both vertically and horizontally.
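```python
import numpy as np


def tanh(x, a, b, c, d):
    """A scaled and shifted tanh.

    Parameters:
        a: Amplitude. The height of the step is 2 * a.
        b: Phase / horizontal shift. Position of the step.
        c: Width (horizontal scale). How gradual (large c) or steep (small c) the step is.
        d: Vertical shift.
    """
    return a * (1 + np.tanh((x - b) / c)) + d
```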
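Written out as a formula, the function above is given below. The limits and the slope at the midpoint follow directly from tanh(0) = 0, tanh(u) → ±1 for u → ±∞ and tanh'(0) = 1, and they make the role of each parameter explicit (assuming c > 0):

```latex
f(x) = a\left(1 + \tanh\!\left(\frac{x - b}{c}\right)\right) + d

\lim_{x \to -\infty} f(x) = d, \qquad
f(b) = a + d, \qquad
\lim_{x \to +\infty} f(x) = 2a + d, \qquad
f'(b) = \frac{a}{c}
```

So d is the level on one side of the step, 2a + d the level on the other side, b is where the step sits, and a smaller c (relative to a) gives a steeper edge.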
Now, a tanh can be fitted to a given sample using scipy's curve_fit, supplied with a few carefully crafted initial parameters for a, b, c and d. These were obtained simply by fitting curves to a few scans with known filter impairments and using the average as the initial guess. The parameters of the resulting tanh are then checked against hard-coded acceptance ranges, and the fitting error against a threshold. Each range is the initial guess extended by an allowed deviation; e.g. parameter c might have to lie in a range such as 80 to 150 for the tanh to represent a filter impairment.
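The fitting step can be sketched as follows. This is a minimal sketch, not the original implementation: it assumes a scan is a 1-D array of levels with the bin index used as x, uses the RMS error as the "fitting error", and runs on a synthetic scan instead of real data. The helper `fit_tanh` is hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def tanh(x, a, b, c, d):
    """Scaled and shifted tanh (same definition as above)."""
    return a * (1 + np.tanh((x - b) / c)) + d


def fit_tanh(scan, initial_guess):
    """Hypothetical helper: fit a tanh to one scan.

    Assumes the scan is a 1-D array of levels and uses the bin index as x.
    Returns the fitted parameters (a, b, c, d) and the RMS fitting error.
    """
    scan = np.asarray(scan, dtype=float)
    x = np.arange(scan.size, dtype=float)
    params, _ = curve_fit(tanh, x, scan, p0=initial_guess, maxfev=10_000)
    rmse = float(np.sqrt(np.mean((tanh(x, *params) - scan) ** 2)))
    return params, rmse


# Synthetic scan: a filter-like edge plus Gaussian noise (illustrative data only).
rng = np.random.default_rng(0)
bins = np.arange(1000, dtype=float)
scan = tanh(bins, 25.0, 400.0, 100.0, -60.0) + rng.normal(0.0, 0.5, bins.size)

params, rmse = fit_tanh(scan, initial_guess=[23.0, 367.0, 128.0, -61.0])
print(params, rmse)
```

A good initial guess matters here: curve_fit only performs a local optimization, so starting close to a typical filter shape keeps it from drifting into a poor fit.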
In this example, the following parameters and ranges make sense:
```python
import numpy as np

# The parameters a, b, c, d for changing the shape of a tanh function.
# See the tanh function above for detailed information on each parameter.
# These magic numbers were found empirically. Refer to the unit tests.
GUESSED_PARAMS = [23., 367., 128., -61.]

# Allowed deviation between the parameters of the fitted tanh function and
# the guessed initial ones, in both directions. For example, parameter a
# must lie in the range [17, 29].
MAX_DEVIATION_PER_PARAM = np.array([6, 80, 70, 20])
```
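The acceptance check built from these constants can be sketched like this. It assumes a fit counts as a filter when every fitted parameter lies within GUESSED_PARAMS ± MAX_DEVIATION_PER_PARAM (bounds inclusive) and the fitting error is below a threshold. The function `is_filter` and the threshold `MAX_RMSE` are hypothetical, since the post does not state the error threshold actually used.

```python
import numpy as np

GUESSED_PARAMS = np.array([23., 367., 128., -61.])
MAX_DEVIATION_PER_PARAM = np.array([6, 80, 70, 20])
MAX_RMSE = 2.0  # hypothetical fitting-error threshold, not from the original post

# Inclusive acceptance range for each of a, b, c, d.
LOWER = GUESSED_PARAMS - MAX_DEVIATION_PER_PARAM
UPPER = GUESSED_PARAMS + MAX_DEVIATION_PER_PARAM


def is_filter(params, rmse):
    """Hypothetical check: all parameters in range and the fit good enough."""
    params = np.asarray(params, dtype=float)
    in_range = np.all((LOWER <= params) & (params <= UPPER))
    return bool(in_range and rmse < MAX_RMSE)


print(LOWER, UPPER)
print(is_filter([25., 400., 100., -60.], rmse=0.5))
```

With these constants, c may lie anywhere in [58, 198] and b anywhere in [287, 447].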
To validate this algorithm, 200,000 scans were clustered using sklearn's MiniBatchKMeans, calling partial_fit on batches of 10,000 samples at a time. This required roughly 2 GB of RAM, compared to more than 64 GB with vanilla KMeans (the exact amount is unknown, since that run crashed).
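The batch-wise clustering can be sketched as follows, assuming every scan is a fixed-length vector of levels. The generator `iter_batches` is a hypothetical stand-in for loading real scans from disk and yields random data, and `N_BINS` is a made-up scan length. Because `partial_fit` updates the centers from one batch per call, only the current batch has to be held in memory.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

N_SCANS = 200_000
N_CLUSTERS = 12
BATCH_SIZE = 10_000
N_BINS = 64  # hypothetical scan length; real scans are much longer


def iter_batches(n_scans, batch_size, rng):
    """Hypothetical stand-in for reading scans from storage batch by batch."""
    for start in range(0, n_scans, batch_size):
        size = min(batch_size, n_scans - start)
        yield rng.normal(size=(size, N_BINS))


rng = np.random.default_rng(0)
model = MiniBatchKMeans(n_clusters=N_CLUSTERS, n_init=3, random_state=0)
for batch in iter_batches(N_SCANS, BATCH_SIZE, rng):
    model.partial_fit(batch)  # each call treats the whole batch as one mini-batch

# One "average scan" per cluster; these centers are what gets inspected visually.
centers = model.cluster_centers_
print(centers.shape)
```

Vanilla KMeans, in contrast, needs the full 200,000-row matrix in memory at once.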
Assuming 12 clusters (other cluster counts were tried; 12 seemed roughly the best), the following cluster center for filters can be observed:
A small dataset of 208 samples was drawn from the scans that the fitted MiniBatchKMeans model assigns to the above cluster. This dataset was used to adapt the algorithm (to support steeper filters) and was added as a unit test.
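Drawing such a dataset can be sketched as below. It assumes a fitted MiniBatchKMeans model and that the index of the filter cluster was identified beforehand by looking at the cluster centers. `FILTER_CLUSTER` and the random stand-in scans are hypothetical.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(1)
scans = rng.normal(size=(5_000, 32))  # hypothetical stand-in for real scans

# Small model fitted on a single batch, just to make the sketch self-contained.
model = MiniBatchKMeans(n_clusters=12, n_init=3, random_state=0)
model.partial_fit(scans)

FILTER_CLUSTER = 3  # hypothetical index of the cluster whose center looks like a filter
SAMPLE_SIZE = 208

# Indices of all scans the model assigns to the filter cluster.
labels = model.predict(scans)
candidates = np.flatnonzero(labels == FILTER_CLUSTER)

# Draw up to SAMPLE_SIZE distinct scans from that cluster.
chosen = rng.choice(candidates, size=min(SAMPLE_SIZE, candidates.size), replace=False)
filter_dataset = scans[chosen]
print(filter_dataset.shape)
```

Such a dataset can then be checked by hand and stored as a unit-test fixture, which also makes it possible to spot the false positives mentioned below.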
Drilling down into individual samples shows that most filters have this shape, which a tanh represents well.
Unfortunately, the above cluster still seems to contain a few false positives: some samples are not filters at all. Eliminating them would require adjusting the 12 clusters, which could be future work. For the purpose of creating a small dataset of filters, however, this is irrelevant.