Kai Staats: writing

Lost days

Sometimes I spend an entire day searching for a place to work.

I venture to two or three cafes.

The first is too noisy, with traffic, construction, and trains screaming by.

I order a drink and dessert at the next only to learn the wi-fi is intermittent, at best, twenty minutes lost trying to connect. I do what I can off-line.

The final stop and I find a cozy, relatively quiet corner only to realise there is no power and my laptop has but 12% battery remaining.

I know. It sounds silly to complain about such things, for life is far more dynamic than this. But when what I do requires that I am on-line, days like today are days lost to a modern quagmire.

By |2015-10-06T23:20:45-04:00September 23rd, 2015|Critical Thinker, Humans & Technology|Comments Off on Lost days

GP update 2015 09/13

(email to my fellow researchers)

My classification TEST suite is complete, producing Accuracy, Precision, and Recall scores on associated trees.

I have a very basic evaluation built for the Abs Value (minimization) function. Not really sure what one usually uses to test one of these, other than comparing the distance from the known solution to the produced result.

Spent a few hours dealing with a ‘zoo’ (Pythonic nomenclature for “divide by zero”). Seems SymPy is willing to carry divide-by-zero functions as long as you don’t attempt to process them as a float. Then it freaks out. So, I had to intercept the polynomial processing with a str() test for ‘zoo’.

Anyone ever tried a Google search for “python zoo”?

Finally, I need to apply the sklearn split function across my data. The framework is in place (already modified the way the data is passed through the entire script to accommodate both TRAINING and TEST).

Should be easy. (stupid last words)

kai

By |2017-11-24T23:44:56-04:00September 13th, 2015|Ramblings of a Researcher|Comments Off on GP update 2015 09/13

Normalisation is abnormal

(sitting at AIMS)

From 10 am till 2 pm, this is what I built. Argh! Should not have been so hard.

def fx_normalize(self, array):
norm = []
array_min = np.min(array)
array_max = np.max(array)

for col in range(1, len(array) + 1):
n = float((array[col – 1] – array_min) / (array_max – array_min))
norm = np.append(norm, n)

return norm

… but now, my function appears completely different, the curve of the line gone. Is it just the scale? Yes, further testing confirms this. Good.

Now I will return to my work with Accuracy, Precision, and Recall.

By |2017-11-24T23:45:04-04:00September 10th, 2015|Ramblings of a Researcher|Comments Off on Normalisation is abnormal

Frustration in the easy things

(sitting at SAAO)

Per my work at SKA yesterday, I learned to use matplotlib to produce 3D plots of my functions in combination with a scatterplot of the Iris features. I was hoping to automate the solving for any given variable using Sympy, but have not found a means to that end. For now, I will manually reform each algebraic expression. Not ideal, but I need to move ahead. Spent too much time on this already.

I recognise that plotting is core to any modern research. I feel far behind, but know I will come up to speed quickly. Between the older gnuplot, matplotlib, and sympy’s plot functions, there are myriad approaches (too many, in fact).

As I have many times experienced over the past year, every day I am humbled by the challenge of learning something new, and at the same time rewarded by the same. Each day feels incremental, but when I look back to my very first line of Python a little over a year ago, and now, over 2500 lines of Object Oriented, multi-core code with a home-built Numpy array management system, yeah, I’ve learned a ton.

By |2017-11-24T23:45:11-04:00September 9th, 2015|Ramblings of a Researcher|Comments Off on Frustration in the easy things

My brain barrier

I pushed too hard, tried to do too many things at once. I felt the barrier coming, but ignored it, trying to make one more breakthrough in my code.

Last night and today I am learning how to plot multi-dimensional data. In practice, this is very simple. But this has been a real struggle for me.

I promised myself that in this Masters, I would not take any shortcuts, that what I learn would be fully integrated into my understanding. Due to my lack of a mathematical science foundation, some of the core principals of data reduction, statistical application, and multi-dimensional visualisation are totally new to me.

What’s more, my anxiety literally heats my brain to a point of dysfunction, the back of my neck feels like someone is pouring hot syrup down my spine (since 2011). When I feel stupid, when I can’t figure something out, and when the pressure is on … I spiral.

I feel as though I lost a full day (which I can’t afford to do).

On the walk to the train station I asked Arun a number of questions to clarify my understanding. The physical act of walking, gesturing with my arms and hands helped tremendously. Arun is always patient, and an excellent explainer. A few key concepts became clear and it fell into place for me.

By |2017-11-24T23:52:30-04:00September 8th, 2015|Ramblings of a Researcher|Comments Off on My brain barrier

Return to Floating point hell

(sitting at SKA again)

No choice but to return to floating points again as once in every several thousand trees, the fitness score stored has expanded beyond the set 4 floating point places, resulting in something like n.123456e (scientific notation).

I have searched on-line, dug deep into Stackoverflow, and asked my colleagues. No one has a means by which I can define a variable as a fixed number of floating points (to the right of the decimal) that stays that way. Ugh!

I therefore introduced a second round function in the Tournament method, to force the numbers back into spec. I have not since experienced the same error. This is not the ideal solution as the round function introduces its own issues, but for the sake of GP, it does the trick.

I then approached the issue of plotting 2D and 3D data, but hit a major mental block. Like an old engine with leaky coolant, my brain overheated and I had a bad day. Sorry Arun, I was really grumpy.

By |2017-11-24T23:52:37-04:00September 8th, 2015|Ramblings of a Researcher|Comments Off on Return to Floating point hell

Flowers for Algebra

(sitting at SKA)

It is my intent to complete the classification test suite today, using the benchmark Iris dataset.

The Iris dataset offers 4 features (columns in a .csv) for each of 3 unique plant species, as labelled in the right-most (5th) column. As 2 of these species are not discernible from each other with a linear function, there are 2 ways to approach this problem:

  1. Compare only 2 species at a time (in a round-robin) such that we work with A/B, A/C, and finally B/C; or
  2. Seek a non-linear function with a kernel which supports more than 2 classes at a time

Karoo GP - by Kai Staats

As Karoo GP supports only 2 classes at this time, I am going to work with the first option, and come back to the second when I have time to build a new kernel.

If it is our intent to plot the results, we can engage only 3 dimensions of data at one (without forcing higher dimensions into lower orders). However, the Iris dataset is compose of 4 features (a, b, c and d). As whatever holds true in lower dimensions will be retained in higher dimensions, it is safe to test a lower order of the featureset as long as the features we selected are decisive in the identification of the flower species. As such that we remove column d from all 3 comparisons.

This resulted in what appears to be a fully functional classification by Karoo GP, with both linear and non-linear functions that produce 100% (50/50) correct classifications.

By |2017-11-24T23:52:45-04:00September 7th, 2015|Ramblings of a Researcher|Comments Off on Flowers for Algebra

Kepler’s Law resolved with GP!

Kepler's Law resolved (Karoo GP, Kai Staats)

(Kepler’s 3rd Law of Planetary Motion table by the Physics Classroom)

(continued from Premature convergence)

Working from AIMS and my apartment these past two days, I was able to resolve a persistent floating point issue by employing a round function before the fitness evaluation.

I also fixed the minimisation function with the discovery of 2 copy/pasted lines of code I had apparently failed to come back to. It appears this has not been working for some time, as I have been focused on other aspects of the code.

Finally! Just like that! Karoo GP now resolves Kepler’s 3rd Law of Planetary Motion! YES!!!

I ran it with the default Depth 3 and minimum node count of 3 and again, it came up with t/t = 1. So I ran it again with Depth 5 [2^(d+1)-1 = 63 possible nodes] and minimum node count of 9.

While it struggled for the first 5-6 generations, converging on what appeared to be 1 again, some mutation gave it the correct answer and in just 3 generations the correct trees dominated! Coooool!

{1: t**2/r**3,
2: t**2/r**3,
3: t**2/r**3,

88: t**2/r**3,
92: t**2/r**3,
94: t**2/r**3}

This proves 2 of the 3 desired functions: regression maximisation (match) and regression minimisation. Only classification remains to prove Karoo GP road worthy.

By |2017-11-24T23:52:52-04:00September 6th, 2015|Ramblings of a Researcher|Comments Off on Kepler’s Law resolved with GP!

Smashing bugs

The past 3 days have been a fun dive back into my code. I discovered my code no longer worked with cos and sin, the matching fitness function (result = solution) failed with floats, and my minimalisation function was selecting the final Tree, not the best. Ugh!

The float issue all programmers deal with, namely, forcing all variables which are compared against each other into the same number of positions to the right of the decimal point. FIXED!

The cos and sin were related to the float issue. FIXED!

The minimalisation function was my own damned fault, as in my Tournament Selection I had copy/pasted a body of code with intent to them rework it from maximising to minimising, but got distracted, forgot (3-4 weeks ago?) and only tonight realised what was happening. FIXED!

Here are my first results from the working minimalisation function. What remains is the automated selection of the best of the best, not just a list of the leaders in the final generation. Easily done.

Desired result:
a*b + c where 1*10 + 0.05 = 10.05

Trial 1:
a*b + c – c**3/b where 1*10 + 0.05 – (0.05^3/10) = 9.64

Trial 2:
a*b + c + c/b where 1*10 + 0.05 + 0.05/10 = 10.055
a*b where 1*10 = 10.00
a*b + c where 1*10 + 0.05 = 10.05 is CORRECT
a*b + c + c/a where 1*10 + 0.05 + (0.05/1) = 10.10

Trial 3:
a*b + c where 1*10 + 0.05 = 10.05 is CORRECT
a*b + 1/b where 1*10 + 1/10 = 10.10

The above test works perfectly. Now, I have only to test the Iris classification set and I will have all 3 fitness functions fully tested and working.

I am roughly 5 weeks behind sched, but believe I can catch-up as I my code base is solid and designed for the rapid introduction of new fitness functions. In theory, this goes fairly smooth after I prove the base works (why do I keep saying this?!)

Cool!

By |2017-11-24T23:53:01-04:00September 6th, 2015|Ramblings of a Researcher|Comments Off on Smashing bugs

Floating points

In a moment of needing a break from working on the User Guide I played with Karoo GP for 30 minutes and came back to testing cos and sin functions. They broke the Absolute Value function where Classification and Matching yet worked.

A day into the battle, a deeper issue has unfolded which keeps Karoo GP from working with any floats (although this was tested a long time ago, but with a very limited and controlled number of decimal places).

Now, I have learned even when [result = solution], it fails to match.

algo_sym a + b/c
result 0.44404973357
solution 0.4440497336

algo_sym a + b/c
result 0.690166666667
solution 0.6901666667

algo_sym a + b/c
result 2.70666517168
solution 2.7066651717

ARGH!!! Rounding errors!!!

I need to introduce careful control of the number of decimal places invoked for both ‘result’ and ‘solution’. For as much headache as this has caused, I am pleased I took that break and played around with my code again, else I would have run into this when I came back to KAT7 data.

What’s more, this might help Karoo GP solve the Kepler planet problem (which still fails at, miserably).

By |2017-11-24T23:53:08-04:00September 5th, 2015|Ramblings of a Researcher|Comments Off on Floating points
Go to Top