Frustration in the easy things

(sitting at SAAO)

Per my work at SKA yesterday, I learned to use matplotlib to produce 3D plots of my functions in combination with a scatterplot of the Iris features. I was hoping to automate solving for any given variable using SymPy, but have not found a means to that end. For now, I will manually rearrange each algebraic expression. Not ideal, but I need to move ahead. I have spent too much time on this already.
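A minimal sketch of that kind of plot, a surface for a candidate function overlaid with a 3D scatter (all variable names and data here are illustrative stand-ins, not Karoo GP's actual code or the real Iris values):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, safe without a display
import matplotlib.pyplot as plt

# illustrative surface: a candidate function z = f(a, b) over a grid
a, b = np.meshgrid(np.linspace(0, 8, 30), np.linspace(0, 5, 30))
z = 0.5 * a + 0.2 * b  # stand-in for a GP-evolved function

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(a, b, z, alpha=0.3)

# illustrative scatter standing in for 3 of the Iris features
pts = np.random.default_rng(0).random((30, 3)) * [8, 5, 5]
ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2])
ax.set_xlabel("feature a")
ax.set_ylabel("feature b")
ax.set_zlabel("f(a, b)")
fig.savefig("iris_3d.png")
```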

I recognise that plotting is core to any modern research. I feel far behind, but know I will come up to speed quickly. Between the older gnuplot, matplotlib, and SymPy's plot functions, there are myriad approaches (too many, in fact).

As I have many times experienced over the past year, every day I am humbled by the challenge of learning something new, and at the same time rewarded by the same. Each day feels incremental, but when I look back from my very first line of Python a little over a year ago to now, over 2500 lines of object-oriented, multi-core code with a home-built NumPy array management system, yeah, I've learned a ton.

September 9th, 2015 | Ramblings of a Researcher

My brain barrier

I pushed too hard, tried to do too many things at once. I felt the barrier coming, but ignored it, trying to make one more breakthrough in my code.

Last night and today I am learning how to plot multi-dimensional data. In practice, this is very simple. But this has been a real struggle for me.

I promised myself that in this Master's, I would not take any shortcuts, that what I learn would be fully integrated into my understanding. Due to my lack of a mathematical science foundation, some of the core principles of data reduction, statistical application, and multi-dimensional visualisation are totally new to me.

What's more, my anxiety literally heats my brain to a point of dysfunction; the back of my neck feels like someone is pouring hot syrup down my spine (since 2011). When I feel stupid, when I can't figure something out, and when the pressure is on … I spiral.

I feel as though I lost a full day (which I can’t afford to do).

On the walk to the train station I asked Arun a number of questions to clarify my understanding. The physical act of walking, gesturing with my arms and hands helped tremendously. Arun is always patient, and an excellent explainer. A few key concepts became clear and it fell into place for me.

September 8th, 2015 | Ramblings of a Researcher

Return to Floating point hell

(sitting at SKA again)

No choice but to return to floating points again, as once in every several thousand trees, the stored fitness score has expanded beyond the set 4 decimal places, resulting in something like n.123456e (scientific notation).

I have searched on-line, dug deep into Stackoverflow, and asked my colleagues. No one has a means by which I can define a variable as holding a fixed number of decimal places that stays that way. Ugh!

I therefore introduced a second round function in the Tournament method, to force the numbers back into spec. I have not since experienced the same error. This is not the ideal solution as the round function introduces its own issues, but for the sake of GP, it does the trick.
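The guard amounts to a second round() applied to the score; a minimal sketch (names are illustrative, not Karoo GP's actual code):

```python
PRECISION = 4  # decimal places the fitness scores are expected to hold

def clamp_fitness(fitness):
    """Force a drifted fitness score back to the prescribed precision."""
    return round(fitness, PRECISION)

# accumulated float error can push a score past 4 decimal places ...
drifted = 0.1 + 0.2 + 0.0001
# ... a second round() inside tournament selection pulls it back into spec
assert clamp_fitness(drifted) == 0.3001
```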

I then approached the issue of plotting 2D and 3D data, but hit a major mental block. Like an old engine with leaky coolant, my brain overheated and I had a bad day. Sorry Arun, I was really grumpy.

September 8th, 2015 | Ramblings of a Researcher

Flowers for Algebra

(sitting at SKA)

It is my intent to complete the classification test suite today, using the benchmark Iris dataset.

The Iris dataset offers 4 features (columns in a .csv) for each of 3 unique plant species, as labelled in the right-most (5th) column. As 2 of these species are not linearly separable from each other, there are 2 ways to approach this problem:

  1. Compare only 2 species at a time (in a round-robin) such that we work with A/B, A/C, and finally B/C; or
  2. Seek a non-linear function with a kernel which supports more than 2 classes at a time
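The round-robin in option 1 can be sketched with a couple of lines of standard Python (the species names are from the Iris dataset; the rest is illustrative):

```python
from itertools import combinations

species = ["setosa", "versicolor", "virginica"]  # the 3 Iris classes
pairs = list(combinations(species, 2))  # every 2-class comparison: A/B, A/C, B/C
print(pairs)
# [('setosa', 'versicolor'), ('setosa', 'virginica'), ('versicolor', 'virginica')]
```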

(Karoo GP, by Kai Staats)

As Karoo GP supports only 2 classes at this time, I am going to work with the first option, and come back to the second when I have time to build a new kernel.

If it is our intent to plot the results, we can engage only 3 dimensions of data at a time (without forcing higher dimensions into lower orders). However, the Iris dataset is composed of 4 features (a, b, c and d). As whatever holds true in lower dimensions is retained in higher dimensions, it is safe to test a lower order of the featureset, as long as the features we select are decisive in the identification of the flower species. As such, we remove column d from all 3 comparisons.
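Dropping that 4th feature can be sketched as a one-line slice (the column order a, b, c, d, label and the sample values are my assumption):

```python
# each row: [a, b, c, d, label]; drop column d (index 3) before training
rows = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [7.0, 3.2, 4.7, 1.4, "versicolor"],
]
reduced = [row[:3] + row[4:] for row in rows]  # keep a, b, c and the label
print(reduced[0])  # [5.1, 3.5, 1.4, 'setosa']
```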

This resulted in what appears to be a fully functional classification by Karoo GP, with both linear and non-linear functions that produce 100% (50/50) correct classifications.

September 7th, 2015 | Ramblings of a Researcher

Kepler’s Law resolved with GP!

(Kepler's Law resolved with Karoo GP, by Kai Staats)

(Kepler’s 3rd Law of Planetary Motion table by the Physics Classroom)

(continued from Premature convergence)

Working from AIMS and my apartment these past two days, I was able to resolve a persistent floating point issue by employing a round function before the fitness evaluation.

I also fixed the minimisation function with the discovery of 2 copy/pasted lines of code I had apparently failed to come back to. It appears this has not been working for some time, as I have been focused on other aspects of the code.

Finally! Just like that! Karoo GP now resolves Kepler’s 3rd Law of Planetary Motion! YES!!!

I ran it with the default Depth 3 and minimum node count of 3 and again, it came up with t/t = 1. So I ran it again with Depth 5 [2^(d+1)-1 = 63 possible nodes] and minimum node count of 9.

While it struggled for the first 5-6 generations, converging on what appeared to be 1 again, some mutation gave it the correct answer and in just 3 generations the correct trees dominated! Coooool!

{1: t**2/r**3,
 2: t**2/r**3,
 3: t**2/r**3,
 ...
 88: t**2/r**3,
 92: t**2/r**3,
 94: t**2/r**3}

This proves 2 of the 3 desired functions: regression maximisation (match) and regression minimisation. Only classification remains to prove Karoo GP roadworthy.

September 6th, 2015 | Ramblings of a Researcher

Smashing bugs

The past 3 days have been a fun dive back into my code. I discovered my code no longer worked with cos and sin, the matching fitness function (result = solution) failed with floats, and my minimisation function was selecting the final Tree, not the best. Ugh!

The float issue is one all programmers deal with, namely, forcing all variables which are compared against each other to the same number of decimal places. FIXED!

The cos and sin failures were related to the float issue. FIXED!

The minimisation function was my own damned fault, as in my Tournament Selection I had copy/pasted a body of code with intent to then rework it from maximising to minimising, but got distracted, forgot (3-4 weeks ago?) and only tonight realised what was happening. FIXED!

Here are my first results from the working minimisation function. What remains is the automated selection of the best of the best, not just a list of the leaders in the final generation. Easily done.
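That selection can be sketched in one line with Python's built-in min (the structure of the leader list and the fitness values are my assumption, not Karoo GP's internals):

```python
# hypothetical leaders of the final generation: (tree_id, expression, fitness)
leaders = [
    (12, "a*b + c - c**3/b", 0.41),
    (31, "a*b + c", 0.0),  # an exact match scores 0 error
    (47, "a*b + c + c/b", 0.005),
]
# for a minimisation function, the best-of-the-best has the lowest fitness
best = min(leaders, key=lambda tree: tree[2])
print(best)  # (31, 'a*b + c', 0.0)
```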

Desired result:
a*b + c where 1*10 + 0.05 = 10.05

Trial 1:
a*b + c - c**3/b where 1*10 + 0.05 - (0.05^3/10) = 9.64

Trial 2:
a*b + c + c/b where 1*10 + 0.05 + 0.05/10 = 10.055
a*b where 1*10 = 10.00
a*b + c where 1*10 + 0.05 = 10.05 is CORRECT
a*b + c + c/a where 1*10 + 0.05 + (0.05/1) = 10.10

Trial 3:
a*b + c where 1*10 + 0.05 = 10.05 is CORRECT
a*b + 1/b where 1*10 + 1/10 = 10.10

The above test works perfectly. Now, I have only to test the Iris classification set and I will have all 3 fitness functions fully tested and working.

I am roughly 5 weeks behind schedule, but believe I can catch up, as my code base is solid and designed for the rapid introduction of new fitness functions. In theory, this goes fairly smoothly after I prove the base works (why do I keep saying this?!)

Cool!

September 6th, 2015 | Ramblings of a Researcher

Floating points

In a moment of needing a break from working on the User Guide, I played with Karoo GP for 30 minutes and came back to testing cos and sin functions. They broke the Absolute Value fitness function, while Classification and Matching still worked.

A day into the battle, a deeper issue has unfolded which keeps Karoo GP from working with any floats (this was tested long ago, but with a very limited and controlled number of decimal places).

Now I have learned that even when result = solution, it fails to match.

algo_sym a + b/c
result 0.44404973357
solution 0.4440497336

algo_sym a + b/c
result 0.690166666667
solution 0.6901666667

algo_sym a + b/c
result 2.70666517168
solution 2.7066651717

ARGH!!! Rounding errors!!!

I need to introduce careful control of the number of decimal places invoked for both ‘result’ and ‘solution’. For as much headache as this has caused, I am pleased I took that break and played around with my code again, else I would have run into this when I came back to KAT7 data.
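The fix amounts to rounding both sides to the same number of decimal places before testing equality; a sketch using the first mismatch above:

```python
PRECISION = 10  # decimal places applied to both 'result' and 'solution'

result = 0.44404973357   # value computed from the evolved expression
solution = 0.4440497336  # value stored in the dataset

assert result != solution  # the raw comparison fails on the trailing digits
assert round(result, PRECISION) == round(solution, PRECISION)  # now they match
```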

What's more, this might help Karoo GP solve the Kepler planet problem (at which it still fails, miserably).

September 5th, 2015 | Ramblings of a Researcher

A simple equation

(Depth 10 GP tree, by Kai Staats)

(sitting in the bookstore in Kalk Bay)

This is my first time working on Karoo GP since my daughter Lindah's arrival to South Africa nearly two weeks ago. She departed just yesterday. Our time together was incredible. We both learned so much. I am so sad to see her go :(

My intent is to complete the User Guide by the close of the weekend. I was able to complete all in-line documentation for the main script. I am now in the process of completing the in-line documentation for the base_class.

In so doing, I derived the following simple but quite useful equation to determine the maximum number of nodes (assuming a Full tree) in any given GP tree:

nodes = 2^(depth + 1) - 1

For example:

Depth 1 = 3 nodes (1 functions, 2 terminals)
Depth 2 = 7 nodes (3 functions, 4 terminals)
Depth 3 = 15 nodes (7 functions, 8 terminals)

Depth 10 = 2047 nodes (1023 functions, 1024 terminals) *

* that is one huge-ass (scientific lingo) polynomial!
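The arithmetic above, sketched (the function name is mine, for illustration):

```python
def full_tree_counts(depth):
    """Node counts for a Full GP tree of the given depth."""
    nodes = 2 ** (depth + 1) - 1  # total nodes
    functions = 2 ** depth - 1    # internal (operator) nodes
    terminals = 2 ** depth        # leaf (variable/constant) nodes
    return nodes, functions, terminals

for d in (1, 2, 3, 10):
    print(d, full_tree_counts(d))
# 1 (3, 1, 2)
# 2 (7, 3, 4)
# 3 (15, 7, 8)
# 10 (2047, 1023, 1024)
```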

September 4th, 2015 | Ramblings of a Researcher

Multi-core GP!

August 15-20

I was able to introduce multi-core functionality using the pprocess multi-core library (thanks for the pointer, Arun!). This effort required roughly four days of research and coding, as it was my very first go. In the end, it is relatively simple, depending upon the code into which it is introduced. I included a user-defined core quantity, and the ability to modify the number of cores engaged during runtime.

The key to multi-core functionality is getting your head wrapped around protected memory spaces. That is, each core assigned by pprocess (or any multi-core library) will reproduce a fully functional copy (instance) of the section of your code which you are spawning on each core.

Each of these instances can read from any global variable, no problem. However, they cannot write back to global variables as all of them would try to write back to the same variable at the same time, known as a race condition. Bad things would happen.

It is imperative to keep in mind that what happens in each instance will not affect what happens in the other instances, on the other cores. Once spawned, they are all independent even though they work with the same variable names and are processed by copies of the same code.

If the returned values are to affect something, that something must be outside of the multi-core environment, post-collection. This may require some rearranging of your code.

For me, it meant that instead of a fitness = fitness + 1 inside the pprocess pipeline, I returned the single value of fitness for each instance on each core, and then conducted the sum function when I collect the results from pprocess.

Below I offer an example of the original for loop and the pprocess code which replaces it, for multi-core support. I have left the single-core for loop in place as a user-invoked bypass of pprocess, for when the overhead of multi-core processing reduces performance (common on non-CPU-intensive runs or on a limited number of cores).

In my code, each GP tree must evaluate all rows in the given data. I therefore employ a for loop to iterate through each row in the previously loaded .csv file (not shown in this example). The Python method (function) fitness_eval() is what conducts the evaluations.

As fitness_eval itself calls another two Python methods (3 methods in total), the key was to rewrite my code such that any global variables whose values are updated (written to) were made local instead, passed from method to method directly. In the end, this makes for better code, forcing me to rethink the way I had designed this and other sections.

if self.cores == 1: # employ only one CPU core and bypass 'pprocess'
   for row in range(0, data_rows): # increment through all rows in the data
      fitness = fitness + fitness_eval(row) # evaluate tree fitness

else: # employ multiple CPU cores via 'pprocess' (imported as pp)
   results = pp.Map(limit = self.cores) # cap the number of cores engaged
   parallel_function = results.manage(pp.MakeParallel(fitness_eval))
   for row in range(0, data_rows): # increment through all rows in the data
      parallel_function(row) # evaluate tree fitness on an available core

   fitness = sum(results[:]) # 'pprocess' returns the fitness scores in a single dump

In this example, the pprocess method parallel_function() is a wrapper for my original method fitness_eval(), such that I pass the incrementing variable row through parallel_function(row), which in turn hands it off to multiple copies of fitness_eval(). Cool, huh?! :)

One benefit of pprocess, as compared to the multiprocessing library in Python, is that pprocess can pass more than one variable through the called methods without explicit pickling (serialisation used to send variables to and from methods across multi-core memory spaces). So, you can treat your methods just as you would on a single core, sending and returning variables using method(var_1, var_2, var_n) and a subsequent return var_1, var_2, var_n.

But keep in mind what I shared above, about how those variables can only return an isolated value per instance, as changes to each of those variables on each core will not affect the other instances. Make sense?
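For comparison, the same collect-then-sum pattern in the standard multiprocessing library might look like this (fitness_eval here is a toy stand-in, not Karoo GP's actual method):

```python
from multiprocessing import Pool

def fitness_eval(row):
    """Toy stand-in for the per-row evaluation: score 1 per 'matching' row."""
    return 1 if row % 2 == 0 else 0

data_rows = 10
with Pool(processes=2) as pool:
    scores = pool.map(fitness_eval, range(data_rows))  # one isolated result per row
fitness = sum(scores)  # the accumulation happens outside the worker processes
print(fitness)  # 5
```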

Many thanks to the Stackoverflow community and Paul Boddie, co-author of pprocess.

August 20th, 2015 | Ramblings of a Researcher

Premature convergence, part 2

(continued from GP update 20150813)

I see 4 ways to deal with the premature convergence:

(Karoo GP, premature convergence, by Kai Staats)

a) Take a pill.

b) If any given tree falls below the user-defined number of nodes (node count, not depth count), that tree is forced to mutate over and over again until it is at or above the prescribed node count. This feels convoluted, as this is not how it happens in the biological world.

c) Nudge the fitness function (higher or lower, depending upon max or min function) such that a tree whose node count is below the user-defined number is less likely to be selected in a Tournament.

d) Simply block any tree whose node count is lower than the user-defined number from entering a tournament. As all four of my mutation types are channelled through tournament selection, this is an easy, 2 line solution.

Is it real-world? Is it any different than applying a maximum depth?
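Option (d) really can amount to a 2-line gate; a sketch (the tree representation and threshold are my assumptions, not Karoo GP's internals):

```python
MIN_NODES = 9  # hypothetical user-defined minimum node count

def eligible(population):
    """Keep only trees large enough to enter a tournament (option d)."""
    return [tree for tree in population if tree["node_count"] >= MIN_NODES]

population = [{"id": 1, "node_count": 3}, {"id": 2, "node_count": 15}]
print([t["id"] for t in eligible(population)])  # [2]
```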

Hmmmm …

(resolved in Kepler's Law resolved with GP!)

August 14th, 2015 | Ramblings of a Researcher