Ramblings of a Researcher


Our Full Potential

(2016 04/??)

Today I gathered my parents for a review of the code I have produced for my MSc research: a Genetic Programming platform designed to work with any prepared .csv file, no matter the user’s level of experience in Python or Machine Learning.

Over the course of an hour I successfully explained how so much of the world, even the greater cosmos, can be described through mathematical functions. Some simple. Some extremely complicated. But all of them, that is, the ones that truly express the inner workings of the cosmos, are elegant in form and function. They are beautiful.

When it came to my code, 3000+ lines of Object Oriented Python, there was a moment’s hesitation as I recalled that very first line of code, the very first hesitant definition of a variable and function, when I thought I’d have the basic code running in a few hundred lines, not thousands; over the course of six weeks, not six months.

In the telling of that story, in the explanation of what I had accomplished, there was very little ego or expression, rather a pure joy for the process of discovery. I was proud not of what my hands accomplished, for I did not invent Genetic Programming, but of the means by which I can now explore the world around me with the vehicle I had built.

I imagine the joy of a geologist is similar, seeing rock layers through the eyes of time and pressure. In the same way, on a much smaller scale, I was challenged to bring this code to life, to allow me to see patterns that tell their own story much as solidified layers of drifting sand, quartz, calcite, and igneous flows tell the story of what happened hundreds of millions of years ago.

I can say that six months of programming was the most mentally challenging thing I have ever done. While the mathematics were relatively simple, the implementation was often arduous. I discovered a new capacity for problem solving that goes beyond my former work in designing supercomputers or a 2000-package operating system, beyond the intrinsic risk / reward of running a for-profit enterprise when every large contract presents a do-or-die situation.

Now, I wonder, have I short-changed my own potential? Not in some kind of ego stroke, but in a very real, “What else am I capable of? What more can I do that I would have otherwise thought impossible?” How many of us truly engage our full potential? With concern for funding, bills, relationships, family, and physical well-being, the time in our modern lives in which we are able to just think, brainstorm, and solve problems is truly but a minor fraction of our waking hours.

What a shame. What a waste of resources when so much of our world, so much of all our daily, living, breathing, working hours are spent on the day-to-day operations of just getting by. Who out there, who among the myriad humans on this planet has ever been given the challenge and reward of fully using his or her innate ability to solve problems … and indulge in the total bliss of discovery?

April 24th, 2016 | Ramblings of a Researcher

The Circle of Life

Circle of Life by Scientific American

Toward the back of a recent issue of Scientific American, I was totally engrossed by a brief discussion of the “Circle of Life” from the point of view of biology. Every known species (2.3 million and growing) is included in the count (inner circle) with a projection for the balance of types of life, as we discover more, in the outer circle.

Perhaps what I love most is the understanding that we know so little, and are projecting our own lack of knowledge as a kind of map for what we desire, and will some day learn.

March 3rd, 2016 | Ramblings of a Researcher

Good News for Bad News Days

While living in Cape Town, South Africa for the past two years, I came to crave my 7km barefoot runs on the beach, surfing in the cool, early morning waves of False Bay, breakfast of fresh, locally grown organic veggies and hand-picked eggs, and a half hour reading the good news of the day.

In a world filled with news of local political corruption and national debt, gang fights and robberies, ISIS and the North Korean threat, and increasing violence in Palestine and Israel, I long for something to remind me that our species is not as sinister as we seemingly demonstrate.

For me, scientific research and discovery are much needed good news, a human craving for knowledge and expression of creativity that knows no bounds. Science, Scientific American, New Scientist, National Geographic: they offer stories of teams that are working to solve some of our greatest challenges. Yes, many of the stories begin with a description of a dire situation: global warming, browning waters, fisheries on the brink of collapse, energy production that poisons our atmosphere, and the spread of deadly disease. But each issue is met with deeper insight into the problem and often a means to counter pending catastrophe. Even more stories are about pure discovery, made by those who desire to know how the world works in intimate detail.

We peer inside the human brain to address our behaviour. We follow the migration of wild game to learn how to help keep ecosystems in balance. We study ancient relics to learn what we once knew, but have long since forgotten. We look to the dark corners of our solar system in search of the origin of life and to the very beginning of time to determine if this is the only universe, or one of many which co-exist.

“The hole wide multiverse”
“A 10-minute rest can boost memory like sleep”
“Farting plants kick up a stink if irked”
“Narwhal nurseries spotted”
“Math whizzes of ancient Babylon figured out forerunner of calculus”
“Tegu lizards get body heat boost during mating season”
“Computer that mimics human brain beats professional at game of Go”

In New Scientist, issue Jan 9-15, 2016, a story about Alexander Graham Bell recounts how in 1880 he built a photophone, a device that uses light to transmit sound, and has him saying, “I have heard articulate speech by sunlight! I have heard a ray of sun laugh and cough and sing! I have been able to hear a shadow and I have even perceived by ear the passage of a cloud across the sun’s disk!” The inventor of the telephone, whose name yet lives on, expressed in poetic form the exuberance of his discovery and invention.

When we allow ourselves to see the world through the eyes of a child, we once again take on that child-like form. We celebrate what we learn not because it elevates us as individuals, to gain fame, wealth, or power (for those are the burdens of the adult world) but because it opens our minds to what we do not know, and how much more of the mystery remains for us to unravel.

January 29th, 2016 | Critical Thinker, Ramblings of a Researcher

When research comes to an end …

Since my return to the States five weeks ago, I have been preparing my MSc thesis for submission to UCT. 114 pages. 40 citations. 20 images. Three weeks to go … and still so much to do. Nearly every day I engage. 2 hours, 4 hours … 14 hours. It is a process I enjoy far more than I thought possible, for the exacting attention to every detail is wonderfully consuming.

Running, hiking, yoga, bread baking, tending the fire at my aunt’s home in Tucson are what I do between the hours I am writing and editing. As when I was developing Karoo GP, I wake, breathe, and sleep my thesis.

This is the making of a scientist. No fact goes to print without evidence of its origin, either in previously published works, or my own, validated research. No statement is personal. This is not about me, but about what was discovered in the arena I laboured to better understand.

I was twenty months in South Africa, twenty-two months in this program. I attended a dozen workshops and conferences in South Africa, Namibia, and Spain in order to broaden my skills and deepen my knowledge, to learn how to begin to understand machine learning as it can be applied to radio astronomy. Countless thousands of pages of literature reviewed, thousands of lines of code written, and hundreds of hours spent in development, data runs, and analysis.

In the end, it comes down to just two numbers, Precision and Recall, to determine if my work was a success.

That is … incredible!

January 6th, 2016 | Ramblings of a Researcher

GP update 2015 10/02

(email to my fellow researchers)

This week has come and gone quickly. I have been to SKA four times, including a week ago Friday. The pace of my work differs now, as my data runs are a minimum of 5-6 hours, but more recently 50+ hours as I push GP to 50 generations of 100-200 Trees against 10,000 lines of features.

I arrive. Log on. Run each accomplished tree against the TEST data. Save the results in my diary. Archive the trees. Mod parameters. Start a new run. Two hours in commute for an hour of work. Sounds like living in San Fran, not Muizenberg.

So far, very good! No glitches in my software. Not a single crash (except when Nadeem accidentally killed Karoo GP 35 gens into a 50 gen run, trying to kill zombies at my request. Silly us! You can’t kill zombies, they’re already dead! I know, old UNIX joke, but it’s still funny :) ). The multi-core support is solid, scaling linearly on the 40 core box. The server version (configuration file + single line execution) works well for repeat runs.

I have conducted four full runs, with the fifth now in progress. Keeping a diary of the results, including the Precision / Recall against the TREE ID and its polynomial expression. What’s more, every tree is saved in a .csv file at the end of each Generation. Even when Karoo was terminated accidentally, nothing was lost.

Now, I need to write a script which loads a .csv and runs with it, as a total population seed (common according to the literature). The continue function is already in place, so just need to slip a loaded list of arrays into population_a and cont.

Consistently, I am seeing 82-86% Precision (in a 50/50 dual class feature set) with Recall just a few points below. I need to look at AUC and one other analysis (rcm by Thuso; can’t recall the name) to get a full understanding of how Karoo is doing.

Ok. Back to work …

October 2nd, 2015 | Ramblings of a Researcher

GP update 2015 09/25

(email to my fellow researchers)

Today marked the first official day of Karoo GP processing KAT7 data.

My first run was with depth 5 trees against 10,000 lines of data with 5 features. The multi-core functionality saved my recursive ass as the first 10 generations of 100 trees took just over 5 hours to process.

In the end of this minimisation function, there were 3 trees presented as having the best fitness, 2 of which shared the same polynomial expression. I think that is a good sign, but not certain yet.

Precision was 86% for both. Recall quite a bit less.

I sent the first run back into another 20 generations (a new feature I added to Karoo GP this week which allows you to continue the evolution indefinitely), and started another run with the same settings, to see if it converges on anything close to the first set of equations.

Will find out on Monday …


September 25th, 2015 | Ramblings of a Researcher

GP update 2015 09/13

(email to my fellow researchers)

My classification TEST suite is complete, producing Accuracy, Precision, and Recall scores on associated trees.
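For readers unfamiliar with the three scores, here is a minimal sketch of how they fall out of the four confusion counts. It is an illustration only, not the Karoo GP test suite; the function name `classification_scores` and the `positive` parameter are hypothetical.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a two-class problem.

    Hypothetical helper for illustration; labels equal to `positive`
    are treated as the positive class, everything else as negative.
    """
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1          # correctly flagged
        elif guess == positive:
            fp += 1          # flagged, but should not have been
        elif truth == positive:
            fn += 1          # missed
        else:
            tn += 1          # correctly left alone

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything flagged positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

Calling `classification_scores(labels, predictions)` on the TEST split gives the three numbers side by side, which makes it easy to see when Precision is high but Recall lags behind.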

I have a very basic evaluation built for the Abs Value (minimization) function. Not really sure what one usually uses to test one of these, other than comparing the distance from the known solution to the produced result.

Spent a few hours dealing with a ‘zoo’ (SymPy’s name for complex infinity, which is what it returns for a divide by zero). Seems SymPy is willing to carry divide-by-zero expressions as long as you don’t attempt to process them as a float. Then it freaks out. So, I had to intercept the polynomial processing with a str() test for ‘zoo’.

Anyone ever tried a Google search for “python zoo”?
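The str() test for ‘zoo’ described above might look something like the sketch below. This is a guess at the shape of the check, not the actual Karoo GP code; the function name `evaluate_expression` and the choice to return `None` for undefined results are assumptions.

```python
import sympy as sp

def evaluate_expression(expression, values):
    """Evaluate an expression string with the given variable values.

    Hypothetical helper: returns a float, or None when SymPy reports
    an undefined result instead of a number.
    """
    expr = sp.sympify(expression)
    substitutions = {sp.Symbol(name): value for name, value in values.items()}
    result = expr.subs(substitutions)

    # SymPy prints complex infinity as 'zoo' and indeterminate forms
    # as 'nan'; calling float() on either would raise, so test first.
    text = str(result)
    if "zoo" in text or "nan" in text:
        return None
    return float(result)
```

For example, `evaluate_expression("x/(y - 2)", {"x": 3, "y": 2})` returns `None` rather than raising, so the offending tree can be given a poor fitness and the run carries on.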

Finally, I need to apply the sklearn split function across my data. The framework is in place (already modified the way the data is passed through the entire script to accommodate both TRAINING and TEST).

Should be easy. (stupid last words)
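A minimal sketch of that split, assuming the data is a NumPy array with the class label in the last column (an assumption about the layout, not something stated above). `train_test_split` lives in `sklearn.model_selection` in current scikit-learn releases; the wrapper name and default values are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_training_test(data, test_size=0.2, seed=0):
    """Split rows into TRAINING and TEST sets.

    Hypothetical helper: assumes every column but the last is a
    feature and the last column is the class label.
    """
    features = data[:, :-1]
    labels = data[:, -1]
    # stratify keeps the class balance the same in both splits,
    # which matters for a 50/50 dual class feature set.
    return train_test_split(
        features, labels,
        test_size=test_size, random_state=seed, stratify=labels,
    )
```

The call returns four arrays in the order `features_train, features_test, labels_train, labels_test`; fixing `seed` makes repeat runs comparable.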


September 13th, 2015 | Ramblings of a Researcher

Normalisation is abnormal

(sitting at AIMS)

From 10 am till 2 pm, this is what I built. Argh! Should not have been so hard.

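import numpy as np

def fx_normalize(self, array):
    # Min-max normalisation: rescale every value into the range [0, 1].
    # Caveat: if every value is equal (max == min) the division is by zero.
    norm = []
    array_min = np.min(array)
    array_max = np.max(array)

    for col in range(1, len(array) + 1):
        n = float((array[col - 1] - array_min) / (array_max - array_min))
        norm = np.append(norm, n)

    return norm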

… but now, my function appears completely different, the curve of the line gone. Is it just the scale? Yes, further testing confirms this. Good.
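Why only the scale changes: min-max normalisation maps each value x to (x - min) / (max - min), a straight-line rescaling that preserves the order and relative spacing of the values. The same calculation can be written without the loop by letting NumPy work on the whole array at once. This is a sketch of an alternative, not the code used at the time; the function name `normalize` and the all-zeros fallback for a constant array are assumptions.

```python
import numpy as np

def normalize(values):
    """Min-max normalise a 1D sequence into the range [0, 1].

    Hypothetical vectorised variant of the loop above.
    """
    arr = np.asarray(values, dtype=float)
    low, high = arr.min(), arr.max()
    if high == low:
        # Every value is identical: avoid dividing by zero.
        return np.zeros_like(arr)
    # Arithmetic on the array applies to every element at once.
    return (arr - low) / (high - low)
```

Because the mapping is linear, a plot of the normalised values has the same shape as the original once the axes are rescaled.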

Now I will return to my work with Accuracy, Precision, and Recall.

September 10th, 2015 | Ramblings of a Researcher

Frustration in the easy things

(sitting at SAAO)

Per my work at SKA yesterday, I learned to use matplotlib to produce 3D plots of my functions in combination with a scatterplot of the Iris features. I was hoping to automate the solving for any given variable using SymPy, but have not found a means to that end. For now, I will manually reform each algebraic expression. Not ideal, but I need to move ahead. Spent too much time on this already.
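For what it is worth, SymPy’s `solve` function can rearrange an equation for a chosen symbol, which may cover the manual step described here. The sketch below is one possible approach under stated assumptions: the expression is available as a string, the surface being plotted is `z = expression`, and the helper name `solve_for` and the default output name `z` are hypothetical.

```python
import sympy as sp

def solve_for(expression, target, output="z"):
    """Rearrange `output = expression` to isolate `target`.

    Hypothetical helper: returns a list of solutions, since an
    equation may have several (or none).
    """
    out = sp.Symbol(output)
    expr = sp.sympify(expression)
    # Eq builds the equation out = expr; solve isolates the target.
    return sp.solve(sp.Eq(out, expr), sp.Symbol(target))
```

`solve_for("2*x + 3*y", "y")` returns a one-element list holding y in terms of x and z. Non-linear expressions can yield several solutions or none, so the result still needs a look before plotting.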

I recognise that plotting is core to any modern research. I feel far behind, but know I will come up to speed quickly. Between the older gnuplot, matplotlib, and sympy’s plot functions, there are myriad approaches (too many, in fact).

As I have many times experienced over the past year, every day I am humbled by the challenge of learning something new, and at the same time rewarded by the same. Each day feels incremental, but when I look back to my very first line of Python a little over a year ago, and now, over 2500 lines of Object Oriented, multi-core code with a home-built Numpy array management system, yeah, I’ve learned a ton.

September 9th, 2015 | Ramblings of a Researcher