A View from Liaison’s Fall 2018 Data-Inspired Future Scholarship Winner
My initial interest in data started by accident more than anything else.
I was a sophomore in college, trying to pick elective classes for the upcoming fall semester. On a whim, I decided to take an introductory-level data science course. It was so introductory that we never even learned to code, opting instead to do all of our analyses in a graphical user interface called Rattle.
That alone would give most seasoned data scientists pause. No coding? Really?
Looking back, however, that ended up being more of a blessing than a curse. Without having to fumble through a sea of logic and syntax errors, we as students were allowed to focus on what mattered the most, starting out — the data itself.
It didn’t hurt that the professor who taught the course was also outstanding. He was a very soft-spoken man, but the sheer energy that radiated off him when he talked about his work was nothing short of infectious. Listening to the practical projects he’d been able to do as a data scientist, the meaningful results he’d been able to obtain from the data, and, perhaps most importantly, the difference he’d been able to make because of his data-driven insights struck me as nothing short of fascinating. By the time the class came to an end, I was hooked. I knew that in one way or another, data was going to be a major part of my future.
Fast forward to the present day. I have been lucky enough to get several professional data science roles under my belt, I know how to code (yay!), and I know much more about the industry than I did before. Terms like ‘data science’ and ‘machine learning’ and ‘artificial intelligence’ get thrown around in excitement by industry and non-industry folks alike, but the reality is nowhere near as glamorous as it sounds.
No doubt, there’s lots of promise to be found in all of these advanced applications, and the possible benefits are astounding. But there are times when there is a sharp disconnect between what is expected and what can actually be done.
In light of this disconnect, I believe that, perhaps now more than ever before, there needs to be a shift to focus on what’s really at the center of it all. Not fancy algorithms. Not cool visualization tools. Not increasingly complicated applications of deep learning.
I think the focus needs to go back to data — just data.
Why? Well, if there’s one thing that I learned from my very first class, and then saw repeated, over and over, during my experiences with industry, it’s this: Your models are only as good as the datasets used to build them, plain and simple.
Without high-quality data, even the most sophisticated applications fall apart.
At the end of the day, the goal of any data-related initiative should be producing actionable insights and benefits. Getting there is not easy, though: the challenges facing companies and individuals who want to take their data-driven decision-making capabilities to the next level are immense.
One of the most obvious challenges that comes to mind is data integration. Many companies struggle to combine structured and unstructured data from multiple sources into something meaningful. This is especially true of older companies, which may find it difficult to integrate legacy systems with the newer, shinier applications being used. But tackling this integration challenge head-on is more than worth doing; the more company data is integrated, the easier it is to draw on that data to create the models, systems, and visualizations required to drive meaningful changes.
Data security is another challenge. The infamous Equifax data breach immediately comes to mind, but the compromise of sensitive data can be far more insidious and just as damaging. Keeping company and client data secure from outside attackers, and knowing that it hasn’t been modified or tampered with internally, should both be high priorities for anyone in the field.
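To make the tampering half of that concrete, here is a minimal sketch of one common way to notice that data has changed: record a cryptographic fingerprint of the data when it is known to be good, then compare against it later. The sample records and the helper names (`fingerprint`, `is_unmodified`) are made up for illustration, and the check only helps if the recorded fingerprint is stored somewhere a would-be tamperer cannot also change it.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected: str) -> bool:
    """Compare the current fingerprint against the one recorded earlier."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(fingerprint(data), expected)


# Hypothetical client records, invented for this example.
original = b"client_id,balance\n1001,2500.00\n"
recorded = fingerprint(original)  # saved when the data was known to be good

# Someone quietly changes a balance.
tampered = b"client_id,balance\n1001,9500.00\n"

print("original intact:", is_unmodified(original, recorded))
print("tampered intact:", is_unmodified(tampered, recorded))
```

This is only one small piece of the puzzle; it says nothing about who made the change or how to keep attackers out in the first place.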
Addressing the problems associated with human bias in training data is another issue that’s close to my heart. The premise here is simple — data that contains human bias in some way runs the risk of then perpetuating that bias when used in machine learning and artificial intelligence applications.
For example, let’s say a predictive model is fed historical employee salary data, with the goal being to automate pay raises for all employees for the next year. However, the data contains data points where men are paid significantly more than women. As a result, the new model built perpetuates that pay gap in its new predictions for pay raises — in line with what it saw, based on the data it was given, but out of line with what the industry is trying to correct for today.
That’s a simplistic view of things, of course. Other factors come into play as well when looking at these kinds of data points. The takeaway here is just that if bias isn’t corrected for, it can simply be furthered.
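A toy sketch can make the salary scenario tangible. Everything in it is an assumption for illustration: the numbers are invented, and the "model" is deliberately naive (it just learns the average salary per gender and applies a flat raise), which is far simpler than anything a real company would deploy.

```python
from statistics import mean

# Made-up historical records: (gender, years of experience, salary).
history = [
    ("M", 2, 62000), ("M", 5, 75000), ("M", 8, 90000),
    ("F", 2, 55000), ("F", 5, 66000), ("F", 8, 79000),
]


def fit_group_means(records):
    """'Train' a naive model: the average salary seen for each gender."""
    by_gender = {}
    for gender, _years, salary in records:
        by_gender.setdefault(gender, []).append(salary)
    return {gender: mean(salaries) for gender, salaries in by_gender.items()}


def predict_salary(model, gender, raise_rate=0.03):
    """Predict next year's salary as the learned group average plus a flat raise."""
    return model[gender] * (1 + raise_rate)


model = fit_group_means(history)
historical_gap = model["M"] - model["F"]
predicted_gap = predict_salary(model, "M") - predict_salary(model, "F")
print(f"historical gap: {historical_gap:.0f}, predicted gap: {predicted_gap:.0f}")
```

Nothing in the code is malicious. The gap in the predictions comes entirely from the gap in the data the model was given, and the percentage raise carries it forward.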
Saying the subject is tricky and complicated is an understatement, and truthfully, there is no one right answer. But asking the question in the first place, and at least attempting to solve it, is key to the industry moving forward.
Finally, I think the next generation of data professionals — data scientists, engineers, and analysts alike — should have a say in the type of data being collected by their companies. There are so many possibilities for new problems to be solved, new questions to be answered, but none of that can be done if the data needed simply isn’t there in the first place. Data professionals can and should be at the forefront of these decisions — collaborating with folks on the business sides of their organizations to figure out what information needs to be collected, who needs to collect it, and what can be done with it, moving forward.
But none of those initiatives can get off the ground if good data (high-quality, well-managed, fully integrated data) can’t be secured in the first place.
In that sense, it feels oh-so-timely that I was introduced to Liaison Technologies by way of the Data-Inspired Future Scholarship program. Liaison offers products and solutions that address many of the issues I’ve mentioned — data management, integration, and security, for starters. But even more importantly, it’s clear to me that the central focus on data itself is well understood.
Data, used right, provides the opportunity for people to gain insights into their businesses and their employees in a way that human intuition may not necessarily reveal right at the outset. The right data software solutions, then, empower organizations to stress less about the quality of the data on hand and pay more attention to their internal objectives.
In a nutshell — focus on the data, and from there, the results and insights will follow.
I, for one, am incredibly excited to see where this data-inspired future leads.
Blog Editor’s Note:
Applications are now being accepted for the Spring 2019 Liaison Data-Inspired Future Scholarship. Applications are due October 31, 2018. The recipient will be announced at the end of November 2018. Learn more and submit an application here.