Understanding the science of data
"We want to help our students become more capable and well-rounded data scientists, to be able to coordinate and communicate with a team to help transform the data into something useful for the organization."
A Q & A with Assistant Professor of Sociology and Program Director for the M.S. degree in Data Science, Dr. Michael McCarthy, as he discusses the role of data science and the ways it will become more integral to the strategy of organizations.
Q: Tell us a little about yourself.
A: I grew up in California, and graduated from the United States Military Academy at West Point in 2000. Following graduation, I went to flight school. I was stationed overseas in Germany and then deployed to Iraq, where I led an aviation platoon. One of the last things I did in the Army before getting out in 2007 was work for an internal Army think tank that studied the future. We looked a lot at demographic, economic, and agricultural trends. We tried to postulate what the future might hold. Working there got me excited about going back into academia, as I was not always a great undergraduate student. I earned my master’s degree; I absolutely loved it and went on to get my Ph.D. in geography. I then worked as an analyst at the U.S. Veterans Administration for two years before coming to Utica College.
Q: What is data science, and what do data scientists do?
A: Data doesn’t speak for itself and can be extremely biased, so we strive to find meaning and knowledge in the data that is all around us. Often, data science is seen as a three-part Venn diagram. The three parts are mathematics, computer science, and the “other” domain. This “other” is what makes data science hard to pin down, as biologists, business analysts, sociologists, and people in many other disciplines can all be data scientists. That third portion can include almost any field, including philosophy, geography, and technology. At UC, we try to help our students lean toward the center of that Venn diagram, and give them the computer science skills, math skills, and “other” skills, including business, financial crime, cybersecurity, and social sciences. There are so many components that one person cannot be an expert in it all. Despite that, it is important that we help students develop their strengths, while also growing stronger in the more difficult areas. We want to help our students become more capable and well-rounded data scientists, to be able to coordinate and communicate with a team to help transform the data into something useful for the organization.
We tell our students, “put the scientist hat on first,” meaning we use the scientific method first, enabling us to use the data to present sound findings. Our program, more than other data science programs, focuses on ethics, bias, and social responsibility. There is a perception that data is unbiased, but this is not the case. Knowingly or unknowingly, the data, the questions we ask, and our analysis can be biased. We try to teach students to assess bias at each step, and at the end we ask, “Is this ethically beneficial, ethically analyzed, and what is the social benefit of this analysis?” Enabling students to ask that question, and hopefully answer it, helps them develop a methodology and mindset in which these considerations are part of each step of the process.
Q: What can a graduate of the master’s in data science program do with their degree?
A: The sky’s the limit. We started this program in 2017, and we just recently had our first cohort graduate; from now on, a cohort will graduate every term. The stories that I hear from my students about the transformations they’ve made are heartwarming and phenomenal. I have a student who transformed himself from a barista to a data analyst for a major online platform while still in the program. One of the most exciting moments for me was at the first graduation when a student handed me her business card with the title “Data Scientist.” The title of data scientist is actually rarer than people think. Oftentimes, the title is analyst, so to have students actually receive the full title is quite amazing.
Q: How does data science fit in with the social sciences?
A: In most cases, a lot of the information online comes from searches and social media, and most of the data we are concerned about relates to people. While not necessarily always true, I introduce the idea that “data are people.” As we work through the data, this idea connects the ethics and social responsibility components of data science. When I was working in healthcare, it was easy to get bogged down with the numbers, accounts, timestamps, et cetera. However, it was important to be mindful that what I was looking at was actually the people we were taking care of, the patients. Taking a social science outlook on data gives us a different lens beyond the usual business lens, and helps us strive to understand the actual inputs rather than just the data itself. We can ask the question, “What does this actually mean for the people whose data we are collecting?” It helps us keep the perspective of the person, rather than just looking at the rows and columns of data that come from what the person does. It gives us a nice theoretical foundation on which to build our analysis.
Q: A common idea people hear in regard to data science is “manipulating data.” What does this mean?
A: I’m a classically trained geographer, and a text that a lot of people talk about is a famous book on how to lie with maps, along with a similar book in math about how to lie with statistics. It is easier than expected to manipulate data to have it tell the story that you want it to tell. That is why it’s important that we assess our ethics, biases, and social responsibilities at each step. We have a small case study that compares the U.S. News & World Report college rankings with the Forbes college rankings, in which the top colleges are similar, but they’re not the same. The differences become greater as one goes down the list, which is due to the weights assigned to what makes a college the “best.” It’s very important that we try to avoid confirmation bias, the act of seeing the data in a way that fits the narrative we hoped for. In the introductory class, we talk about how, as data scientists, we have to be able to talk to leadership, tell them that what they were hoping for may not be the case, and state what the data is really showing as opposed to what they wanted it to show.
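The point about weights can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the college names, the three metrics, their scores, and both weighting schemes are hypothetical and are not taken from the U.S. News & World Report or Forbes methodologies. It only shows that the same underlying data can yield different "best" lists once the weights change.

```python
# Hypothetical scores (0-100, higher is better) for made-up colleges.
# These are illustrative values, not real ranking data.
SCORES = {
    "College A": {"graduation_rate": 95, "alumni_salary": 70, "affordability": 60},
    "College B": {"graduation_rate": 85, "alumni_salary": 90, "affordability": 75},
    "College C": {"graduation_rate": 80, "alumni_salary": 75, "affordability": 95},
}

# Two hypothetical weighting schemes; each set of weights sums to 1.
WEIGHTS_ONE = {"graduation_rate": 0.70, "alumni_salary": 0.15, "affordability": 0.15}
WEIGHTS_TWO = {"graduation_rate": 0.20, "alumni_salary": 0.50, "affordability": 0.30}


def rank(scores, weights):
    """Return college names ordered from highest to lowest weighted total."""
    totals = {
        name: sum(weights[metric] * value for metric, value in metrics.items())
        for name, metrics in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    print(rank(SCORES, WEIGHTS_ONE))  # ['College A', 'College B', 'College C']
    print(rank(SCORES, WEIGHTS_TWO))  # ['College B', 'College C', 'College A']
```

Neither ordering is "wrong"; each reflects a choice about what matters, which is why the weights behind any ranking deserve as much scrutiny as the data.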
Q: What does the future look like for data science? How will data science evolve over the next decade?
A: I believe that data science is going to become more integral in helping organizations find the strategies and important aspects that aren’t currently being analyzed. As a technical component, everything is shifting to distributed, cloud-based analysis, which will become much more ubiquitous, even to small companies. Data scientists are going to become much more than just analysts. They’re going to be almost evangelical or prophets for ethics, bias, and social responsibility and for the transformation to data-driven organizations.
My biggest hope is that, with all the management or mismanagement of people’s data that’s currently happening in the U.S., data science as a discipline does not become distrusted by the general population. As doctors have the Hippocratic oath in medicine, there is talk of a similar idea for data scientists. While I don’t believe that we necessarily need an oath, we want to strive to have similar ethics. Medicine has a strong reputation for doing what is best for the patient, and I’d like to think that we could have similar respect and trust that data scientists are doing what is best for people. Honestly, I’m amazed that thus far people don’t mistrust data scientists.
If I told people 15 years ago how much data they were sharing with companies willingly and openly, they might be appalled. We’ve gone down a road now, though, where everything’s “free.” “Free” Gmail, Facebook, and social media sites are not really free; we’re paying for them by offering our data as well as by seeing ads. There’s a famous quote by Jeff Hammerbacher, “The best minds of my generation are thinking about how to make people click ads,” which is a sad reality. These days, people are fine with the data they share because they’re getting something for free. We live in an interesting time where there is a generation that is fine with sharing data and a generation that isn’t affected by it because they are not fully engaged with the online platforms.
Q: Are there regulations to deal with the issue of data manipulation?
A: The United States is actually further behind than one might think. American internet users have benefited a bit from what Europe enabled through the General Data Protection Regulation (GDPR). There are some very important fundamental components to the European GDPR that would be perfect for everyone who uses the internet, but especially the United States. While a lot of GDPR components are being implemented broadly in the U.S. by multinational companies and platforms that strive to adhere to the GDPR requirements, universal protections are needed. While people would generally say that every component of the GDPR is important, it’s not something that many Americans are strongly advocating for.