Utilizing Open Source to Mitigate Implicit Bias in AI

Dylan Maloy - Tufts University

A Short History of AI

Since Alan Turing asked the question “Can machines think?” in 1950, artificial intelligence has been regarded as an extremely promising technology. With the market expected to grow by over 400% in the next five years, such projections reflect the hope that AI will enhance the human experience as we know it.


Unfortunately, any technology of this nature raises ethical concerns. Algorithms developed by flawed humans have the potential to amplify those flaws, and we are increasingly asking these algorithms to make moral decisions for us. AI skeptics fear that this could result in catastrophic outcomes.


Consider the conversation surrounding autonomous vehicles. Yes, it is almost a given that they will reduce many of the dangers currently caused by human drivers. But some crashes will remain unavoidable, which leads us to the trolley problem: should the vehicle sacrifice a smaller number of lives to save a larger group? The answer has yet to be determined.


Within this sea of unknowns lies accountability. Since AI technologies are still in their infancy, industry regulators and administrators have struggled to hold firms accountable for their misuse. To minimize the repercussions of this malpractice, companies can engage in ethics washing to improve their public image, often overstating their interest in the pursuit of ethical and equitable AI systems. A recent example of ethics washing in this space occurred in 2019, when Google disbanded its AI ethics council a mere 10 days after it was formed. The council appeared to have little power to govern any ethical aspect of the company, further revealing itself as a facade meant to convince the public that Google cares deeply about the ethical effects of these technologies. It quickly amassed opposition, which led to its speedy demise. Since then, little has changed: in the past year, Google has fired multiple AI researchers whose work questioned the ethics behind its AI research.


As a result of poor accountability, technologies intended to aid humans can be deployed with minimal ethical oversight. Prime examples of this can be found in the subfield of facial recognition.

Biases in Facial Recognition

Facial recognition is considered one of the more controversial applications of AI, especially when it comes to policing and law enforcement. These controversies often stem from the implicit biases that make their way into the underlying algorithms.


These biases can often be classified into two categories:

  • Ones that originate from a poorly modeled, unrepresentative dataset

  • Ones that originate from the environment in which an algorithm is developed, such as the makeup and assumptions of the team building it


The Gender Shades project, a study that sheds light on these biases in commercial image classification algorithms, shows that poor performance in such high-stakes settings can have serious ethical and social consequences. The classifiers studied performed markedly worse on underrepresented demographics, misgendering darker-skinned women at far higher rates than lighter-skinned men; however, all but one failed to report the confidence ratings behind these performance metrics.
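
The audit methodology itself is straightforward: rather than reporting a single overall accuracy, performance is broken out by demographic group. The sketch below illustrates the idea with hypothetical predictions and group labels; it is not the Gender Shades code itself:

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Compute classification accuracy separately for each demographic group.

    y_true, y_pred: parallel sequences of labels; groups: a parallel
    sequence of group identifiers (e.g. "darker-skinned female").
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical audit: a decent overall accuracy hides a large per-group gap.
y_true = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_pred = ["M", "F", "M", "F", "M", "M", "M", "M"]
groups = ["darker female"] * 4 + ["lighter male"] * 4
print(accuracy_by_group(y_true, y_pred, groups))
# {'darker female': 0.5, 'lighter male': 1.0} despite 75% overall accuracy
```

This kind of disaggregated reporting is exactly what exposes a classifier that looks accurate on average but fails a particular demographic.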

On the bright side, modern AI research has begun to tackle dataset bias by developing ways to algorithmically remodel flawed datasets, so that the data used to train facial classifiers is comprehensive and does not neglect statistically underrepresented demographics. Teams at MIT and Princeton have developed algorithms of this nature, successfully increasing the overall accuracy of image classifiers by resampling the training data to represent a broader range of demographics.
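
The published approaches are considerably more sophisticated, learning which examples are underrepresented and adjusting sampling probabilities during training, but the core idea of resampling can be sketched in a few lines. Everything below (the function name, the assumption of an explicit group label) is a hypothetical illustration, not any team's actual code:

```python
import random
from collections import defaultdict

def rebalance_by_group(samples, group_of, seed=0):
    """Oversample so every demographic group appears equally often.

    samples: a list of training examples; group_of: a function mapping
    a sample to its demographic group. Returns a new, balanced list.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for s in samples:
        buckets[group_of(s)].append(s)
    target = max(len(b) for b in buckets.values())
    balanced = []
    for bucket in buckets.values():
        balanced.extend(bucket)  # keep every original example
        # pad smaller groups by resampling with replacement
        balanced.extend(rng.choices(bucket, k=target - len(bucket)))
    rng.shuffle(balanced)
    return balanced
```

A real de-biasing pipeline often cannot rely on clean group labels being available; much of the research effort goes into discovering underrepresented regions of the data automatically.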

Bias as a Social Issue

So, we know technical biases in AI can be mitigated by other AI technology! This is great news; however, researchers and regulators alike have found it much harder to do the same for the environmental biases defined above. The industry needs a system that can be applied to existing development workflows to provide transparency to consumers and regulators while promoting decentralization and diversity in the workplace.

What Is Open Source?

Open source, a development model that embodies exactly these traits, has become immensely popular over the past two decades.


At its core, open source refers to something publicly accessible that anyone can modify and share. Today, a massive community has grown around open source software, enabling expedited collaboration, transparency, and safety. It truly is the epitome of safety in numbers.


An example of this success can be found in one of the most well-known open source projects to date: the Linux kernel. Created by Linus Torvalds, the codebase is mirrored on GitHub, where collaborators and maintainers alike can inspect, copy, and share the source code. Today, the repository has over a million commits and more than 10 thousand contributors.

Another great thing about open source is the ability to use existing projects to build new ones. This process, often called “forking,” allows anyone to make a copy of an existing repository and experiment on it without affecting the original. Such a process invites new ideas and projects to grow out of existing ones (see the sketch below).
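
On GitHub, forking is a single API call away. The snippet below is a minimal sketch using GitHub's REST API (POST /repos/{owner}/{repo}/forks); the token handling and the commented usage example are hypothetical:

```python
import os
import requests  # third-party: pip install requests

def fork_repo(owner: str, repo: str, token: str) -> str:
    """Fork a GitHub repository into the authenticated user's account.

    Returns the URL of the newly created fork.
    """
    resp = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/forks",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    resp.raise_for_status()  # GitHub answers 202 Accepted on success
    return resp.json()["html_url"]

# Hypothetical usage: fork the kernel mirror into your own account.
# print(fork_repo("torvalds", "linux", os.environ["GITHUB_TOKEN"]))
```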


Sticking with our Linux example, many distributions (operating systems) built on the kernel have surfaced over the years. A prime example is Ubuntu. This distribution, among others, has become extremely popular in the server and cloud computing market; Linux distributions collectively power over 95% of the world's top one million servers. Without a doubt, this piece of software and all of its contributors have had a beneficial impact on the internet as we know it.

Big Tech and Open Source Software

Unfortunately, there are a few deterrents that may make the for-profit sector wary of incorporating open source into its workflow.


For instance, open source allows competitors to view and copy the contents of one's work. This can be considered a positive, because competition expedites technological advancement; however, companies will be less likely to invest money in something that does not give them a leg up in the industry.

Others believe that open source could put sensitive AI algorithms into the wrong hands. Although this possibility exists, the ideas behind these algorithms have most likely already been published in academia. And if not, is this something that should be kept a secret? That question has yet to be answered, but it is important that we recognize and understand the tradeoffs that will need to be made if open source becomes the new standard for such technologies.


On the other hand, proponents of open source have shown that it facilitates talent acquisition through publicly visible contributions, expedites the development process, and reduces the need for in-house maintenance.

Conclusion

Although AI is still considered something of a black box, studies have shown that human flaws can have a negative effect on how these algorithms perform. Humans are not perfect creatures, after all, and these imperfections are often amplified through poor development practices. As referenced above, there are ways to alleviate the effects that human biases have on these systems. Keeping these algorithms in the hands of a large and diverse group of maintainers who are passionate about their contributions is imperative to the safety, dependability, and equitability of their results.


Thus, to prevent misuse, we must regulate these technologies using systems that have been proven to afford consumers and regulators complete transparency while promoting equitable development practices.


Through the use of open source, these goals can be achieved. The stakes have never been higher; hopefully, both industry administrators and firms will see the benefits of open source software on both sides of the equation. Its adoption, paired with de-biasing algorithms, will help create a robust ethical and technical foundation for the promising future of artificial intelligence.