Social Network Analysis (Introduction & Tutorial)

What is Social Network Analysis? You’ve probably seen those colorful network graphs in newspaper articles or scientific papers. They look like a lot of work to create, right? Or maybe not?

Actually, you can conduct such an analysis without extensive programming knowledge or expensive software.

If you want to know how to do it – then you should sharpen your pencil and take notes.

In this article, I will explain everything about Social Network Analysis – where it comes from, what it’s good for, and how you can apply it. I will cover these five areas:

Network Theory
Applications of Social Network Analysis
Data Collection
Data Analysis and Visualization
Overview of the Best Software Tools

By the end of this article, you’ll have all the links and further information you need to conduct your first Social Network Analysis.

#1 Network Theory

To understand Social Network Analysis, we first need to be aware of its theoretical basis: network theory. Network theory is rooted in graph theory, a branch of mathematics.

Network theory deals with the relationships between specific objects. In the context of Social Network Analysis, these objects are usually social actors, such as people or organizations. The objects and their relationships are represented as a graph: a structure in which objects (drawn as points) are connected by lines.

Nodes and Edges

In the vocabulary of Social Network Analysis, an object is called a node (or vertex). A relationship between two nodes is called an edge (or tie). Edges are the lines between the nodes.

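A relationship can be either undirected (mutual, like a Facebook friendship) or directed (one-way). Let’s imagine our network represents the relationships between the Instagram accounts of famous politicians. The nodes are the people, and the edges are the follower relationships. These edges are directed, because one account can follow another without being followed back.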

If Kamala Harris follows Donald Trump, but he does not follow her back, there is a directed edge from Kamala Harris to Donald Trump, usually shown with an arrow. Kamala Harris is the starting node and Donald Trump is the ending node.

Now suppose Donald Trump follows Joe Biden, but still not Kamala Harris. Donald Trump is then adjacent to both Kamala Harris and Joe Biden, because an edge connects him to each of them. Joe Biden, however, is not adjacent to Kamala Harris, since no edge links the two directly.
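To make adjacency concrete, here is a minimal Python sketch (Python is chosen only for illustration; the article does not prescribe a language). It stores the two hypothetical follow relationships from the example as directed edges and treats two nodes as adjacent when an edge links them in either direction.

```python
from collections import defaultdict

# Hypothetical follower data from the example: (a, b) means "a follows b".
edges = [
    ("Kamala Harris", "Donald Trump"),
    ("Donald Trump", "Joe Biden"),
]

# Index the directed edges in both directions for quick lookups.
follows = defaultdict(set)      # outgoing edges: whom an account follows
followed_by = defaultdict(set)  # incoming edges: who follows an account
for source, target in edges:
    follows[source].add(target)
    followed_by[target].add(source)


def adjacent(u, v):
    """Two nodes are adjacent if an edge connects them in either direction."""
    return v in follows[u] or u in follows[v]


print(adjacent("Donald Trump", "Kamala Harris"))  # True
print(adjacent("Donald Trump", "Joe Biden"))      # True
print(adjacent("Joe Biden", "Kamala Harris"))     # False
print(sorted(followed_by["Donald Trump"]))        # ['Kamala Harris']
```

The direction still matters for other questions (who follows whom), which is why both the outgoing and incoming edges are kept.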

Centrality Measures

When facing a larger network, you might want to describe the network as a whole, learn about the properties of individual nodes, or determine which nodes are particularly important or play a specific role in the network.

A common measure for the network as a whole is density. For individual nodes, you can calculate various centrality measures.

Density

Density describes the network as a whole. It is the number of edges in the network divided by the maximum possible number of edges.

For example, it shows how many of the politicians in our group are connected with each other, compared to a scenario where everyone is connected with everyone. If every node is connected to every other node, the density is 1 (100%); if there are no edges at all, it is 0. Density therefore always lies between 0 and 1.
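In formula terms, assuming no self-loops and at most one edge per pair of nodes: a network with n nodes can have at most n(n-1)/2 undirected edges, or n(n-1) directed edges, because in a directed network each ordered pair can carry its own edge. Density is the actual number of edges m divided by that maximum. A minimal Python sketch with made-up numbers:

```python
def density(num_nodes, num_edges, directed=False):
    """Number of edges divided by the maximum possible number of edges."""
    if num_nodes < 2:
        return 0.0  # with fewer than two nodes, no edge is possible
    # Every ordered pair of distinct nodes can hold one directed edge,
    max_edges = num_nodes * (num_nodes - 1)
    # while one undirected edge covers both orders of the same pair.
    if not directed:
        max_edges //= 2
    return num_edges / max_edges


# Hypothetical: 5 politicians with 4 mutual connections.
print(density(5, 4))                 # 0.4 (4 of 10 possible edges)
# The same 5 accounts with 4 one-way follow relationships.
print(density(5, 4, directed=True))  # 0.2 (4 of 20 possible edges)
# Everyone connected with everyone (undirected).
print(density(5, 10))                # 1.0
```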

Degree Centrality

Now let’s look at centrality measures. Unlike density, they describe properties of individual nodes rather than of the whole network.

Degree centrality indicates how many edges a node has. If Kamala Harris has 9 follower relationships (regardless of direction), the degree of her node is 9.

For directed graphs, we distinguish between incoming edges (in-degree) and outgoing edges (out-degree).
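A minimal Python sketch of counting in-degree, out-degree, and total degree in a directed network. The follow relationships below are invented for illustration, not real Instagram data:

```python
from collections import Counter

# Hypothetical directed follower edges: (follower, followed account).
edges = [
    ("Harris", "Trump"),
    ("Trump", "Biden"),
    ("Biden", "Harris"),
    ("Sanders", "Biden"),
]

out_degree = Counter(source for source, _ in edges)  # accounts a node follows
in_degree = Counter(target for _, target in edges)   # followers of a node
nodes = sorted({node for edge in edges for node in edge})

for node in nodes:
    # A Counter returns 0 for nodes that never appear on that side.
    total = in_degree[node] + out_degree[node]
    print(f"{node}: in={in_degree[node]}, out={out_degree[node]}, degree={total}")
```

Software often also reports a normalized degree centrality, dividing the degree by n - 1 (the number of other nodes a node could connect to) so that values are comparable across networks of different sizes.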

Closeness Centrality

This measure is based on the average length of the shortest paths between a node and all other nodes. Closeness centrality is usually defined as the reciprocal of that average: the shorter the distances, the higher the value, and the more central the node is within the entire network.

For example, how many contacts does Kamala Harris have to go through, on average, to reach each of the other politicians? The fewer, the more central she is in the network.
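A common definition, assuming a connected, unweighted network with n nodes, is C(v) = (n - 1) / (sum of d(v, u) over all other nodes u), where d(v, u) is the number of steps on the shortest path from v to u. The minimal Python sketch below finds those distances with breadth-first search on a small hypothetical network with nodes A to E:

```python
from collections import deque

# Hypothetical undirected network, stored as adjacency lists.
graph = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}


def distances_from(graph, source):
    """Breadth-first search: number of steps from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(graph, node):
    """Reciprocal of the average shortest-path length to the other reachable nodes."""
    dist = distances_from(graph, node)
    total = sum(dist.values())
    if total == 0:
        return 0.0  # isolated node: nothing to reach
    return (len(dist) - 1) / total


for node in graph:
    print(node, round(closeness(graph, node), 3))
```

B reaches every other node in at most two steps and gets the highest value (0.8). For networks that are not connected, tools use adjusted definitions; this sketch simply averages over the nodes it can reach.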

Betweenness Centrality

This measure indicates how often a node lies on the shortest paths between pairs of other nodes. Nodes with high betweenness centrality often sit between two or more clusters of nodes, essentially forming a bridge between them.
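Checking every pair of nodes naively is slow, so network software commonly uses Brandes' algorithm instead. Below is a minimal Python sketch of it for an unweighted, undirected network. The example network (two triangles joined through a single node X) is hypothetical and chosen so that X is the bridge:

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (adjacency lists)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        order = []                      # nodes in order of increasing distance
        preds = {v: [] for v in graph}  # predecessors on shortest paths
        paths = {v: 0 for v in graph}   # number of shortest paths from source
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share.
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted twice (once from each end).
    return {v: s / 2 for v, s in score.items()}


# Hypothetical network: two triangles joined through node X.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"),
         ("X", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

scores = betweenness(graph)
for node, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, value)
```

X lies on all 9 shortest paths between the two triangles and scores highest; C and D, the triangle nodes that connect to it, come next.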

Eigenvector Centrality

This measure indicates how important the neighbors of a node are: a node’s score is proportional to the sum of its neighbors’ scores. So the more important a node’s neighbors, the higher its value.
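Eigenvector centrality can be computed by power iteration: start with equal scores, repeatedly replace each score with the sum of its neighbors’ scores, and rescale. The minimal Python sketch below assumes an undirected, connected network; the hub-and-chain network is made up so that X and Y both have two connections but different neighbors:

```python
import math


def eigenvector_centrality(graph, max_iter=1000, tol=1e-10):
    """Power iteration: repeatedly set each score to the sum of its neighbors' scores."""
    n = len(graph)
    scores = {v: 1 / math.sqrt(n) for v in graph}
    for _ in range(max_iter):
        # Adding the node's own score (multiplying by A + I instead of A)
        # keeps the same leading eigenvector but stops the iteration from
        # oscillating on bipartite networks such as trees.
        new = {v: scores[v] + sum(scores[u] for u in graph[v]) for v in graph}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if sum(abs(new[v] - scores[v]) for v in graph) < tol:
            return new
        scores = new
    return scores


# Hypothetical network: hub H with leaves A, B, C, plus a chain H - X - Y - L.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "X"), ("X", "Y"), ("Y", "L")]
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

scores = eigenvector_centrality(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

X outranks Y because X is attached to the hub H. Even the leaf A, with a single connection, scores higher than Y, since its one neighbor is the hub.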

A well-known variant of this idea is Google’s PageRank algorithm. It follows the rule that a web page counts as more important the more other important pages link to it, which helps it rank higher in search results.

So, if I have a blog post on my website and it is linked by major sites like CNN, BBC, and Forbes, that counts for more than if it is linked by two local newspapers and an unknown blogger.
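To illustrate the idea, here is a minimal Python sketch of the textbook PageRank formula with the commonly used damping factor of 0.85. It is a simplification: Google’s actual search ranking combines many more signals. All pages and links below are hypothetical, modeled on the blog-post example:

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration over a dict {page: [linked pages]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        # Pages without outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # A page passes its rank on, split evenly across its outgoing links.
                new[target] += damping * rank[page] / len(targets)
        rank = new
    return rank


# Hypothetical link structure: six ordinary sites link to the big outlets,
# the big outlets link to my post, and three small sites link to another post.
links = {f"site{i}": ["CNN", "BBC", "Forbes"] for i in range(1, 7)}
links.update({
    "CNN": ["my_post"], "BBC": ["my_post"], "Forbes": ["my_post"],
    "local_paper_1": ["other_post"], "local_paper_2": ["other_post"],
    "unknown_blogger": ["other_post"],
})

rank = pagerank(links)
print(round(rank["my_post"], 3), round(rank["other_post"], 3))
```

Both posts receive three links, but my_post ends up with the higher rank because its linking pages are themselves heavily linked. The damping factor models a reader who follows a link with probability 0.85 and otherwise jumps to a random page.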

#2 Applications of Social Network Analysis

Social Network Analysis has two main applications. The first is in academic research.

Social Network Analysis in Research

In principle, every discipline within the social sciences can use Social Network Analysis. But its use goes beyond that. For example, you can also analyze and visualize citation relationships between papers, universities, and scientists.

Citation network from Stieglitz et al. (2018)

Most commonly, you’ll find Social Network Analyses in political science, communication studies, and sociology.

Social Network Analysis in Journalism

Data journalists also often use Social Network Analysis. Here’s an example from The New York Times about the romantic insights one can draw from Facebook Maps.

#3 Data Collection

The basis for conducting any Social Network Analysis is data. In online research, this data is usually obtained through web scraping or through an API (for example, one provided by a social media platform).

If you want to practice, there are plenty of datasets available online for free. You can try Google Dataset Search, Kaggle, or data.gov.

Data doesn’t always have to be collected automatically. It’s also possible to create small networks by manually entering your data into an Excel sheet or digitizing it in some other way.

For a Social Network Analysis, it is important that the data records how the nodes are connected. A common approach is to give each node a unique ID and to record every connection by referencing the IDs of the two nodes it links.

Only then can you calculate centrality measures and visualize a network with software.
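As a minimal Python sketch of this structure, assume two spreadsheet tables saved as CSV: a node table with Id and Label columns, and an edge table whose Source and Target columns reference those IDs. The column names and rows are illustrative assumptions (they resemble what Gephi’s spreadsheet import uses, but check its documentation). The check flags edges that point to IDs missing from the node table:

```python
import csv
import io

# Hypothetical exports of two spreadsheet tables: one row per node, one row per edge.
NODES_CSV = """Id,Label
1,Kamala Harris
2,Donald Trump
3,Joe Biden
"""
EDGES_CSV = """Source,Target
1,2
2,3
3,4
"""

nodes = {row["Id"]: row["Label"] for row in csv.DictReader(io.StringIO(NODES_CSV))}
edges = [(row["Source"], row["Target"]) for row in csv.DictReader(io.StringIO(EDGES_CSV))]

# An edge is only usable if both of its IDs exist in the node table.
broken = [(s, t) for s, t in edges if s not in nodes or t not in nodes]
valid = [(s, t) for s, t in edges if (s, t) not in broken]

for s, t in valid:
    print(f"{nodes[s]} -> {nodes[t]}")
print("Edges with unknown IDs:", broken)
```

Here the edge from ID 3 to ID 4 is reported, because no node with ID 4 exists.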

#4 Data Analysis and Visualization

Now we come to the analysis. Two of the most popular tools for conducting a Social Network Analysis are R and Gephi.

Both programs can be downloaded and used for free. R takes some time to get used to, because you have to learn (or look up) the commands of its programming language.

If you want to avoid programming languages entirely, I’d recommend Gephi. This software has a complete graphical user interface with which you can perform all sorts of tasks related to Social Network Analysis.

It still requires some time to learn Gephi, but there are great tutorials available on YouTube or you can get help in Gephi support groups on Facebook.

A Social Network Analysis of very large datasets requires quite a bit of computing power. To keep your PC or laptop from reaching its limits and Gephi from crashing, you should filter your data beforehand or run the analysis on a more powerful virtual machine (for example, in the cloud).
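Filtering can be as simple as dropping weakly connected nodes before loading the data. A minimal Python sketch, assuming an edge list and a hypothetical threshold (min_degree) that you would tune to your data and hardware:

```python
from collections import Counter


def filter_by_degree(edges, min_degree):
    """Keep only edges whose two endpoints both have at least min_degree edges."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Hypothetical edge list; E has only one connection.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
print(filter_by_degree(edges, min_degree=2))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

This is a single pass: removing E lowers D’s degree to 1. Repeating the filter until nothing changes yields the so-called k-core of the network.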

The next steps to start your first Social Network Analysis would be:

Read the foundational book Social Network Analysis: Methods and Applications by Wasserman & Faust (1994)
Get a free dataset to practice
Watch YouTube tutorials on R or Gephi until you’re an expert
Join Facebook groups where you can ask questions
Learn by doing
And don’t forget: Have fun! 🙂