Last week, we examined concepts from discrete math - the foundation for the logic behind computers. Graph theory could be considered as its subset where information is mapped as entities (nodes) and relations (edges) connecting them.
Sometimes it is referred to as network science to distinguish itself from the graphs one may come across in statistics (bar graph, etc.). As the name hints, it is all about studying the science in interactions - generally by visually indicating it. For example. the network below indicates the editors who translate pages on Wikipedia from one language to another.
In this post, we will take a quick dive into graph theory and what it has to offer in todays data intensive era.
Key concepts
A graph is a collection of nodes and edges. Both of these components may have properties of its own - such as node details (regarding a particular instance), edge thickness (indicating volume of flow), edge direction (to indicate flow) etc.
What follows next are a series of questions - is there a path in this graph that covers all nodes (euler path)? a path that covers all edges (hamiltonian path)? Maybe one that covers all nodes but without any loops (a tree)? As more questions are posed, more observations are made and properties defined - for example, a regular graph is one that has every node connected to every other node - so all nodes have the same degree (measure of number of connected nodes).
Once these checking type of questions are covered, what interests most is to find a way to get an answer. For example, to find the shortest path from one node to another through the graph (eg: Djikstraโs algorithm), to determine the central most node in a graph (eg: Eigen centrality), to generate a tree (cycle-free graphs) from a graph (eg: Minimum spanning tree) etc.
All sounds good using pen and paper. But how can you model graphs onto your computers accustomed to discrete math? Well, we have an assortment of options to choose from - you can use adjacency matrix to map the connections between nodes. Or you can leverage two lists to map order of node connections. If linked lists (more on it in a post on data structures) is your thing, it can be used to visualize the graph directly (but questionable choice in terms of memory management).
Whichever method you choose, you can use visualization tools like graphviz to see it rendered as you would on paper. And with this digital abstraction, you can greatly scale your problems to be computed on the machine.
Journey thus far
Many attribute the origin of the thought work behind this field to the Konigsberg problems that Euler tackled - to find if there is a path that covers all the land masses once via the 7 bridges to return to the starting point. By converting the land masses into nodes and bridges into edges, he was able to study the graph equivalent of the problem and study its properties. He observed that when the degree of a node is odd, there canโt be a path that satisfies this condition (also known as Euler path).
In Chemistry, there is detailed study carried out involving atoms and compounds interconnecting them. Again, graphs play a vital role in studying different possible configurations of the chemical by enumerating its different structural configurations in space. In the field of topology, a very notable problem that intrigued many was the graph coloring problem - to be able to definitively tell whether regions in a map can be uniquely colored to distinguish itself from its neighbors - using just 4 colors. It may be interesting for you to try this out on paper if youโd like - represent regions as nodes and boundaries as edges.
Network analysis is a critical topic in the field of electrical engineering - to study distribution of electricity through a a circuit with different components and calculating its dissipation at different parts. In a similar spirit, during world war it is said that the forces had to map its resource distribution routes and study how to reroute goods if a path is compromised by the enemy.
When we move towards our recent past, PageRank Algorithm by the founders of Google essentially defined our internet by finding means to determine the more relevant websites in a web. The technique is quite straightforward - it studies the linking ratio of sites to one another and accordingly determines which page is better cited. A popular product of Google - Google Maps - essentially is path routing. It takes in inputs from multiple android devices to study traffic on the road and accordingly recommend paths to reach destination fast.
Graph Database Systems (eg: Neo4j, DataStax, AWS Neptune) are challenging the status quo of relational database systems (involving tables and foreign keys for JOINing - more on it in a future post) by seamlessly translating Entity Relation Diagrams into end structures. Investigative Journalists leverage such tools to study patterns in a pool of messy data piles to derive meaning amidst it all (eg: Panama Papers - a groundbreaking discovery of corruption at scale).
Even in other fields of computer science, networks play a vital role - to study distributed systems, data transfer over digital networks of the internet, resource allocation scope in operating systems, etc. A rising and related field is Complex Systems where we study ever changing properties in dynamic systems and accordingly predict future instances of the same. Examples include bee hive behavior, brain structure, society, internet etc.
This domain interconnects so many related fields that it deserves a post of its own in the future - perhaps sometime in 2022 once most CS concepts are covered.
Present Day work
In terms of growth of the field, Albert-Lรกszlรณ Barabรกsi took several vital steps to push the bar in this field in the 1990โs. In a time where most disregarded the potential of networks, he constantly innovated through his research to predict systemic behaviors.
Around the same time, Steven Strogatz and Watts were working on the small world theory of interconnectivity - in particular the bold claim that every person is connected to every other person within 6 hops (also known as 6 degrees of separation).
Another rather notable researcher in this field is Mark Newman who has extensively studied different aspects of network theory and complex systems - from power law to election tendencies in society.
Today, there are many institutes around the world (like Santa Fe Institute) where active research is being conducted by physicists and mathematicians to derive more properties and determine new problem spaces. With the abundance of data fueling the information era, graphs are super charged to tackle new problem spaces boldly.
How to get involved?
The field is still in its defining phase and needs many more to push the ceiling further. To read more, check out the following resources.
ACMโs article on Graphs
Quanta Magazineโs article on Graphs
Wikipedia page on graphs, networks and complex systems
Graphs4Science Newsletter
NetSci conference and community
To code, check out the networkx library
Network Science Book - by Albert-Lรกszlรณ Barabรกsi
Network Atlas Book - by Michele Coscia
Video introduction to Network Science - by Renaud Lambiotte
Twitter List on Complex Systems
Joelโs blog on graphs
Disclaimer: Complex Systems is more than mere networks - as illustrated below. More on this field in the future. Until the next post, bye!