MADlib is an open source library for scalable in-database analytics. It provides data-parallel implementations of the following kinds of methods for structured and unstructured data:

- mathematical
- statistical
- graph
- machine learning

It uses shared-nothing, distributed, scale-out architectures to offer data scientists an effective tool set for challenging problems involving very large data sets.

MADlib is SQL-based and supports Pivotal Greenplum Database and PostgreSQL.

As mentioned, MADlib supports graphs, specifically directed graphs (digraphs) made up of vertices, edges and edge weights.

A graph is stored as ordinary SQL data: one vertex table and one edge table. Because both are regular tables, they can be queried and joined with standard SQL.
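As a minimal sketch of that representation, the tables below hold a small weighted digraph. The table names `vertex` and `edge`, the column names `id`, `src`, `dest` and `weight`, and the sample rows are all assumptions made up for illustration; they are not prescribed by MADlib.

```sql
-- Hypothetical example tables; names, columns and rows are illustrative only.
DROP TABLE IF EXISTS vertex, edge;

-- One row per vertex.
CREATE TABLE vertex (
    id INTEGER
);

-- One row per directed edge, from src to dest, with an edge weight.
CREATE TABLE edge (
    src    INTEGER,
    dest   INTEGER,
    weight DOUBLE PRECISION
);

INSERT INTO vertex VALUES (0), (1), (2), (3);

INSERT INTO edge VALUES
    (0, 1, 1.0),
    (0, 2, 1.0),
    (1, 2, 2.0),
    (2, 3, 1.0),
    (3, 0, 1.5);

-- Plain SQL on the graph: count the incoming edges of each vertex.
SELECT v.id, COUNT(e.src) AS in_degree
FROM vertex v
LEFT JOIN edge e ON e.dest = v.id
GROUP BY v.id
ORDER BY in_degree DESC, v.id;
```

With these sample rows, vertex 2 comes first because two edges point to it, while every other vertex has one incoming edge.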

For example, the classic PageRank algorithm scores each vertex by the edges pointing to it: a page ranks highly when many pages, especially highly ranked ones, link to it.

    SELECT madlib.pagerank(
        vertex_table,  -- list of vertices in graph
        vertex_id,     -- col in vertex table containing vertex IDs
        edge_table,    -- list of edges in graph
        edge_args,     -- source, dest, edge weights cols in edge table
        out_table      -- output table with PageRank distribution
    );
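A concrete call might look like the sketch below. The tables `pages` and `links`, their columns, and the sample rows are hypothetical. The `'src=src, dest=dest'` form of the edge arguments and the `id` and `pagerank` output columns follow the MADlib PageRank documentation as best recalled here, so check them against the MADlib version you have installed.

```sql
-- Hypothetical tiny web graph; table names and rows are illustrative only.
-- pagerank_out_summary is dropped too, on the assumption that MADlib
-- writes a summary table alongside the output table.
DROP TABLE IF EXISTS pages, links, pagerank_out, pagerank_out_summary;

CREATE TABLE pages (id INTEGER);
CREATE TABLE links (src INTEGER, dest INTEGER);

INSERT INTO pages VALUES (0), (1), (2), (3);

INSERT INTO links VALUES
    (0, 1),
    (0, 2),
    (1, 2),
    (2, 3),
    (3, 0);

-- Arguments are passed as strings naming the tables and columns.
SELECT madlib.pagerank(
    'pages',               -- vertex table
    'id',                  -- vertex ID column in the vertex table
    'links',               -- edge table
    'src=src, dest=dest',  -- source and destination columns in the edge table
    'pagerank_out'         -- output table to create
);

-- Highest-ranked vertices first.
SELECT id, pagerank
FROM pagerank_out
ORDER BY pagerank DESC;
```

The function runs with its default damping factor and iteration limit here; the optional trailing arguments are left out to keep the sketch short.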