## Apache MADlib: Big Data Machine Learning in SQL

MADlib is an open source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical, graph, and machine learning methods for structured and unstructured data.

It uses shared-nothing, distributed, scale-out architectures to offer data scientists an effective tool set for challenging problems involving very large data sets.

MADlib is SQL-based and supports Pivotal Greenplum Database and PostgreSQL.

As mentioned above, MADlib supports graph methods.

## MADlib for Graphs

MADlib supports directed graphs (digraphs) containing vertices, edges and edge weights.

A graph is stored as ordinary relational data: a vertex table and an edge table. Because both are plain tables, they can be queried with standard SQL.
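
As a minimal sketch of this representation, the statements below create a small vertex table and edge table, then count incoming edges per vertex with plain SQL. The table and column names (`vertex`, `edge`, `id`, `src`, `dest`, `weight`) and the sample rows are illustrative assumptions, not names or data that MADlib requires.

```sql
-- Hypothetical example tables; all names and rows are illustrative.
CREATE TABLE vertex (
    id INTEGER                 -- one row per vertex
);

CREATE TABLE edge (
    src    INTEGER,            -- vertex the edge starts from
    dest   INTEGER,            -- vertex the edge points to
    weight DOUBLE PRECISION    -- edge weight
);

INSERT INTO vertex (id) VALUES (0), (1), (2), (3);

INSERT INTO edge (src, dest, weight) VALUES
    (0, 1, 1.0),
    (0, 2, 1.0),
    (1, 2, 1.0),
    (2, 0, 1.0),
    (3, 2, 1.0);

-- Standard SQL over the graph: number of incoming edges per vertex.
-- The LEFT JOIN keeps vertices that have no incoming edges.
SELECT v.id, COUNT(e.src) AS in_degree
FROM vertex v
LEFT JOIN edge e ON e.dest = v.id
GROUP BY v.id
ORDER BY in_degree DESC, v.id;
```

With these sample rows, vertex 2 has the most incoming edges (three) and vertex 3 has none.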

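For example, the classic PageRank algorithm scores each vertex by the edges pointing to it: a page ranks highly when many pages, especially other highly ranked ones, link to it. The call has the following form: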

    SELECT madlib.pagerank(
        vertex_table,  -- list of vertices in graph
        vertex_id,     -- col in vertex table containing vertex IDs
        edge_table,    -- list of edges in graph
        edge_args,     -- source, dest, edge weights cols in edge table
        out_table      -- output table with PageRank distribution
    );
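
A concrete call might look like the sketch below. It assumes that MADlib is installed in the `madlib` schema and that hypothetical tables `vertex(id)` and `edge(src, dest)` already exist; `pagerank_out` is an arbitrary name chosen for the output table. The `edge_args` value is a `name=value` string, which is the form the MADlib documentation describes for this argument.

```sql
-- Assumes hypothetical tables vertex(id) and edge(src, dest) already exist.
-- MADlib also writes a companion summary table named <out_table>_summary,
-- so both are dropped before a rerun.
DROP TABLE IF EXISTS pagerank_out, pagerank_out_summary;

SELECT madlib.pagerank(
    'vertex',              -- vertex table
    'id',                  -- vertex ID column in the vertex table
    'edge',                -- edge table
    'src=src, dest=dest',  -- source and destination columns in the edge table
    'pagerank_out'         -- output table to create
);

-- Highest-ranked vertices first.
SELECT * FROM pagerank_out ORDER BY pagerank DESC;
```

All arguments are passed as text because the function looks up the tables and columns by name. The remaining optional parameters, such as the damping factor and iteration limit, are left at their defaults here.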