
Apache MADlib: Big Data Machine Learning in SQL

MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of

  • mathematical
  • statistical
  • graph
  • machine learning methods

for structured and unstructured data.

It is designed for shared-nothing, distributed, scale-out database architectures, giving data scientists an effective toolset for challenging problems involving very large data sets.

MADlib is SQL-based and supports Pivotal Greenplum Database and PostgreSQL.

As mentioned above, MADlib supports graph analytics.

MADlib for Graphs

MADlib supports directed graphs (digraphs) made up of vertices, edges and edge weights.

A graph is stored as ordinary SQL data: one table holds the vertices and another holds the edges. This means the graph can be created, loaded and queried with standard SQL.
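Here is a minimal sketch of that representation in plain PostgreSQL. The table and column names (`vertex`, `edge`, `id`, `src`, `dest`, `weight`) and the small five-edge graph are illustrative assumptions, not names MADlib requires. Because the graph is just two tables, an ordinary `LEFT JOIN` with `GROUP BY` is enough to count each vertex's incoming edges.

```sql
-- Hypothetical example graph: one row per vertex, one row per directed edge.
DROP TABLE IF EXISTS edge, vertex;

CREATE TABLE vertex (
    id INTEGER              -- vertex identifier
);

CREATE TABLE edge (
    src    INTEGER,                       -- vertex the edge starts from
    dest   INTEGER,                       -- vertex the edge points to
    weight DOUBLE PRECISION DEFAULT 1.0   -- optional edge weight (assumed column)
);

INSERT INTO vertex VALUES (0), (1), (2), (3);

INSERT INTO edge (src, dest) VALUES
    (0, 1), (0, 2), (1, 2), (2, 0), (3, 2);

-- Standard SQL over the graph tables: number of incoming edges per vertex.
SELECT v.id, COUNT(e.src) AS in_degree
FROM vertex v
LEFT JOIN edge e ON e.dest = v.id
GROUP BY v.id
ORDER BY in_degree DESC, v.id;
```

With this data, the final query lists vertex 2 first, with three incoming edges. Vertex 3 has none, and the `LEFT JOIN` keeps it in the result with a count of 0.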

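The classic example is PageRank. It ranks each page by the links pointing to it, but it is not a simple count of incoming edges: a link from a highly ranked page counts for more than a link from a low-ranked one.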
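For reference, here is the standard PageRank formulation. The graph has N vertices and a damping factor d; MADlib's `pagerank` function uses a default of 0.85 for d. The rank of a vertex v is then:

$$
PR(v) = \frac{1 - d}{N} + d \sum_{u \in M(v)} \frac{PR(u)}{L(u)}
$$

Here M(v) is the set of vertices that have an edge pointing to v, and L(u) is the number of outgoing edges of u. Each vertex splits its own rank evenly across its outgoing edges, which is what makes the result differ from a plain in-degree count.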

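In MADlib, PageRank is a single function call over the vertex and edge tables (the table and column names below are examples):

```sql
SELECT madlib.pagerank(
    'vertex',              -- vertex_table: table listing the vertices in the graph
    'id',                  -- vertex_id: column in the vertex table containing vertex IDs
    'edge',                -- edge_table: table listing the edges in the graph
    'src=src, dest=dest',  -- edge_args: source and destination columns in the edge table
    'pagerank_out'         -- out_table: output table with the PageRank distribution
);
```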
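Below is an end-to-end usage sketch. It assumes MADlib is already installed (for example with `madpack`) into the `madlib` schema of a PostgreSQL or Greenplum database. The small graph and the table names `vertex`, `edge` and `pagerank_out` are made up for illustration. The call uses the five-argument form above and leaves the optional damping factor, iteration limit and convergence threshold at their defaults.

```sql
-- Assumes MADlib is installed in the "madlib" schema.
-- MADlib refuses to overwrite an existing output table, so drop it first,
-- together with the companion <out_table>_summary table it also creates.
DROP TABLE IF EXISTS pagerank_out, pagerank_out_summary, edge, vertex;

-- Hypothetical example graph.
CREATE TABLE vertex (id INTEGER);
CREATE TABLE edge (src INTEGER, dest INTEGER);

INSERT INTO vertex VALUES (0), (1), (2), (3);
INSERT INTO edge VALUES (0, 1), (0, 2), (1, 2), (2, 0), (3, 2);

-- Compute PageRank with default settings.
SELECT madlib.pagerank(
    'vertex',              -- vertex table
    'id',                  -- vertex id column
    'edge',                -- edge table
    'src=src, dest=dest',  -- edge column mapping
    'pagerank_out'         -- output table
);

-- One row per vertex, highest-ranked vertex first.
SELECT id, pagerank
FROM pagerank_out
ORDER BY pagerank DESC;
```

In this graph, vertex 2 receives links from three vertices, and vertex 0 receives its only link from vertex 2. As a result, vertex 2 should rank first and vertex 0 second, even though vertices 0 and 1 have the same in-degree.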
postgres_graph_and_madlib.txt · Last modified: 2019/03/18 08:46 by root
RSS - 200 © CrosswireDigitialMedia Ltd