Azure Cloud Services - Neo4j
Yet another exploration and exercise in graph modeling. This time about something I often use… the Cloud: Microsoft Azure. I thought it would be interesting to model the available Azure Cloud Service offerings by category, region, and location. The inspiration and data come from the following web page: http://azure.microsoft.com/en-us/regions/#services. It is a trivial example, but useful for demonstrating and understanding concepts. It is also a good example of how to store master data and discover new insights in existing data.
The Cypher, CSV files, and database backup can be found here -> https://github.com/sfrechette/azureservices-neo4j
You can also get a copy of the raw data in Excel here on OneDrive
How to approach and model our graph
Let’s visualize it! A Region ‘East US’ is in a Location ‘Virginia’, that Region offers a Service ‘Stream Analytics’, and that Service belongs to a Category named ‘Analytics’. A Service can also be a parent or child of another Service. From that analysis we can create the following model.
Azure Cloud Services Graph Model:
Graph model built using Arrows http://www.apcjones.com/arrows/
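To make the model concrete, here is a minimal sketch that builds the example path from the paragraph above by hand. The labels, property names, and relationship types are the ones used by the import script later in this post; the ID properties are left out, and no PARENT_OF relationship is shown because the example does not name a parent service. It is meant for an empty scratch database, since running it against the imported graph would add duplicate nodes.

```cypher
// Hand-built instance of the model (scratch database only).
CREATE (r:Region   {regionName: 'East US'}),
       (l:Location {locationName: 'Virginia'}),
       (s:Service  {serviceName: 'Stream Analytics'}),
       (c:Category {categoryName: 'Analytics'}),
       (r)-[:LOCATED_IN]->(l),   // a Region is in a Location
       (r)-[:OFFERS]->(s),       // a Region offers a Service
       (s)-[:CATEGORY]->(c);     // a Service belongs to a Category
```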
Importing the Data using Cypher
With the CSV files in place, we will use Cypher’s LOAD CSV command to transform and load their content into a graph structure.
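Before importing anything, it can help to preview a file and confirm that the headers are what the script expects. The sketch below only reads and returns rows and writes nothing to the graph. It assumes the same relative file path as the import script, and the column names are the ones that script reads from services.csv.

```cypher
// Preview the first five rows of services.csv without creating any data.
LOAD CSV WITH HEADERS FROM "file:data/services.csv" AS row
RETURN row.ID, row.Service, row.CategoryID, row.ParentID
LIMIT 5;
```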
You can import and run the entire Cypher script using the neo4j-shell by issuing the following command:
bin/neo4j-shell -path azureservices.db -file cypher/import.cql
*Make sure the Neo4j service is not running before you execute the script (bin/neo4j stop), because the shell opens the database files directly when started with -path.
The following is the Cypher script that creates the indexes, constraints, nodes and relationships for our graph.
// Create indexes for faster lookup
CREATE INDEX ON :Service(serviceName);
CREATE INDEX ON :Category(categoryName);
CREATE INDEX ON :Region(regionName);
CREATE INDEX ON :Location(locationName);
CREATE INDEX ON :Location(countryName);
CREATE INDEX ON :Location(continentName);
// Create constraints
CREATE CONSTRAINT ON (s:Service) ASSERT s.serviceId IS UNIQUE;
CREATE CONSTRAINT ON (sc:Category) ASSERT sc.categoryId IS UNIQUE;
CREATE CONSTRAINT ON (r:Region) ASSERT r.regionId IS UNIQUE;
CREATE CONSTRAINT ON (l:Location) ASSERT l.locationId IS UNIQUE;
// Create services
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/services.csv" as row
CREATE (:Service {serviceName: row.Service, serviceId: row.ID});
// Create service categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/categories.csv" as row
CREATE (:Category {categoryName: row.Category, categoryId: row.ID});
// Create regions
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/regions.csv" as row
CREATE (:Region {regionName: row.Region, regionId: row.ID});
// Create locations
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/locations.csv" as row
CREATE (:Location {locationName: row.Location, locationId: row.ID, countryName: row.Country, continentName: row.Continent});
// Create relationships: Service to Category
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/services.csv" AS row
MATCH (service:Service {serviceId: row.ID})
MATCH (category:Category {categoryId: row.CategoryID})
MERGE (service)-[:CATEGORY]->(category);
// Create relationship for service to service (parent-child)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/services.csv" AS row
MATCH (child:Service {serviceId: row.ID})
MATCH (parent:Service {serviceId: row.ParentID})
MERGE (parent)-[:PARENT_OF]->(child);
// Create relationships: Region to Service
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/serviceregions.csv" AS row
MATCH (service:Service {serviceId: row.ServiceID})
MATCH (region:Region {regionId: row.RegionID})
MERGE (region)-[:OFFERS]->(service);
// Create relationships: Region to Location
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/regions.csv" AS row
MATCH (region:Region {regionId: row.ID})
MATCH (location:Location {locationId: row.LocationID})
MERGE (region)-[:LOCATED_IN]->(location);
Once the script has executed successfully, start Neo4j and open the web console (http://localhost:7474).
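A quick sanity check of the import can be useful before exploring. The two queries below are a sketch that counts nodes per label and relationships per type. The actual numbers depend on the CSV files, so none are shown here. In the web console, run them one at a time.

```cypher
// Count nodes per label (expect Service, Category, Region, Location).
MATCH (n)
RETURN labels(n) AS label, count(*) AS nodes
ORDER BY nodes DESC;

// Count relationships per type (expect CATEGORY, PARENT_OF, OFFERS, LOCATED_IN).
MATCH ()-[r]->()
RETURN type(r) AS relationship, count(*) AS total
ORDER BY total DESC;
```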
Some Cypher queries
// Services and categories offered in Region - North Europe
MATCH (r:Region {regionName:'North Europe'})-[o:OFFERS]->(s:Service)-[:CATEGORY]->(c:Category) RETURN r,o,s,c;
// List Services with Category available by Regions
MATCH (c:Category)<--(s:Service)<--(r:Region)
RETURN s.serviceName as Service, c.categoryName as Category, collect(distinct r.regionName) as Regions
ORDER BY Category, Service
// Regions that offer the service Machine Learning?
MATCH (region:Region)-[:OFFERS]->(:Service {serviceName:'Machine Learning'})
RETURN region.regionName as Region
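The queries above do not touch the LOCATED_IN and PARENT_OF relationships, so here are two more sketches in the same spirit. They rely only on the labels, properties, and relationship types created by the import script. The results depend on the data, so none are shown.

```cypher
// Number of distinct services offered per country, via Region -> Location.
MATCH (l:Location)<-[:LOCATED_IN]-(r:Region)-[:OFFERS]->(s:Service)
RETURN l.countryName AS Country, count(DISTINCT s) AS Services
ORDER BY Services DESC;

// Parent services with the list of their child services.
MATCH (parent:Service)-[:PARENT_OF]->(child:Service)
RETURN parent.serviceName AS Parent, collect(child.serviceName) AS Children
ORDER BY Parent;
```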
New to Neo4j and graph databases? Follow this link to get started: http://neo4j.com/developer/get-started/