II. JanusGraph && GQL

1. Introduction to JanusGraph

1.1 Overview of JanusGraph

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. It is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time.

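Graph database technology can be divided into two broad areas:

· Technologies for persisting online transactional graphs, typically accessed directly and in real time from an application. These are called graph databases, and they are the counterpart of online transactional processing (OLTP) databases in the "conventional" relational world.

· Technologies for offline graph analysis, typically executed as a series of steps. These are called graph compute engines, and they belong in the same category as other big data analysis technologies such as data mining and online analytical processing (OLAP).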
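To make the distinction concrete, here is a minimal Python sketch over a toy in-memory adjacency list. The graph data and both function names are hypothetical and unrelated to any JanusGraph API. The OLTP-style query touches only the neighborhood of one start vertex, whereas the OLAP-style job scans every vertex and edge.

```python
# Toy directed graph as an adjacency list; the data is made up for illustration.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": [],
    "dave": ["alice"],
}

def friends_of_friends(graph, start):
    """OLTP-style query: a local traversal two hops out from one start vertex."""
    result = set()
    for friend in graph[start]:
        for fof in graph[friend]:
            if fof != start:
                result.add(fof)
    return result

def in_degree_of_all(graph):
    """OLAP-style job: a global pass that visits every vertex and every edge."""
    degree = {vertex: 0 for vertex in graph}
    for targets in graph.values():
        for target in targets:
            degree[target] += 1
    return degree

print(friends_of_friends(GRAPH, "dave"))
print(in_degree_of_all(GRAPH))
```

The cost of the first function depends only on the size of the neighborhood around the start vertex, which is why such queries can be answered in real time; the cost of the second grows with the whole graph, which is why such jobs are usually run in batch.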

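1.2 History of JanusGraph

JanusGraph was forked from Titan on December 27, 2016. During 2017 the former Titan developers released four versions in succession (0.1.0-rc1, 0.1.0-rc2, 0.1.1, and 0.2.0), the last of them on October 12, 2017.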

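Titan was a distributed graph database developed from 2012 until maintenance stopped in 2016. The company that started the Titan project in 2012 was Aurelius. In 2015 Aurelius was acquired by DataStax (a company that builds on Apache Cassandra), which absorbed Titan's graph storage capabilities into its own commercial product, DataStax Enterprise Graph.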

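The Titan developers wanted to move Titan under the Apache Software Foundation, but DataStax was unwilling (possibly to protect the technical edge of DataStax Enterprise Graph, an edge that itself came from Titan). From September 2015 onward, after the DataStax acquisition, Titan stagnated: DataStax was presumably busy integrating Titan into its commercial product, and Titan itself saw no further development. In response, in June 2016 the Titan developers forked the project. Because the Titan trademark now belonged to DataStax, the fork needed a new name; it was renamed JanusGraph and placed under the Linux Foundation.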

The first version, 0.1.0-rc1, was released on April 6, 2017. When this history was written, the latest version was 0.2.0, released on October 12, 2017.

The stated goal of the JanusGraph project is to "reinvigorate development of the distributed graph system to add new functionality, improve performance and scalability, and maintain a variety of storage backends." From Apache TinkerPop it takes its support for the property graph model and the Gremlin language used to traverse that model: "JanusGraph incorporates support for the property graph model with the open source graph computing framework Apache TinkerPop and its Gremlin graph traversal language."

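1.3 Benefits of JanusGraph

JanusGraph's foundational benefit is scaling graph data processing for real-time traversals and analytical queries.

Because JanusGraph is distributed and its cluster can be grown freely by adding nodes, it can make use of very large clusters and therefore store very large graphs containing hundreds of billions of vertices and edges. It also lets thousands of users traverse and query the graph concurrently and in real time. These two properties are its most notable advantages.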

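It supports the following features:

(1) Distributed deployment, and therefore clustering.

(2) Storage of large graphs, for example graphs containing hundreds of billions of vertices and edges.

(3) Real-time, concurrent access by thousands of users.

(4) Elastic and linear scalability for a growing data and user base: cluster nodes can be added linearly to support larger graphs and more concurrent users.

(5) Data distribution and replication for performance and fault tolerance: data is stored in a distributed fashion and every piece of data has multiple replicas.

(6) Multi-datacenter high availability and hot backups.

(7) Support for various storage backends. Four are supported as standard, and third-party storage systems can be added: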

·Apache Cassandra®

·Apache HBase®

·Google Cloud Bigtable

·Oracle BerkeleyDB

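(8) Global graph data analytics, reporting, and ETL through integration with big data platforms such as Apache Spark, Apache Giraph, and Apache Hadoop.

(9) Geo (geospatial) search and numeric range search.

(10) Full-text search, when integrated with systems such as Elasticsearch, Apache Solr, or Apache Lucene.

(11) Native integration with the Apache TinkerPop graph stack, including the Gremlin graph query language, Gremlin graph server, and Gremlin applications.

(12) Open source under the Apache 2 license.

(13) Graph data stored in JanusGraph can be visualized with the following tools: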

·Cytoscape

·Gephi plugin for Apache TinkerPop

·Graphexp

·KeyLines by Cambridge Intelligence

·Linkurious

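1.4 Architecture of JanusGraph

To understand JanusGraph in depth, you need to understand TinkerPop. TinkerPop is an open-source graph computing framework under the Apache Software Foundation that covers both graph databases (OLTP) and graph analytics (OLAP). JanusGraph is built on top of the TinkerPop framework.

One component of TinkerPop is Gremlin, a language for graph manipulation and graph traversal (also called a query language). Gremlin Console and Gremlin Server provide, respectively, an interactive console and a way to execute Gremlin queries remotely. In JanusGraph, Gremlin Server is referred to as JanusGraph Server.

The TinkerPop framework has been adopted by many vendors, for example Baidu's open-source HugeGraph and Huawei's Graph Engine Service (GES).

JanusGraph has a modular architecture.

It uses Hadoop for graph analytics and batch graph processing, and it exposes modular interfaces for data persistence, indexing, and client access.

Between JanusGraph and the disks sit one or more storage and indexing adapters.

The storage backends supported as standard are listed below (third-party storage systems can also serve as a JanusGraph storage backend):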

·Apache Cassandra

·Apache HBase

·Oracle Berkeley DB Java Edition

·Google Cloud BigTable

Supported external index backends:

·Elasticsearch

·Apache Solr

·Apache Lucene

Architecture diagram:

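1.5 How applications use JanusGraph

As a database system, JanusGraph exists to store data for applications. So how does an application use it?

Broadly, an application can use JanusGraph in two ways:

(1) Embedded: JanusGraph is embedded in the application and runs in the same JVM. The application's client code (a client from JanusGraph's point of view) calls Gremlin directly to query the graph stored in JanusGraph. In this mode the external storage system may be local or remote.

(2) Remote: the application and JanusGraph run in separate JVMs, and the application uses JanusGraph by submitting Gremlin queries to Gremlin Server, which JanusGraph supports natively (Gremlin Server is a component of Apache TinkerPop).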

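1.6 The JanusGraph configuration file

A JanusGraph cluster consists of one or more JanusGraph instances. Each instance must be given a configuration when it starts. The configuration selects the components JanusGraph uses, controls the various aspects of how it runs, and sets tuning options for the cluster.

A minimal configuration only needs to specify the storage backend, that is, the persistence engine.

To support advanced graph queries, JanusGraph additionally needs an index backend.

To improve query performance, configure caching and the performance tuning options.

The storage backend, index backend, cache, and tuning options mentioned above are all set in the JanusGraph configuration file. By default, configuration files live in the JanusGraph_home/conf directory.

That directory also contains a number of sample configuration files.

The following sample configuration uses Cassandra as the storage backend and Elasticsearch as the index backend.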

storage.backend=cassandra
storage.hostname=localhost
index.search.backend=elasticsearch
index.search.hostname=100.100.101.1, 100.100.101.2
index.search.elasticsearch.client-only=true
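The structure of such a file can be explored with a few lines of Python. This is only an illustrative sketch with hypothetical helper names (`parse_properties`, `group_by_namespace`); JanusGraph itself reads the file through its own Java configuration classes. It shows how the options fall into a `storage` namespace and an `index` namespace, and that the index hostname option holds a comma-separated list. In the `index.search.*` keys, `search` is the name given to this particular index backend.

```python
from collections import defaultdict

# The sample configuration from the text, with ASCII punctuation.
SAMPLE = """\
storage.backend=cassandra
storage.hostname=localhost
index.search.backend=elasticsearch
index.search.hostname=100.100.101.1, 100.100.101.2
index.search.elasticsearch.client-only=true
"""

def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def group_by_namespace(props):
    """Group options by their top-level namespace, e.g. 'storage' or 'index'."""
    groups = defaultdict(dict)
    for key, value in props.items():
        namespace, _, rest = key.partition(".")
        groups[namespace][rest] = value
    return dict(groups)

props = parse_properties(SAMPLE)
groups = group_by_namespace(props)
index_hosts = [h.strip() for h in props["index.search.hostname"].split(",")]

print(groups["storage"])
print(index_hosts)
```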

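1.7 Loading the JanusGraph configuration file

How is the configuration file loaded?

(1) For a standalone JanusGraph installation, load the configuration file in Gremlin with the JanusGraphFactory class:

graph = JanusGraphFactory.open('path/to/configuration.properties')

(2) When JanusGraph is embedded in an application, the application calls JanusGraph's public API directly; calling JanusGraphFactory from the application loads the configuration file.

(3) JanusGraphFactory also accepts a shorthand in place of a configuration file:

graph = JanusGraphFactory.open('cassandra:localhost');

graph = JanusGraphFactory.open('berkeleyje:/tmp/graph');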
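One way to read the shorthand is as a compact form of two explicit options: the part before the first colon names the storage backend, and the rest is either a hostname or a local directory. The Python sketch below models that reading for the two examples above. The function `expand_shorthand` and the `DIRECTORY_BACKENDS` set are hypothetical, and the mapping is an assumption for illustration rather than a copy of JanusGraph's actual rules.

```python
# Assumption: backends in this set keep their data in a local directory; any
# other backend is reached over the network by hostname. This covers only the
# two examples in the text, not JanusGraph's full rule set.
DIRECTORY_BACKENDS = {"berkeleyje"}

def expand_shorthand(shorthand):
    """Expand 'backend:argument' into explicit configuration options."""
    backend, sep, argument = shorthand.partition(":")
    if not sep or not backend or not argument:
        raise ValueError(f"expected 'backend:argument', got {shorthand!r}")
    if backend in DIRECTORY_BACKENDS:
        return {"storage.backend": backend, "storage.directory": argument}
    return {"storage.backend": backend, "storage.hostname": argument}

print(expand_shorthand("cassandra:localhost"))
print(expand_shorthand("berkeleyje:/tmp/graph"))
```

Splitting on the first colon only keeps paths and host lists that follow it intact.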

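1.8 Installing a distributed JanusGraph cluster

As a graph database system, JanusGraph is fairly complex. Installation ranges from a simple single-machine setup to a complicated distributed one, and, surprisingly, the official site (janusgraph.org) has no document dedicated to installation. The approach here was found on IBM developerWorks. This section concentrates on the single-machine installation; the distributed cluster installation is more involved, and I have not had time to work through it yet, so it is left for later.

As the JanusGraph architecture diagram shows, an installation involves the following components:

(1) An external storage system (the box at the lower left of the diagram). The JanusGraph distribution also bundles a Cassandra that can be used for a standalone setup.

(2) An external index system (the box at the lower right of the diagram). The JanusGraph distribution bundles an Elasticsearch; this component is optional.

(3) JanusGraph Server (the box in the middle of the diagram), which must be started. It derives from Gremlin Server in the Apache TinkerPop project.

(4) A Gremlin client, started to connect to JanusGraph Server. The small box labelled "TinkerPop API - Gremlin" inside the middle box indicates that a client such as Gremlin Console accesses JanusGraph Server through the TinkerPop API.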

1.9 Using the JanusGraph command interface

The commands below create the following graph, which has 3 vertices and 2 edges:

3 vertices:

v1: label student, property id: 1

v2: no label, no property

v3: label student, property id: 2

2 edges with label friends

// Open the graph using the Cassandra configuration
graph = JanusGraphFactory.open('conf/janusgraph-cassandra.properties');

// Define the schema: one vertex label and one edge label
mgmt = graph.openManagement();
student = mgmt.makeVertexLabel('student').make();
friends = mgmt.makeEdgeLabel('friends').make();
mgmt.commit();

// Add the three vertices
v1 = graph.addVertex(label, 'student');
v1.property('id', '1');
v2 = graph.addVertex();
v3 = graph.addVertex(label, 'student');
v3.property('id', '2');
graph.tx().commit();

// Add the two 'friends' edges
v1.addEdge('friends', v2);
v1.addEdge('friends', v3);
graph.tx().commit();

// Query all vertices, all 'id' property values, and all edges
graph.traversal().V();
graph.traversal().V().values('id');
graph.traversal().E();
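For readers without a running JanusGraph instance, the same 3-vertex, 2-edge graph can be modelled in plain Python to see what the three queries return conceptually. `ToyGraph` and its methods are hypothetical names for a minimal in-memory property graph; this is a sketch of the data model, not the JanusGraph or TinkerPop API, and it has no schema management or transactions.

```python
class ToyGraph:
    """A tiny in-memory property graph (hypothetical, not the JanusGraph API)."""

    def __init__(self):
        self.vertices = {}   # vertex id -> {"label": ..., "props": {...}}
        self.edges = []      # (out vertex id, edge label, in vertex id)
        self._next_id = 1

    def add_vertex(self, label=None, **props):
        vid = self._next_id
        self._next_id += 1
        self.vertices[vid] = {"label": label, "props": dict(props)}
        return vid

    def add_edge(self, out_v, label, in_v):
        self.edges.append((out_v, label, in_v))

    def values(self, key):
        """Property values for `key`; vertices without the key are skipped."""
        return [v["props"][key] for v in self.vertices.values() if key in v["props"]]

graph = ToyGraph()
v1 = graph.add_vertex("student", id="1")
v2 = graph.add_vertex()                    # no label, no properties
v3 = graph.add_vertex("student", id="2")
graph.add_edge(v1, "friends", v2)
graph.add_edge(v1, "friends", v3)

print(len(graph.vertices), len(graph.edges))
print(graph.values("id"))
```

Note that `values("id")` yields only two entries even though there are three vertices, because v2 has no `id` property.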

2. Deploying JanusGraph

2.1 Docker install

The following section gives a minimal introduction on how to use the JanusGraph Docker images. For more detailed documentation, refer to the README.md, especially for information about configuration of the images. The source repository also contains example configuration and Docker Compose files.

Usage

Start a JanusGraph Server instance

The default configuration uses the Oracle Berkeley DB Java Edition storage backend and the Apache Lucene indexing backend.

$ docker run --name janusgraph-default janusgraph/janusgraph:latest

Connecting with Gremlin Console

Start a JanusGraph container and connect to the janusgraph server remotely using Gremlin Console

$ docker run --rm --link janusgraph-default:janusgraph -e GREMLIN_REMOTE_HOSTS=janusgraph \
    -it janusgraph/janusgraph:latest ./bin/gremlin.sh

gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured janusgraph/172.17.0.2:8182
gremlin> :> g.addV('person').property('name', 'chris')
==>v[4160]
gremlin> :> g.V().values('name')
==>chris

2.2 Local install

Download:

https://github.com/JanusGraph/janusgraph/releases

In order to run JanusGraph, Java 8 SE is required. Make sure the $JAVA_HOME environment variable points to the correct location where either JRE or JDK is installed. JanusGraph can be downloaded as a .zip archive from the Releases section of the project repository.

The default configuration uses the Oracle Berkeley DB Java Edition storage backend and the Apache Lucene indexing backend.

$ unzip janusgraph-0.5.2.zip
$ cd janusgraph-0.5.2
$ ./bin/gremlin-server.sh start
$ ./bin/gremlin.sh

gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182

Remote connection:

To connect to a server on another machine, change the host address in conf/remote.yaml.

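3. Gremlin Query Language

Gremlin is JanusGraph's query language, used to retrieve data from and modify data in the graph. Gremlin is a path-oriented language that succinctly expresses complex graph traversals and mutation operations. It is also a functional language: traversal operators are chained together to form path-like expressions. For example, "from Hercules, traverse to his father and then his father's father and return the grandfather's name."

Gremlin is a component of Apache TinkerPop. It is developed independently of JanusGraph and is supported by most graph databases. By building applications on JanusGraph through the Gremlin query language, users avoid being locked in to a single graph database vendor.

This chapter is a brief overview of the Gremlin query language. For more information, see the following resources:

· Complete Gremlin Manual: reference manual for all of the Gremlin steps.

· Gremlin Console Tutorial: learn how to use the Gremlin Console effectively to traverse and analyze a graph interactively.

· Practical Gremlin Book: a getting-started guide to graph databases and the Gremlin query language.

· Gremlin Recipes: a collection of best practices and common traversal patterns for Gremlin.

· Gremlin Language Drivers: connect to a Gremlin Server from different languages, such as Go, JavaScript, .NET/C#, PHP, Python, Ruby, Scala, and TypeScript.

· Gremlin Language Variants: learn how to embed Gremlin in a host programming language.

· Gremlin for SQL developers: learn Gremlin through typical patterns found when querying data with SQL.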

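3.1 :remote

The :remote command tells the console to configure a remote connection to the server, using the conf/remote.yaml configuration file. That file points to a Gremlin Server instance running on localhost. The :> command is the "submit" command: it sends the Gremlin on that line to the remote server. By default, remote connections are sessionless, meaning that each line sent from the console is interpreted as a single request. Multiple statements can be sent on one line by using a semicolon as the delimiter. Alternatively, you can establish a console with a session by specifying session when creating the connection. A console session lets you reuse variables across several lines of input.

gremlin> :remote connect tinkerpop.server conf/remote.yaml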
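The difference between sessionless and session mode is essentially a question of variable scope. The following Python sketch imitates it with a hypothetical `ToyServer` class that evaluates simple `name = expression` statements; it is a conceptual model only and does not reflect how Gremlin Server is implemented. A sessionless request starts with an empty scope every time, whereas requests sent with the same session name share one scope.

```python
class ToyServer:
    """Evaluates submitted lines; a hypothetical stand-in for a script server."""

    def __init__(self):
        self._sessions = {}   # session name -> variables kept between requests

    def submit(self, line, session=None):
        # Sessionless requests get a fresh scope; a session reuses its scope.
        scope = {} if session is None else self._sessions.setdefault(session, {})
        result = None
        for statement in line.split(";"):
            statement = statement.strip()
            if not statement:
                continue
            if "=" in statement:
                name, _, expr = statement.partition("=")
                # eval is acceptable here only because the input is a trusted toy
                result = scope[name.strip()] = eval(expr.strip(), {}, scope)
            else:
                result = eval(statement, {}, scope)
        return result

server = ToyServer()
print(server.submit("x = 1; x + 1"))          # two statements, one request
server.submit("y = 5")                         # sessionless: y is discarded
try:
    server.submit("y + 1")
except NameError as err:
    print("sessionless:", err)
server.submit("y = 5", session="s1")
print(server.submit("y + 1", session="s1"))   # the session remembers y
```

This is why, in sessionless mode, statements that depend on each other have to be sent together on one semicolon-separated line.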

3.2 Graph traversal

gremlin> g.V().has('name', 'hercules').out('father').out('father').values('name')
==>saturn

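The query above can be broken down into the following steps:

g: the current graph traversal source.

V: all vertices in the graph.

has('name', 'hercules'): filters the vertices down to those whose name property is "hercules" (there is only one).

out('father'): traverses the outgoing father edges from Hercules.

out('father'): traverses the outgoing father edges from Hercules' father (Jupiter).

values('name'): gets the value of the name property of the resulting vertex.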
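Each step takes the current set of vertices and produces the next one, which can be imitated in a few lines of Python. The sketch below uses a hypothetical `Traversal` class over a hand-made three-vertex fragment (the vertex ids are invented); it evaluates each step eagerly, unlike real Gremlin traversals, which are lazy, and it is not the TinkerPop API.

```python
# Toy fragment of the example graph; vertex ids are made up for illustration.
VERTICES = {
    1: {"name": "hercules"},
    2: {"name": "jupiter"},
    3: {"name": "saturn"},
}
EDGES = [
    (1, "father", 2),   # hercules -father-> jupiter
    (2, "father", 3),   # jupiter  -father-> saturn
]

class Traversal:
    """Holds the vertex ids that the traversal currently sits on."""

    def __init__(self, ids):
        self.ids = list(ids)

    def has(self, key, value):
        """Keep only vertices whose property `key` equals `value`."""
        return Traversal(i for i in self.ids if VERTICES[i].get(key) == value)

    def out(self, label):
        """Follow outgoing edges with the given label to the adjacent vertices."""
        return Traversal(
            in_v
            for i in self.ids
            for (out_v, edge_label, in_v) in EDGES
            if out_v == i and edge_label == label
        )

    def values(self, key):
        """Return the values of property `key` on the current vertices."""
        return [VERTICES[i][key] for i in self.ids if key in VERTICES[i]]

def V():
    """Start a traversal on all vertices, like g.V()."""
    return Traversal(VERTICES.keys())

print(V().has("name", "hercules").out("father").out("father").values("name"))
```

Chaining the methods mirrors the path-like shape of the Gremlin expression: filter to Hercules, step to his father, step again, then read the name.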