Data Scientist focused on finding patterns and relationships among online consumers. She also co-organizes R-Ladies Madrid and is a member of the NASA Datanauts.
Twitter / GitHub: @Chucheria
A graph G is a collection of entities (nodes) and the relationships (edges) that connect those entities.
G = {N, E}
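As a minimal sketch of this definition, the two collections can be written as data frames and combined into a graph with igraph. The node names and edges here are hypothetical toy data, chosen only for illustration.

```r
library(igraph)

# Hypothetical toy data: N has three nodes, E has two edges.
nodes <- data.frame(name = c("Alice", "Bob", "Carol"),
                    stringsAsFactors = FALSE)
edges <- data.frame(from = c("Alice", "Bob"),
                    to   = c("Bob", "Carol"),
                    stringsAsFactors = FALSE)

# Build G = {N, E} from the two collections.
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)

vcount(g)  # size of N
ecount(g)  # size of E
```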
A graph database stores your data in a graph.
What a relational database models as tables and foreign keys, a graph database models as nodes and relationships.
Graphs can be undirected.
In Neo4j you have to create every relationship with a direction, but you can ignore that direction when you write your queries.
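The same idea can be illustrated in igraph, as an analogy rather than a description of how Neo4j works internally: the edge is stored with a direction, and the caller decides at query time whether to respect it. The two node names are hypothetical.

```r
library(igraph)

# Hypothetical relationship stored with a direction: Alice -> Bob.
g <- graph_from_literal(Alice -+ Bob)

# Respecting direction: Bob has no outgoing neighbours.
out_nb <- neighbors(g, "Bob", mode = "out")$name

# Ignoring direction: Alice is found as a neighbour of Bob.
all_nb <- neighbors(g, "Bob", mode = "all")$name

out_nb
all_nb
```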
Nodes can have properties.
Relationships can have properties.
Node labels are the best: they group nodes into sets, so a query can work on one set instead of the whole graph.
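A small igraph sketch of these three ideas follows. It assumes a vertex attribute named `kind` standing in for a label; igraph has no label concept of its own in the Neo4j sense. All names and values are hypothetical toy data, not taken from the movie database queried in this deck.

```r
library(igraph)

# Hypothetical toy data.
nodes <- data.frame(
  name = c("Movie A", "Movie B", "Person A"),
  kind = c("Movie", "Movie", "Person"),   # plays the role of a node label
  year = c(1984, 2011, NA),               # a node property
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("Person A", "Person A"),
  to   = c("Movie A", "Movie B"),
  role = c("lead", "cameo"),              # a relationship property
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(edges, vertices = nodes, directed = TRUE)

# The label-like attribute picks out a whole set of nodes at once.
movies <- V(g)[kind == "Movie"]
movies$name

# Relationship properties are stored on the edges.
E(g)$role
```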
q <- ("MATCH (n:Movie { title: 'Footloose' })
RETURN properties(n) AS properties")
properties <- cypherToList(graph, q)
purrr::map(properties, ~names(.x$properties))
## [[1]]
## [1] "studio" "releaseDate" "imdbId" "description"
## [5] "runtime" "language" "title" "version"
## [9] "trailer" "imageUrl" "genre" "tagline"
## [13] "lastModified" "id" "homepage"
# Connector
library(RNeo4j)
# Manipulate data
library(dplyr)
# Work with graphs
library(igraph)
# Visualize
library(ggplot2)
library(ggthemes)
library(visNetwork)
## If you want to use the Grammar of graphics
# library(ggraph)
graph <- startGraph("http://localhost:7474/db/data/",
                    username = "",
                    password = "")
The betweenness centrality of a vertex counts the shortest paths between pairs of other vertices that pass through that vertex. When a pair has several shortest paths, each one contributes a fraction. It represents the degree to which a node stands between other nodes.
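\[\operatorname{betweenness}(v) = \sum_{x \neq v \neq y \in V} \frac{\sigma_{xy}(v)}{\sigma_{xy}}\]
where \(\sigma_{xy}\) is the number of shortest paths between \(x\) and \(y\), and \(\sigma_{xy}(v)\) is the number of those paths that pass through \(v\).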
##     Larry Simms     Arthur Lake           Daisy Penny Singleton
##       129.01905        55.02381        64.12143        55.12143
##   Danny Mummert
##        97.35476
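To check the definition by hand, here is a sketch on a hypothetical four-node path, a - b - c - d, using igraph's `betweenness()`. Only the two inner nodes sit between other pairs: b lies on the shortest paths a-c and a-d, and c lies on a-d and b-d. For an undirected graph igraph counts each unordered pair once.

```r
library(igraph)

# Hypothetical toy graph: an undirected path a - b - c - d.
g <- graph_from_literal(a - b - c - d)

# Unnormalised betweenness for every vertex.
bt <- betweenness(g, directed = FALSE)
bt
# a b c d
# 0 2 2 0
```

The end nodes score 0 because no shortest path between two other nodes passes through them.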