Exporting to Pandas dataframes

As we are ingesting from a set of dataframes, let's start by looking at how to convert a graph back into them. For this, Raphtory provides two functions, to_vertex_df() and to_edge_df(), as part of the export module. As the names suggest, these extract information about the vertices and edges respectively.

Vertex Dataframe

To explore the use of to_vertex_df(), we first call the function passing only our graph. As we don't set any flags, this exports the full history, which can be seen in the printed dataframe. Because pandas truncates wide columns when printing, the properties entry for ServerA is also printed on its own so that it can be seen in full.

To demonstrate some of the available flags, we call to_vertex_df() again, this time disabling both the update history (include_update_history=False) and the property histories (include_property_histories=False). In the second set of prints, each property holds only its most recent value rather than a list of (timestamp, value) pairs, and the update_history column has been removed.

Graph

import raphtory.export as ex

df = ex.to_vertex_df(traffic_graph)
print("Dataframe with full history:")
print(f"{df}\n")
print("The properties of ServerA:")
print(f"{df[df['id'] == 'ServerA'].properties.iloc[0]}\n")

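# Export again, keeping only the latest value of each property
# and dropping the update_history column
df = ex.to_vertex_df(
    traffic_graph, include_update_history=False, include_property_histories=False
)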
print("Dataframe with no history:")
print(f"{df}\n")
print("The properties of ServerA:")
print(f"{df[df['id'] == 'ServerA'].properties.iloc[0]}\n")

Output

Dataframe with full history:
        id  ...                                     update_history
0  ServerA  ...      [1693555200000, 1693555500000, 1693556400000]
1  ServerB  ...  [1693555200000, 1693555500000, 1693555800000, ...
2  ServerC  ...  [1693555500000, 1693555800000, 1693556400000, ...
3  ServerD  ...      [1693555800000, 1693556100000, 1693557000000]
4  ServerE  ...      [1693556100000, 1693556400000, 1693556700000]

[5 rows x 3 columns]

The properties of ServerA:
{'_id': 'ServerA', 'server_name': 'Alpha', 'datasource': 'data/network_traffic_edges.csv', 'hardware_type': 'Blade Server', 'OS_version': [(1693555200000, 'Ubuntu 20.04')], 'uptime_days': [(1693555200000, 120)], 'primary_function': [(1693555200000, 'Database')]}

Dataframe with no history:
        id                                         properties
0  ServerA  {'_id': 'ServerA', 'server_name': 'Alpha', 'ha...
1  ServerB  {'uptime_days': 45, 'server_name': 'Beta', 'pr...
2  ServerC  {'primary_function': 'File Storage', '_id': 'S...
3  ServerD  {'uptime_days': 60, 'server_name': 'Delta', 'd...
4  ServerE  {'server_name': 'Echo', 'OS_version': 'Red Hat...

The properties of ServerA:
{'_id': 'ServerA', 'server_name': 'Alpha', 'hardware_type': 'Blade Server', 'uptime_days': 120, 'datasource': 'data/network_traffic_edges.csv', 'OS_version': 'Ubuntu 20.04', 'primary_function': 'Database'}
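
Once the histories are disabled, each entry in the properties column is a plain dictionary, so it can be expanded into ordinary columns with standard pandas tools. The snippet below is a minimal sketch of that idea. It does not call Raphtory: the dataframe is a hand-built stand-in containing only the ServerA row printed above, and the names `props` and `flat` are illustrative.

```python
import pandas as pd

# Hand-built stand-in for the "no history" dataframe shown above.
# Only the ServerA row is reproduced, since it is the only one printed in full.
df = pd.DataFrame(
    {
        "id": ["ServerA"],
        "properties": [
            {
                "_id": "ServerA",
                "server_name": "Alpha",
                "hardware_type": "Blade Server",
                "uptime_days": 120,
                "datasource": "data/network_traffic_edges.csv",
                "OS_version": "Ubuntu 20.04",
                "primary_function": "Database",
            }
        ],
    }
)

# Turn each properties dictionary into one row of flat columns.
props = pd.json_normalize(df["properties"].tolist())

# Put the vertex id back alongside the expanded property columns.
flat = pd.concat([df[["id"]], props], axis=1)

print(flat[["id", "server_name", "uptime_days", "OS_version"]])
```

This only works cleanly on the export without property histories. With histories enabled, temporal properties are lists of (timestamp, value) tuples and would need unpacking first.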

Edge Dataframe

Exporting to an edge dataframe via to_edge_df() works in the same way as to_vertex_df(). By default this exports the full update history for each edge, with one row per edge layer. The flags to remove the history are once again available, along with an explode_edges flag which splits each edge into one row per update.

In the example below, we first create a subgraph of the monkey interactions, selecting two monkeys we are interested in (ANGELE and FELIPE). This step isn't required; it is included to demonstrate that the export also works on graph views.

We then call to_edge_df() on the subgraph, setting no flags. In the output you can see the update/property history for each interaction type (layer) between ANGELE and FELIPE.

Finally, we call to_edge_df() again, turning off the property history and exploding the edges. In the output you can see each interaction which occurred between ANGELE and FELIPE.

Info

We have further reduced the graph to only one layer (Grunting-Lipsmacking) to reduce the output size.

Graph

import raphtory.export as ex

subgraph = monkey_graph.subgraph(["ANGELE", "FELIPE"])
df = ex.to_edge_df(subgraph)
print("Interactions between Angele and Felipe:")
print(f"{df}\n")

# Restrict the view to a single layer, then export one row per edge update
grunting_graph = subgraph.layer("Grunting-Lipsmacking")
df = ex.to_edge_df(grunting_graph, explode_edges=True, include_property_histories=False)
print("Exploding the grunting-Lipsmacking layer")
print(df)

Output

Interactions between Angele and Felipe:
       src  ...                                     update_history
0   ANGELE  ...  [1560419400000, 1560419400000, 1560419460000, ...
1   ANGELE  ...  [1560422580000, 1560441780000, 1560441780000, ...
2   ANGELE  ...                                    [1560855660000]
3   ANGELE  ...      [1560526320000, 1560855660000, 1561042620000]
4   ANGELE  ...                                    [1562253540000]
5   ANGELE  ...                                    [1561720320000]
6   FELIPE  ...  [1560419460000, 1560419520000, 1560419580000, ...
7   FELIPE  ...                                    [1562321580000]
8   FELIPE  ...      [1560526320000, 1561972860000, 1562253540000]
9   FELIPE  ...                                    [1561110180000]
10  FELIPE  ...                                    [1562057520000]
11  FELIPE  ...      [1560526260000, 1562253540000, 1562321580000]
12  FELIPE  ...                                    [1560526320000]
13  FELIPE  ...                                    [1562253540000]
14  FELIPE  ...                     [1562057520000, 1562671200000]

[15 rows x 5 columns]

Exploding the grunting-Lipsmacking layer
      src     dst                 layer     properties  update_history
0  ANGELE  FELIPE  Grunting-Lipsmacking  {'Weight': 1}   1560526320000
1  ANGELE  FELIPE  Grunting-Lipsmacking  {'Weight': 1}   1560855660000
2  ANGELE  FELIPE  Grunting-Lipsmacking  {'Weight': 1}   1561042620000
3  FELIPE  ANGELE  Grunting-Lipsmacking  {'Weight': 1}   1560526320000
4  FELIPE  ANGELE  Grunting-Lipsmacking  {'Weight': 1}   1561972860000
5  FELIPE  ANGELE  Grunting-Lipsmacking  {'Weight': 1}   1562253540000
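
An exploded edge dataframe has one row per interaction, which makes it convenient for ordinary pandas analysis. The sketch below is hand-copied from the exploded output above rather than produced by calling Raphtory, and it assumes the update_history values are milliseconds since the Unix epoch. The column names `time` and `weight` and the variable `counts` are illustrative additions.

```python
import pandas as pd

# Rows copied by hand from the exploded output above.
df = pd.DataFrame(
    {
        "src": ["ANGELE"] * 3 + ["FELIPE"] * 3,
        "dst": ["FELIPE"] * 3 + ["ANGELE"] * 3,
        "layer": ["Grunting-Lipsmacking"] * 6,
        "properties": [{"Weight": 1} for _ in range(6)],
        "update_history": [
            1560526320000,
            1560855660000,
            1561042620000,
            1560526320000,
            1561972860000,
            1562253540000,
        ],
    }
)

# Assumption: the integers are epoch milliseconds, interpreted here as UTC.
df["time"] = pd.to_datetime(df["update_history"], unit="ms", utc=True)

# Pull the Weight property out of each dictionary into its own column.
df["weight"] = df["properties"].apply(lambda p: p["Weight"])

# Count how many interactions occurred in each direction.
counts = df.groupby(["src", "dst"]).size()

print(df[["src", "dst", "time", "weight"]])
print(counts)
```

Working from the exploded form avoids having to unpack lists of timestamps, at the cost of a longer dataframe when edges have many updates.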