Saving and loading graphs
The fastest way to ingest a graph is to load it from Raphtory's on-disk format using the load_from_file() function on the graph. This requires first ingesting the data via one of the prior methods and saving the resulting graph with save_to_file(), but it means that for large datasets you do not need to parse the data every time you run a Raphtory script.
Info
This is similar to pickling and can make a drastic difference to ingestion time, especially if your datasets require a lot of preprocessing.
In the example below we ingest the edge dataframe from the last section, save this graph and reload it into a second graph. These are both printed to show they contain the same data.
Warning
Due to the ongoing development of Raphtory, a graph saved with one version is not guaranteed to load correctly in another.
from raphtory import Graph
import pandas as pd

# Read the edge list and convert the timestamps to millisecond-precision UTC datetimes
edges_df = pd.read_csv("data/network_traffic_edges.csv")
edges_df["timestamp"] = pd.to_datetime(edges_df["timestamp"]).astype(
    "datetime64[ms, UTC]"
)

# Build the graph from the dataframe
g = Graph()
g.load_edges_from_pandas(
    edge_df=edges_df,
    src_col="source",
    dst_col="destination",
    time_col="timestamp",
    props=["data_size_MB"],
    layer_in_df="transaction_type",
)

# Save the graph to disk, then load it back into a second graph
g.save_to_file("/tmp/saved_graph")
loaded_graph = Graph.load_from_file("/tmp/saved_graph")

# Both graphs should print the same summary
print(g)
print(loaded_graph)
Output
Graph(number_of_edges=7, number_of_vertices=5, number_of_temporal_edges=7, earliest_time="1693555200000", latest_time="1693557000000")
Graph(number_of_edges=7, number_of_vertices=5, number_of_temporal_edges=7, earliest_time="1693555200000", latest_time="1693557000000")
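To skip parsing on later runs, a script can check whether a saved graph already exists and only rebuild it when it does not. The following is a minimal sketch of that load-or-build pattern, not part of Raphtory itself: the helper name load_or_build and its build, save and load parameters are hypothetical, and pickle is used as a stand-in so that the sketch runs without Raphtory or the dataset.

```python
import os
import pickle
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    path: str,
    build: Callable[[], T],
    save: Callable[[T, str], None],
    load: Callable[[str], T],
) -> T:
    """Hypothetical helper: load the object saved at `path`, or build and save it."""
    if os.path.exists(path):
        return load(path)  # fast path: reuse the saved copy
    obj = build()  # slow path: parse and preprocess the raw data
    save(obj, path)
    return obj


# Stand-in callables based on pickle (an assumption made for this sketch only).
def pickle_save(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def pickle_load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


build_calls = []


def expensive_build():
    build_calls.append(1)  # record each time the slow path runs
    return {"edges": [("a", "b", 1), ("b", "c", 2)]}


cache_path = os.path.join(tempfile.mkdtemp(), "saved_graph.pkl")
first = load_or_build(cache_path, expensive_build, pickle_save, pickle_load)
second = load_or_build(cache_path, expensive_build, pickle_save, pickle_load)
```

With Raphtory, the same helper could presumably be given a build function that wraps the load_edges_from_pandas() call from the example, save=lambda g, p: g.save_to_file(p), and load=Graph.load_from_file. Given the warning about versions, the saved file may need to be deleted and rebuilt after upgrading Raphtory.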