Querying the graph over time
The first set of view functions we will look at are for traveling through time: viewing the graph as it was at a specific point, or between two points (applying a time window). For this Raphtory provides four functions: at(), window(), expanding() and rolling().
At
The at() function takes a time argument in epoch (integer) or datetime (string/datetime object) format and can be called on a graph, vertex, or edge. This will return an equivalent Graph View, Vertex View or Edge View which includes all updates between the beginning of the graph's history and the provided time (inclusive of the time provided).
This returned object has all of the same functions as its unfiltered counterpart and will pass the view criteria onto any entities it returns. For example, if you call at() on a graph and then call vertex(), this will return a Vertex View filtered to the time passed to the graph.
An example of this can be seen in the code below where we print the degree of Lome on the full dataset, at 9:07 on the 14th of June and at 12:17 on the 13th of June.
Info
You will see below that graph.at().vertex() and graph.vertex().at() are synonymous.
We also introduce two new time functions here, start() and end(), which specify the time range a view is filtered to, if one has been applied. You can see in the last line of the example we print the start, earliest_time, latest_time and end of the vertex to show you how these differ.
v = g.vertex("LOME")
print(f"Across the full dataset {v.name()} interacted with {v.degree()} other monkeys.")
v_at = g.vertex("LOME").at("2019-06-14 9:07:31")
print(
    f"Between {v_at.start_date_time()} and {v_at.end_date_time()}, {v_at.name()} interacted with {v_at.degree()} other monkeys."
)
v_at_2 = g.at(1560428239000).vertex("LOME")  # 13/06/2019 12:17:19 as epoch
print(
    f"Between {v_at_2.start_date_time()} and {v_at_2.end_date_time()}, {v_at_2.name()} interacted with {v_at_2.degree()} other monkeys."
)
print(
    f"Window start: {v_at_2.start_date_time()}, First update: {v_at_2.earliest_date_time()}, Last update: {v_at_2.latest_date_time()}, Window End: {v_at_2.end_date_time()}"
)
Output
Across the full dataset LOME interacted with 18 other monkeys.
Between 2019-06-13 09:50:00 and 2019-06-14 09:07:31.001000, LOME interacted with 9 other monkeys.
Between 2019-06-13 09:50:00 and 2019-06-13 12:17:19.001000, LOME interacted with 5 other monkeys.
Window start: 2019-06-13 09:50:00, First update: 2019-06-13 09:52:00, Last update: 2019-06-13 11:01:00, Window End: 2019-06-13 12:17:19.001000
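Two details of the output above are worth unpacking: the epoch value passed to at() is in milliseconds, and the reported window end sits one millisecond after the requested time. The latter is consistent with an inclusive at(t) being expressed as a window whose exclusive end is t + 1 ms. The standard-library sketch below reproduces both observations. It does not call Raphtory, it assumes the timestamps are interpreted as UTC, and the helper names to_epoch_ms and visible_at are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole epoch milliseconds."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def visible_at(update_times_ms, t_ms):
    """Keep updates up to and including t_ms, i.e. before an exclusive end of t_ms + 1."""
    exclusive_end = t_ms + 1
    return [t for t in update_times_ms if t < exclusive_end]


# 13/06/2019 12:17:19, assumed to be UTC
t = to_epoch_ms(datetime(2019, 6, 13, 12, 17, 19, tzinfo=timezone.utc))
print(t)

# Illustrative update times: one second before t, exactly t, one millisecond after t
updates = [t - 1000, t, t + 1]
print(visible_at(updates, t))
```

The update that lands exactly on t is kept and the one a millisecond later is dropped, which is the behaviour the text describes as "inclusive of the time provided".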
Window
The window() function works the same as the at() function, but allows you to set a start time as well as an end time (inclusive of start, exclusive of end).
This is useful for digging into specific ranges of the history that you are interested in, for example a given day within your data, filtering out everything outside this range. An example of this can be seen below where we look at the number of times Lome interacts with Nekke within the full dataset and for one day between the 13th of June and the 14th of June.
Info
We use datetime objects in this example, but it would work exactly the same with string dates and epoch integers.
from datetime import datetime
start_day = datetime.strptime("2019-06-13", "%Y-%m-%d")
end_day = datetime.strptime("2019-06-14", "%Y-%m-%d")
e = g.edge("LOME", "NEKKE")
print(
    f"Across the full dataset {e.src().name()} interacted with {e.dst().name()} {len(e.history())} times"
)
e = e.window(start_day, end_day)
# Report the bounds of the windowed edge itself
print(
    f"Between {e.start_date_time()} and {e.end_date_time()}, {e.src().name()} interacted with {e.dst().name()} {len(e.history())} times"
)
print(
    f"Window start: {e.start_date_time()}, First update: {e.earliest_date_time()}, Last update: {e.latest_date_time()}, Window End: {e.end_date_time()}"
)
Output
Across the full dataset LOME interacted with NEKKE 41 times
Between 2019-06-13 00:00:00 and 2019-06-14 00:00:00, LOME interacted with NEKKE 8 times
Window start: 2019-06-13 00:00:00, First update: 2019-06-13 10:18:00, Last update: 2019-06-13 15:05:00, Window End: 2019-06-14 00:00:00
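The "inclusive of start, exclusive of end" rule can be illustrated without Raphtory. The sketch below is a plain-Python stand-in, not the library's implementation: in_window is a hypothetical helper, the 10:18 and 15:05 timestamps are the first and last updates from the output above, and the two midnight timestamps are made up to show what happens at the boundaries.

```python
from datetime import datetime


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """Half-open interval test: start is included, end is excluded."""
    return start <= ts < end


start_day = datetime(2019, 6, 13)
end_day = datetime(2019, 6, 14)

history = [
    datetime(2019, 6, 13, 0, 0),    # made up: exactly on the start boundary
    datetime(2019, 6, 13, 10, 18),  # first update in the output above
    datetime(2019, 6, 13, 15, 5),   # last update in the output above
    datetime(2019, 6, 14, 0, 0),    # made up: exactly on the end boundary
]

kept = [ts for ts in history if in_window(ts, start_day, end_day)]
for ts in kept:
    print(ts)
```

Because the end is excluded, an update that falls exactly on midnight of the 14th belongs to the next day's window only, so back-to-back daily windows never count the same update twice.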
Expanding
If you have data covering a large period of time, or have many time points of interest, it is quite likely you will find yourself calling at() over and over. If there is a pattern to these calls, say you are interested in how your graph looks every morning for the last week, you can instead utilise expanding().
expanding() will return an iterable of views as if you called at() from the earliest time to the latest time at increments of a given step.
The step can be given as a simple epoch integer, or a natural language string describing the interval. The latter is converted into an iterator of datetimes, handling all corner cases like varying month length and leap years.
Within the string you can reference years, months, weeks, days, hours, minutes, seconds and milliseconds. These can be singular or plural and the string can include 'and', spaces, and commas to improve readability.
In the code below, we can see some examples of this where we first increment through the full history of the graph a week at a time. This creates four views, each of which we ask how many monkey interactions it has seen. You will notice the start time does not change, but the end time increments by 7 days each view.
The second example shows the complexity of increments Raphtory can handle, stepping by 2 days, 3 hours, 12 minutes and 6 seconds each time. We have additionally bounded this expansion with a window between the 13th and 23rd of June to demonstrate how these views may be chained.
print(
    f"The full range of time in the graph is {g.earliest_date_time()} to {g.latest_date_time()}\n"
)
for expanding_g in g.expanding("1 week"):
    print(
        f"From {expanding_g.start_date_time()} to {expanding_g.end_date_time()} there were {expanding_g.num_temporal_edges()} monkey interactions"
    )
print()
start_day = datetime.strptime("2019-06-13", "%Y-%m-%d")
end_day = datetime.strptime("2019-06-23", "%Y-%m-%d")
for expanding_g in g.window(start_day, end_day).expanding(
    "2 days, 3 hours, 12 minutes and 6 seconds"
):
    print(
        f"From {expanding_g.start_date_time()} to {expanding_g.end_date_time()} there were {expanding_g.num_temporal_edges()} monkey interactions"
    )
Output
The full range of time in the graph is 2019-06-13 09:50:00 to 2019-07-10 11:05:00
From 2019-06-13 09:50:00 to 2019-06-20 09:50:00 there were 789 monkey interactions
From 2019-06-13 09:50:00 to 2019-06-27 09:50:00 there were 1724 monkey interactions
From 2019-06-13 09:50:00 to 2019-07-04 09:50:00 there were 2358 monkey interactions
From 2019-06-13 09:50:00 to 2019-07-11 09:50:00 there were 3196 monkey interactions
From 2019-06-13 00:00:00 to 2019-06-15 03:12:06 there were 377 monkey interactions
From 2019-06-13 00:00:00 to 2019-06-17 06:24:12 there were 377 monkey interactions
From 2019-06-13 00:00:00 to 2019-06-19 09:36:18 there were 691 monkey interactions
From 2019-06-13 00:00:00 to 2019-06-21 12:48:24 there were 1143 monkey interactions
From 2019-06-13 00:00:00 to 2019-06-23 16:00:30 there were 1164 monkey interactions
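The view boundaries printed above follow a simple pattern: the start stays fixed, the end advances by one step per view, and the last view is the first one whose end passes the latest time. The sketch below reproduces those boundaries with the standard library only. The stopping rule is inferred from the printed output rather than taken from Raphtory's source, and expanding_bounds is a hypothetical helper.

```python
from datetime import datetime, timedelta


def expanding_bounds(earliest: datetime, latest: datetime, step: timedelta):
    """Yield (start, end) pairs with a fixed start and an end that grows by step."""
    end = earliest + step
    while True:
        yield earliest, end
        if end > latest:  # this view already covers the latest update
            break
        end += step


earliest = datetime(2019, 6, 13, 9, 50)
latest = datetime(2019, 7, 10, 11, 5)

bounds = list(expanding_bounds(earliest, latest, timedelta(weeks=1)))
for start, end in bounds:
    print(f"From {start} to {end}")
```

With the graph's time range from the output above and a one-week step this yields four views, the last ending on 2019-07-11 09:50:00, which matches the first loop's printed boundaries.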
Rolling
Where at() has expanding(), window() has rolling(). This function will return an iterable of views, incrementing by a window size and only including the history from inside the window period (inclusive of start, exclusive of end). This allows you to easily extract daily or monthly metrics.
For example, below we take the code from expanding and swap out the function for rolling(). In the first loop we can see both the start date and end date increase by seven days each time, and the number of monkey interactions sometimes decreases as older data is dropped from the window.
print("Rolling 1 week")
for expanding_g in g.rolling(window="1 week"):
    print(
        f"From {expanding_g.start_date_time()} to {expanding_g.end_date_time()} there were {expanding_g.num_temporal_edges()} monkey interactions"
    )
Output
Rolling 1 week
From 2019-06-13 09:50:00 to 2019-06-20 09:50:00 there were 789 monkey interactions
From 2019-06-20 09:50:00 to 2019-06-27 09:50:00 there were 935 monkey interactions
From 2019-06-27 09:50:00 to 2019-07-04 09:50:00 there were 634 monkey interactions
From 2019-07-04 09:50:00 to 2019-07-11 09:50:00 there were 838 monkey interactions
Alongside the window size, rolling() takes an optional step argument which specifies how far along the timeline it should increment before applying the next window. By default this is the same as window, allowing all updates to be analysed exactly once in non-overlapping windows.
If, however, you would like to have overlapping or fully disconnected windows, you can set a step smaller or greater than the given window size. For example, in the code below we add a step of two days. You can see in the output that the start and end dates increment by two days each view, but are always seven days apart.
print("\nRolling 1 week, stepping 2 days (overlapping window)")
for expanding_g in g.rolling(window="1 week", step="2 days"):
    print(
        f"From {expanding_g.start_date_time()} to {expanding_g.end_date_time()} there were {expanding_g.num_temporal_edges()} monkey interactions"
    )
Output
Rolling 1 week, stepping 2 days (overlapping window)
From 2019-06-08 09:50:00 to 2019-06-15 09:50:00 there were 377 monkey interactions
From 2019-06-10 09:50:00 to 2019-06-17 09:50:00 there were 387 monkey interactions
From 2019-06-12 09:50:00 to 2019-06-19 09:50:00 there were 698 monkey interactions
From 2019-06-14 09:50:00 to 2019-06-21 09:50:00 there were 711 monkey interactions
From 2019-06-16 09:50:00 to 2019-06-23 09:50:00 there were 787 monkey interactions
From 2019-06-18 09:50:00 to 2019-06-25 09:50:00 there were 797 monkey interactions
From 2019-06-20 09:50:00 to 2019-06-27 09:50:00 there were 935 monkey interactions
From 2019-06-22 09:50:00 to 2019-06-29 09:50:00 there were 735 monkey interactions
From 2019-06-24 09:50:00 to 2019-07-01 09:50:00 there were 794 monkey interactions
From 2019-06-26 09:50:00 to 2019-07-03 09:50:00 there were 603 monkey interactions
From 2019-06-28 09:50:00 to 2019-07-05 09:50:00 there were 747 monkey interactions
From 2019-06-30 09:50:00 to 2019-07-07 09:50:00 there were 820 monkey interactions
From 2019-07-02 09:50:00 to 2019-07-09 09:50:00 there were 860 monkey interactions
From 2019-07-04 09:50:00 to 2019-07-11 09:50:00 there were 838 monkey interactions
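The first window in the output above starts on 2019-06-08, before the graph's earliest update. This is what you would expect if each window's end is placed at the earliest time plus a multiple of the step, with the start then set one window length earlier. The sketch below reproduces the printed boundaries under that assumption. It is inferred from the output only, it is not Raphtory's implementation, and rolling_bounds is a hypothetical helper.

```python
from datetime import datetime, timedelta


def rolling_bounds(earliest, latest, window, step=None):
    """Yield (start, end) pairs of fixed width, with the end advancing by step."""
    step = window if step is None else step  # default: non-overlapping windows
    end = earliest + step
    while True:
        yield end - window, end
        if end > latest:  # this window already reaches past the latest update
            break
        end += step


earliest = datetime(2019, 6, 13, 9, 50)
latest = datetime(2019, 7, 10, 11, 5)

weekly = list(rolling_bounds(earliest, latest, timedelta(weeks=1)))
overlapping = list(
    rolling_bounds(earliest, latest, timedelta(weeks=1), step=timedelta(days=2))
)
for start, end in overlapping:
    print(f"From {start} to {end}")
```

With the default step the sketch gives the four back-to-back weekly windows from the first rolling example, and with a two-day step it gives the fourteen overlapping windows listed above.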
As a small example of how useful this can be, in the following segment we plot the daily unique interactions of Lome via matplotlib in only 10 lines!
Info
We have to recreate the graph in the first section of this code block so that the output can be rendered as part of the documentation. Please ignore this.
# mkdocs: render
###RECREATION OF THE GRAPH SO IT CAN BE RENDERED
import matplotlib.pyplot as plt
import pandas as pd
from raphtory import Graph
edges_df = pd.read_csv(
    "data/OBS_data.txt", sep="\t", header=0, usecols=[0, 1, 2, 3, 4], parse_dates=[0]
)
edges_df["DateTime"] = pd.to_datetime(edges_df["DateTime"]).astype("datetime64[ms]")
edges_df.dropna(axis=0, inplace=True)
edges_df["Weight"] = edges_df["Category"].apply(
    lambda c: 1 if (c == "Affiliative") else (-1 if (c == "Agonistic") else 0)
)
g = Graph.load_from_pandas(
    edges_df=edges_df,
    src="Actor",
    dst="Recipient",
    time="DateTime",
    layer_in_df="Behavior",
    props=["Weight"],
)
###ACTUAL IMPORT CODE
importance = []
time = []
for rolling_lome in g.vertex("LOME").rolling("1 day"):
    importance.append(rolling_lome.degree())
    time.append(rolling_lome.end_date_time())
plt.plot(time, importance, marker="o")
plt.xlabel("Date")
plt.xticks(rotation=45)
plt.ylabel("Daily Unique Interactions")
plt.title("Lome's daily interaction count")
plt.grid(True)