An example analysis framework#

This notebook provide an example of the analysis performed by the PLATO MSAP4-03 submodules: results from Lomb-Scargle periodogram, autocorrelation functions and composite spectrum are used to produce a set of features exploited by an existing instance of ROOSTER to return the final rotation period of the analysed target. You will find that what is done here is very similar to the ROOSTER tutorial notebook (rooster_training_framework, you should run it before doing this tutorial), the only big difference is actually that we will use here a pre-trained ROOSTER instance !

Here, the MSAP4-01A and MSAP4-02 steps required to feed MSAP4-03 are included in order to be able to run this notebook independently from the MSAP4-01A and MSAP4-02 notebooks.

Note that, due its significant computing time, the MSAP4-01B component dedicated to background analysis is executed independently in another notebook.

Note: This notebook has been designed for the purpose of scientific justification of MSAP4-03. The notebook illustrated the precise flowchart envisaged for MSAP4-03 is cs_rooster_sph_analysis.ipynb

import star_privateer as sp
sp.__version__
'1.1.1'

A simple example#

import os, pathos
from tqdm import tqdm
import numpy as np
import matplotlib.pyplot as plt

if not os.path.exists ('stellar_analysis_features') :
    os.mkdir ('stellar_analysis_features')

Our working case is KIC 3733735, a well-known Kepler fast rotating star.

filename = sp.get_target_filename (sp.timeseries, '003733735')
t, s, dt = sp.load_resource (filename)

The first thing we have to do is run the analysis pipeline. In particular, we can take a look at the plots made from the different analysis methods.

(p_ps, p_acf, ps, acf,
 cs, features, feature_names, _) = sp.analysis_pipeline (t, s, pmin=0.1, pmax=60, figsize=(8,10),
                                                            wavelet_analysis=False, plot=True,
                                                            filename='stellar_analysis_features/003733735.png',
                                                        )
../../_images/stellar_analysis_framework_8_0.png

We then save the results to a csv file:

fileout = 'stellar_analysis_features/003733735.csv'
df = sp.save_features (fileout, 3733735, features, feature_names)

As in the previous tutorial, let’s build a feature catalog. This is actually not required here because we are analysing only one star, but this step allows to ROOSTER-analyse several stars together with a simple framework.

df = sp.build_catalog_features ('stellar_analysis_features')

Then, let’s load the ROOSTER instance that we have trained in the previous tutorial:

chicken = sp.load_rooster_instance (filename='rooster_instances/rooster_tutorial')

As previously, let’s split the DataFrame into ROOSTER required inputs:

(target_id, p_candidates,
 e_p_candidates, E_p_candidates,
 features, feature_names) = sp.create_rooster_feature_inputs (df, return_err=True)

Here, we can see that there is actually (almost) nothing to do, as the three methods have yielded the same \(P_\mathrm{rot}\) estimate. However, we need ROOSTER to provide us with the rotation score of the target. ROOSTER will also select one of the three p_candidates as the final estimate for our target.

p_candidates
array([[2.53459872, 2.63594132, 2.56168138]])

The analyseSet function implemented in ROOSTER allows to analyse the features we extracted with the analysis pipeline. By providing feature_names, we ensure that ROOSTER was trained with the same features that those we extracted.

rotation_score, prot, e_p, E_p = chicken.analyseSet (features, p_candidates, e_p_err=e_p_candidates,
                                                     E_p_err=E_p_candidates, feature_names=feature_names)

We finally get the rotation score and the final \(P_\mathrm{rot}\). A rotation score above 0.5 means that the ROOSTER analysis favours a detection of stellar surface rotation signal.

rotation_score, prot, e_p, E_p
(array([0.78]), array([2.53459872]), array([0.21810524]), array([0.2634447]))

Analysing a PLATO simulated light curves dataset#

In order to illustrate the pipeline features described above, we can apply the pipeline to a larger dataset of 255 PLATO simulated light curves in order to check what we recover.

import plato_msap4_demonstrator_datasets.plato_sim_dataset as plato_sim_dataset

if not os.path.exists ('plato_sim_features') :
    os.mkdir ('plato_sim_features')
list_id = sp.get_list_targets (plato_sim_dataset)

Just as in the ROOSTER tutorial, we define a dedicated wrapper to analyse several light curves in parallel and avoid any memory leakage. Note than, we apply a 60-day high-pass finite impulse response filter to the simulated light curves (preprocess) function in order to remove low-frequency systematics while preserving at most the signature of stellar surface rotation in the data.

def analysis_wrapper (star_id) :
    """
    Analysis wrapper to speed computation
    by parallelising process and control
    memory usage.
    """
    star_id = str (star_id).zfill (3)
    fileout = 'plato_sim_features/{}.csv'.format(star_id)
    fileplot = 'plato_sim_features/{}.png'.format(star_id)
    filename = sp.get_target_filename (plato_sim_dataset, star_id, filetype='csv')
    if not os.path.exists (fileout) :
        t, s, dt = sp.load_resource (filename)
        s = sp.preprocess (t, s, cut=60)
        (p_ps, p_acf,
         ps, acf,
         cs, features,
         feature_names, fig) = sp.analysis_pipeline (t, s, pmin=0.1, pmax=60, plot=True,
                                                     wavelet_analysis=False, filename=fileplot,
                                                     figsize=(10,16), lw=1, dpi=300)
        df = sp.save_features (fileout, star_id, features, feature_names)
        plt.close ("all")

Let’s run the process pool:

process_pool = pathos.pools._ProcessPool (processes=4,
                                          maxtasksperchild=10)
with process_pool as p :
    list (tqdm (p.imap (analysis_wrapper,
                        list_id,
                        ),
                total=len (list_id))
          )
    p.close ()
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:00<00:00, 843.21it/s]

We can now analyse the obtained features with ROOSTER to provide our final results.

df = sp.build_catalog_features ('plato_sim_features')
df
prot_ps prot_acf prot_cs e_prot_ps E_prot_ps e_prot_acf E_prot_acf e_prot_cs E_prot_cs sph_ps sph_acf sph_cs h_ps fa_prob_ps hacf gacf hcs
target_id
0 42.955113 36.201157 43.493698 5.139282 6.755867 -1.0 -1.0 1.911796 1.911796 809.302045 770.019836 804.306555 4.573669e+06 0.0 0.522076 0.174793 0.804396
1 33.388871 32.603958 33.124768 7.850812 14.820267 -1.0 -1.0 5.027412 5.027412 197.317073 196.573468 196.860684 5.452245e+03 0.0 0.095639 0.045796 0.527596
2 17.529157 18.062384 17.308610 1.997207 2.586628 -1.0 -1.0 0.424356 0.424356 131.463440 132.741191 131.613358 9.626133e+03 0.0 0.226074 0.162293 0.137714
3 21.247463 20.791534 20.918945 2.308992 2.950196 -1.0 -1.0 0.729080 0.729080 108.132491 108.613715 108.524835 1.468874e+04 0.0 0.316458 0.308489 0.075097
4 28.636742 28.770649 29.353447 3.606657 4.821026 -1.0 -1.0 1.331297 1.331297 156.739595 156.556259 156.091130 8.556843e+02 0.0 0.021029 0.021854 0.433539
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
250 31.871195 29.319257 31.239800 9.996470 26.822076 -1.0 -1.0 4.055894 4.055894 213.718124 204.408037 215.372660 4.368693e+04 0.0 0.832173 0.428776 0.880686
251 20.214171 19.027656 20.212371 1.917448 2.366382 -1.0 -1.0 1.729649 1.729649 187.913408 178.586179 187.910206 8.072857e+04 0.0 0.688937 0.313818 0.835033
252 36.903489 37.805314 36.899833 5.608447 8.057559 -1.0 -1.0 5.558169 5.558169 1488.425513 1505.598715 1488.392507 8.011975e+06 0.0 1.508467 0.751838 0.976312
253 17.182045 17.395722 17.517652 2.098990 2.777632 -1.0 -1.0 2.245630 2.245630 164.770235 164.282166 164.016051 1.954305e+04 0.0 0.613475 0.262470 0.091111
254 18.950440 18.659603 18.951406 1.989637 2.518474 -1.0 -1.0 0.592259 0.592259 1210.035639 1216.557654 1209.993163 7.215866e+06 0.0 0.813409 0.298095 0.997597

255 rows × 17 columns

(target_id, p_candidates,
 e_p_candidates, E_p_candidates,
 features, feature_names) = sp.create_rooster_feature_inputs (df, return_err=True)
rotation_score, prot, e_p, E_p = chicken.analyseSet (features, p_candidates, e_p_err=e_p_candidates,
                                                     E_p_err=E_p_candidates, feature_names=feature_names)

Next, let’s load the reference catalog for these simulated light curves in order to compare the results from our pipeline with what was injected in the data.

prot_ref = sp.get_prot_ref (target_id, catalog='plato-sim')
cond_0 = (rotation_score>0.5)
cond_1 = (np.abs (prot - prot_ref) < 0.1 * prot_ref)
cond_2 = (np.abs (prot - prot_ref) < 0.1 * prot_ref) & (rotation_score>0.5)
score_0 = target_id[cond_0].size / target_id.size
score_1 = target_id[cond_1].size / target_id.size
score_2 = target_id[cond_2].size / target_id.size
score_0, score_1, score_2
(0.9450980392156862, 0.615686274509804, 0.596078431372549)

The score computed here means that we were able to successfully detect a rotation signal and recover the correct rotation period for about 61% of the stars in the sample. We can take a look at histograms to check the rotation score of our population and to compare the input rotation periods distribution to the one we recover.

fig, (ax1, ax2) = plt.subplots (1, 2, figsize=(10, 4))

bins = np.linspace (0, 1, 20, endpoint=False)
ax1.hist (rotation_score, bins=bins, color='darkorange')
ax1.axvline (0.5, ls='--', color='blue', lw=2)
bins = np.linspace (0, 80, 20, endpoint=False)
ax2.hist (prot, bins=bins, color='darkorange')
ax2.hist (prot_ref, bins=bins, facecolor='none',
         edgecolor='black', label='Ref')

ax1.set_ylabel (r'Number of stars')
ax1.set_xlabel (r'Rotation score')
ax2.set_xlabel (r'$P_\mathrm{rot}$ (day)')

ax1.set_xlim (0, 1)
ax2.set_xlim (0, 80)
(0.0, 80.0)
../../_images/stellar_analysis_framework_36_1.png