Performing Analysis Of Meteorological Data

Abhishek Biswas
7 min read · Jun 14, 2021

Meteorological data is important because it lets us study things such as the temperature variation over past years, which helps in gauging the level of global warming. Other parameters such as humidity, wind speed and pressure are useful for analysing various phenomena related to hydrology.

Dataset.

We will download the dataset from Kaggle, which hosts a huge collection of datasets of different types. Here is the link for downloading the meteorological data: https://www.kaggle.com/muthuj7/weather-dataset

The dataset contains hourly temperature records for about 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe.

Terminology

Before the coding part, there are some terms we need to know. They will help us when we explore and clean the dataset.

What is Apparent Temperature?

Apparent temperature is the temperature equivalent perceived by humans, caused by the combined effects of air temperature, relative humidity and wind speed. The measure is most commonly applied to the perceived outdoor temperature.

What is Humidity?

Humidity is the amount of water vapor in the air. Relative humidity is the ratio of the actual vapor pressure to the saturation vapor pressure. The more water vapor there is in the air, the higher the humidity.

What is a Hypothesis and Hypothesis Testing?

A hypothesis is an educated guess about something in the world around you. It should be testable, either by experiment or observation. For example:

  • A new medicine you think might work.
  • A possible location of new species.

Hypothesis testing in statistics is a way for us to test the results of a survey or experiment to see if we have meaningful results. Here we are basically testing whether the results of the survey or the experiment are valid by figuring out the odds that our results have happened by chance. If your results may have happened by chance, the experiment won’t be repeatable and so has little use.

Steps for hypothesis testing:

  1. Figure out the null hypothesis.
  2. State the null hypothesis.
  3. Choose what kind of test you need to perform.
  4. Either support or reject the null hypothesis.

Null hypothesis: It states that there is no relationship between two population parameters, i.e., an independent variable and a dependent variable.

Alternative hypothesis: It is the opposite of the null hypothesis. An alternative hypothesis and a null hypothesis are mutually exclusive, which means that only one of the two hypotheses can be true.
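To make the four steps concrete, here is a minimal sketch of one possible test, a permutation test on the difference of two group means. This is only an illustration and not the method used later in this article. The function name `permutation_test`, the two lists of numbers and the 0.05 threshold are all made up for the example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: both groups come from the same distribution, so the group
    labels carry no information. Returns an approximate p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels and recompute the difference in means
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Made-up illustrative values, NOT taken from the weather dataset
early_years = [10.1, 9.8, 10.4, 10.0, 9.9]
late_years = [10.2, 10.0, 10.3, 9.9, 10.1]

p_value = permutation_test(early_years, late_years)
alpha = 0.05  # assumed significance level
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(p_value, decision)
```

A small p-value means that a difference this large would rarely appear by chance if the null hypothesis were true.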

Hypothesis.

Null Hypothesis

The null hypothesis H0 in this meteorological data analysis is: “Do the apparent temperature and humidity, compared monthly across 10 years of the data, indicate an increase due to global warming?”

H0 means we need to find out whether the average apparent temperature for a given month, say April, from 2006 to 2016 and the average humidity for the same period have increased or not. This monthly analysis has to be done for all 12 months over the 10 year period. So we are basically resampling the data from hourly to monthly, then comparing the same month across the years.

Steps.

  1. Importing the necessary libraries and loading the dataset.
  2. Looking at the dataset and doing some basic analysis.
  3. Cleaning Dataset
  4. Plotting of Data

1. Importing the necessary libraries and loading the dataset.

Before starting, make sure numpy, pandas and matplotlib (and sklearn) are installed on your computer. If they are not, they can be installed with the pip installer from the terminal or command prompt, using the following syntax:

pip install <package_name>

After installing the packages, we import all the modules we need for the analysis of the meteorological dataset. After importing, we load the dataset with the pandas read_csv() function, which reads the CSV file into Python so that we can analyse it.

Importing the necessary library and loading the datasets

The head() function lets us see the first five rows of the dataset.

showing the first five rows of the dataframe
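The code for this step appeared as a screenshot in the original post. A minimal sketch of what it might look like is given here. The file name `weatherHistory.csv` and the column names are assumptions, and a tiny in-memory CSV with made-up rows stands in for the Kaggle file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the Kaggle file: a few made-up rows with assumed column names
SAMPLE_CSV = """Formatted Date,Precip Type,Temperature (C),Apparent Temperature (C),Humidity
2006-04-01 00:00:00.000 +0200,rain,9.5,7.4,0.89
2006-04-01 01:00:00.000 +0200,rain,9.4,7.2,0.86
2006-04-01 02:00:00.000 +0200,rain,9.4,9.4,0.89
2006-04-01 03:00:00.000 +0200,,8.3,5.9,0.83
2006-04-01 04:00:00.000 +0200,rain,8.8,7.0,0.83
2006-04-01 05:00:00.000 +0200,rain,9.2,7.1,0.85
"""

# With the real download this would be something like:
#   df = pd.read_csv("weatherHistory.csv")   # assumed file name
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# head() returns the first five rows by default
first_rows = df.head()
print(first_rows)
```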

2. Looking at the dataset and doing some basic analysis.

After loading the dataset, we gather some basic information about it, such as the range index, column names and shape. The pandas library provides <dataframe_name>.info(), which gives a concise summary of the dataframe. It comes in really handy during exploratory analysis of the data.

The dataframe.info() function prints a short summary of the dataset.

The <dataframe>.columns attribute returns the names of all the columns present in the dataset.

column names present in the dataframe

After finding the column names, we check for null values in the dataset, because null values hinder the analysis and can give faulty results. To check for them we use the isnull() function, and isnull().sum() gives the number of null values in each column of the dataframe. Here we can see that there are around 517 null values in the ‘precip’ column of the dataset.

checking the null values present in each column in the dataframe
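The three checks above can be sketched as follows. The column names and the rows are made up for illustration (one precipitation value is deliberately left empty), so the counts printed here are not those of the real dataset.

```python
import io

import pandas as pd

# Made-up sample with assumed column names; one "Precip Type" value is missing
SAMPLE_CSV = """Formatted Date,Precip Type,Apparent Temperature (C),Humidity
2006-04-01 00:00:00.000 +0200,rain,7.4,0.89
2006-04-01 01:00:00.000 +0200,,7.2,0.86
2006-04-01 02:00:00.000 +0200,rain,9.4,0.89
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# info() prints the index range, column dtypes and non-null counts
df.info()

# columns is an attribute (no parentheses) holding the column labels
column_names = list(df.columns)
print(column_names)

# isnull() marks missing cells as True; sum() counts them per column
null_counts = df.isnull().sum()
print(null_counts)
```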

3. Cleaning Dataset

In this step we prepare our data for plotting. We first drop the unwanted columns (all except apparent temperature and humidity). Since we need to find the relation between apparent temperature and humidity, we do not remove these two columns. The Formatted Date column also has to stay, because it is used as the index later in this step. To drop the columns we use the drop() function. By default drop() returns a new dataframe and leaves the original unchanged; with inplace=True it modifies the original dataframe instead and returns nothing. After dropping the columns, the new dataframe contains the apparent temperature and humidity, and the head() function is used to see its first few rows. After that we check for null values in this dataframe again using isnull().sum().

Removing the unwanted columns and also checking the null value presence in each columns
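A minimal sketch of this cleaning step, assuming the column names shown in the made-up sample. The names in `keep` are assumptions and may need adjusting to match the real file.

```python
import io

import pandas as pd

# Made-up sample with assumed column names
SAMPLE_CSV = """Formatted Date,Precip Type,Temperature (C),Apparent Temperature (C),Humidity
2006-04-01 00:00:00.000 +0200,rain,9.5,7.4,0.89
2006-04-01 01:00:00.000 +0200,,9.4,7.2,0.86
2006-04-01 02:00:00.000 +0200,rain,9.4,9.4,0.89
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Columns needed for the analysis (assumed names)
keep = ["Formatted Date", "Apparent Temperature (C)", "Humidity"]
unwanted = [col for col in df.columns if col not in keep]

# drop() returns a new dataframe here; df itself is left unchanged
data = df.drop(columns=unwanted)

print(data.head())
print(data.isnull().sum())
```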

After checking for null values, we convert the Formatted Date column to datetime in the +00:00 UTC timezone and set it as the index of the new dataframe, using the set_index() function from the pandas library. After that we resample the data from hourly to monthly using the resample() function.

Resampling the data and also setting the ‘Formatted Date’ as index of the new dataframe
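A minimal sketch of the conversion and resampling, using four made-up rows. The column names and the choice of the "MS" (month start) frequency with a mean are assumptions; the original screenshot may have used different options.

```python
import pandas as pd

# Made-up rows standing in for the cleaned dataframe (assumed column names)
data = pd.DataFrame(
    {
        "Formatted Date": [
            "2006-04-01 12:00:00.000 +0200",
            "2006-04-15 12:00:00.000 +0200",
            "2006-05-01 06:00:00.000 +0200",
            "2006-05-20 18:00:00.000 +0200",
        ],
        "Apparent Temperature (C)": [7.0, 9.0, 14.0, 16.0],
        "Humidity": [0.90, 0.80, 0.70, 0.60],
    }
)

# utc=True reads the +0200 offset and converts every timestamp to UTC
data["Formatted Date"] = pd.to_datetime(data["Formatted Date"], utc=True)

# Use the timestamps as the index so that resample() can group by time
data = data.set_index("Formatted Date")

# "MS" groups rows by calendar month (labelled by month start); mean() averages them
monthly = data.resample("MS").mean()
print(monthly)
```

Note that converting from +0200 to UTC moves each timestamp back by two hours, so readings from the first two hours of a month in local time fall into the previous month after conversion.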

4. Plotting of Data

In this step we plot apparent temperature and humidity on one graph using the matplotlib library, and we also plot apparent temperature and humidity on a monthly basis.

Graph between Apparent Temperature and Humidity
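The plotting code was also a screenshot in the original post. The sketch below shows one way such a graph could be drawn. The dataframe `monthly` is filled with synthetic values (a simple seasonal curve) purely so the snippet runs by itself; the helper name `plot_monthly` and the column names are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly averages standing in for the resampled dataframe
index = pd.date_range("2006-04-01", "2016-09-01", freq="MS", tz="UTC")
season = np.cos(2 * np.pi * (index.month.to_numpy() - 1) / 12)
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": 10 - 12 * season,  # made-up values
        "Humidity": 0.75 + 0.10 * season,              # made-up values
    },
    index=index,
)


def plot_monthly(frame, title):
    """Draw apparent temperature and humidity against time on one axis."""
    fig, ax = plt.subplots(figsize=(14, 6))
    ax.plot(frame.index, frame["Apparent Temperature (C)"],
            label="Apparent Temperature (C)")
    ax.plot(frame.index, frame["Humidity"], label="Humidity")
    ax.set_title(title)
    ax.set_xlabel("Date")
    ax.legend()
    return fig


fig = plot_monthly(monthly, "Apparent Temperature and Humidity (monthly average)")
# plt.show()  # uncomment when running interactively
```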

Here is the month-by-month analysis of Apparent Temperature and Humidity.

Graph between Apparent Temperature and Humidity (January)
Graph between Apparent Temperature and Humidity (February)
Graph between Apparent Temperature and Humidity (March)
Graph between Apparent Temperature and Humidity (April)
Graph between Apparent Temperature and Humidity (May)
Graph between Apparent Temperature and Humidity (June)
Graph between Apparent Temperature and Humidity (July)
Graph between Apparent Temperature and Humidity (August)
Graph between Apparent Temperature and Humidity (September)
Graph between Apparent Temperature and Humidity (October)
Graph between Apparent Temperature and Humidity (November)
Graph between Apparent Temperature and Humidity (December)
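Each of the monthly graphs compares one calendar month across the years. A minimal sketch of how a single month could be selected and plotted is shown here, again on synthetic data; the helper name `select_month` and the column names are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly averages standing in for the resampled dataframe
index = pd.date_range("2006-04-01", "2016-09-01", freq="MS", tz="UTC")
season = np.cos(2 * np.pi * (index.month.to_numpy() - 1) / 12)
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": 10 - 12 * season,  # made-up values
        "Humidity": 0.75 + 0.10 * season,              # made-up values
    },
    index=index,
)


def select_month(frame, month):
    """Keep only the rows of one calendar month (1 = January, 12 = December)."""
    return frame[frame.index.month == month]


april = select_month(monthly, 4)

# One point per year, so the same month can be compared across years
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(april.index.year, april["Apparent Temperature (C)"],
        marker="o", label="Apparent Temperature (C)")
ax.plot(april.index.year, april["Humidity"], marker="o", label="Humidity")
ax.set_title("Apparent Temperature and Humidity (April)")
ax.set_xlabel("Year")
ax.legend()
# plt.show()  # uncomment when running interactively
```

Calling `select_month` with the values 1 to 12 in a loop would produce one graph per month.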

Resources / References

  1. https://numpy.org/doc/
  2. https://pandas.pydata.org/docs/
  3. https://matplotlib.org/stable/contents.html
  4. https://medium.com/analytics-vidhya/performing-analysis-of-meteorological-data-b63f3b125be8
  5. https://medium.com/swlh/performing-analysis-of-meteorological-data-using-python-8b862a9811bc
  6. https://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/
  7. https://corporatefinanceinstitute.com/resources/knowledge/other/hypothesis-testing/#:~:text=The%20Null%20Hypothesis%20is%20usually,led%20to%20the%20alternative%20hypothesis.

Thank you

I also thank the reader for reading my article.
