PROFIT PREDICTION USING THE MULTIPLE LINEAR REGRESSION METHOD WITH THE PYTHON PROGRAMMING LANGUAGE AT PT. TRI ERDNOV REZEKI

PT. Tri Erdnov Rezeki is a national private company, founded in 2016, that is fully led and owned by Indonesian entrepreneurs. The company is engaged in Construction Development Planning, Repair and Maintenance of Steam Aircraft and Pressure Vessels, and the Supply of Technical Equipment, Industrial Work Equipment, Electrical Equipment, and Machinery, especially in the Palm Oil Plantation and Power Plant Sectors. In carrying out a project, the company must produce not only a technical design but also an economic design so that it can determine the project's economic feasibility. However, companies often make cost calculation errors. Data science can be used to predict the value achieved in a period from previous data: it analyzes patterns that relate data to other data to produce a reference or formula that can be used to predict future values. This study therefore uses the multiple linear regression method, implemented in the Python programming language, to predict the profit of a project. The results show that 99.7% of the variation in profit is explained by material cost, labor cost, and utility cost, while variables outside the study account for the remaining 0.3%. Of the three variables, material cost has the most significant influence on profit, with a significance value below 0.05. The average prediction error of the Python model is 0.97%, which is smaller than the 2.04% average prediction error obtained with SPSS.

Data science can be used to predict the value achieved in a period from previous data. It analyzes patterns that relate data to other data to produce a reference or formula that can be used to predict future values (Kumar et al., 2020). Linear regression analysis is used to develop an equation that expresses the relationship between the independent variables and the dependent variable, and to predict the value of the dependent variable when an independent variable increases or decreases. It can be carried out in Python (Tran et al., 2021). Data science predictions using the Python programming language are expected to address the problem that PT. Tri Erdnov Rezeki often makes errors in predicting a project's profit, and to give the company a quick way to calculate project profits. In this study, the authors use the multiple linear regression method in Python to derive a project profit equation, based on the correlation between data, for predicting the profit of projects that PT. Tri Erdnov Rezeki will carry out.

Data Mining
Data mining is the analytical step of knowledge discovery in databases, abbreviated as KDD. The knowledge can take the form of valid data patterns or relationships between data that were not previously known. Data mining combines several computer science disciplines and is defined as the discovery of new patterns in large data sets using methods from artificial intelligence, machine learning, statistics, and database systems (Ferré, 2020). Its aim is to extract knowledge from a data set in a structure that humans can understand. It also covers database and data management, data processing, model and inference considerations, interest measures, complexity considerations, post-processing of the structures found, visualization, and online updating (Winarti, 2023). Data mining can be used for predictive purposes (e.g., classification, regression, and deviation/anomaly detection), in which some existing variables are used to predict the future value of other variables (Eziama, 2021). Predictive methods rely on regularities, patterns, and relationships in large data sets that are known in advance, and draw conclusions from existing data to make predictions about further data. Classification, regression, and deviation detection are techniques of predictive methods (Ghavami, 2019).

Python Programming Language
Since appearing in the public domain in 1991, Python has developed with the support of a community of users and developers, such as the Python Software Activity, the internet newsgroup comp.lang.python, and other informal organizations. The language is now commonly used by engineers worldwide to build software, and some companies use Python to build commercial software (Lutz, 2010). Python is freeware in the truest sense of the word: there are no restrictions on copying or distributing it, and it comes complete with source code, a debugger, and a profiler. It also provides interfaces to system functions, GUIs (graphical user interfaces), and databases. Python can be used on several operating systems, such as most UNIX systems, PCs (DOS, Windows, OS/2), Macintosh, and others, and it is included as standard in the distribution packages of most Linux operating systems (Lutz, 2001). Python is a multipurpose interpreted programming language whose design philosophy focuses on code readability. It is claimed to combine capability with very clear code syntax, and it is equipped with a large and comprehensive standard library. Python is a general-purpose programming language developed to make source code easy to read (Martelli, 2023).

IMPLEMENTATION METHOD
The authors used a quantitative approach in this study because the research data are numerical and are analyzed statistically. The research data consist of 30 project budget plans. The study then presents the results of implementing a profit prediction system using multiple linear regression in the Python programming language, covering the following steps: preparing the libraries and data, determining the x and y variables, preparing the training data and test data, examining the correlation between data, fitting the statsmodels regression, computing the coefficient of determination, calculating the percentage error of the regression results, and comparing the analysis results.

RESULTS AND DISCUSSION

Research Data
In this study, the authors collected research data in the form of project budget designs. The data consist of 30 budget designs and are presented in Table 1. Table 1 contains three independent variables, denoted as variable x, and one dependent variable, denoted as variable y.

System Implementation
This section presents the results of implementing a profit prediction system using multiple linear regression in Python. The stages of designing the system are described below.

Setting Up Libraries and Data
The first step in designing a profit prediction system with Python is to prepare the libraries and the research data in CSV format. This preparation is shown in Figure 1.
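The loading step described above can be sketched as follows. This is a minimal illustration, not the listing from Figure 1: it uses only Python's standard library because the text does not name the libraries that were imported. It assumes the CSV has one column per variable named in this study (Material_Cost, Labor_Cost, Utility_Cost, Profit). The inline rows and the function name `load_budget_data` are hypothetical.

```python
import csv
import io

# Placeholder rows for illustration only; NOT the study's budget data.
SAMPLE_CSV = """Material_Cost,Labor_Cost,Utility_Cost,Profit
1000000,400000,150000,420000
2500000,900000,300000,910000
1800000,650000,220000,680000
"""

COLUMNS = ["Material_Cost", "Labor_Cost", "Utility_Cost", "Profit"]


def load_budget_data(handle):
    """Read budget plans from an open CSV file and convert each field to float."""
    reader = csv.DictReader(handle)
    missing = [name for name in COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{name: float(row[name]) for name in COLUMNS} for row in reader]


# With a real file, pass an open file object instead of the inline text.
rows = load_budget_data(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0])
```

Checking the column names up front makes a mislabeled file fail immediately rather than surfacing later as a confusing regression result.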

Determine the X and Y Variables
At this stage, the authors determine the x and y variables. The x variables are independent variables that can affect the y variable, and the y variable is the dependent variable that is influenced by the x variables. In designing a profit prediction system using Python, both the x variables and the y variable must therefore be declared. The determination of the x and y variables is shown in Figure 2.

Figure 2. Determination of X and Y Variables
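As a rough sketch of this step, the records can be separated into the independent variables x (the three cost columns) and the dependent variable y (profit). The records and the helper name `split_x_y` are hypothetical and serve only to show the idea; the study's own declaration is the one in Figure 2.

```python
# Placeholder records for illustration only; NOT the study's data.
records = [
    {"Material_Cost": 1000000.0, "Labor_Cost": 400000.0,
     "Utility_Cost": 150000.0, "Profit": 420000.0},
    {"Material_Cost": 2500000.0, "Labor_Cost": 900000.0,
     "Utility_Cost": 300000.0, "Profit": 910000.0},
]

FEATURES = ["Material_Cost", "Labor_Cost", "Utility_Cost"]  # independent variables (x)
TARGET = "Profit"                                           # dependent variable (y)


def split_x_y(rows):
    """Return (X, y): X holds one [x1, x2, x3] list per project, y the profits."""
    X = [[row[name] for name in FEATURES] for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y


X, y = split_x_y(records)
print(X[0], y[0])
```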
Prepare Training Data and Test Data
At this stage, the research data are divided into training data and test data so that they can be analyzed in Python. The data set consists of 30 project budget designs. The preparation of the training and test data is shown in Figure 3.
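One possible way to divide the data is a seeded random split, sketched below with the standard library. The 20% test share, the seed, and the function name `split_train_test` are assumptions made for illustration; the text does not state the proportions used in Figure 3.

```python
import random


def split_train_test(X, y, test_ratio=0.2, seed=42):
    """Shuffle row indices reproducibly, then hold out a share of rows for testing."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# 30 dummy rows standing in for the 30 budget designs (values are arbitrary).
X = [[float(i), float(2 * i), float(3 * i)] for i in range(30)]
y = [float(10 * i) for i in range(30)]
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

Shuffling indices rather than the rows themselves keeps each x row paired with its y value, and the fixed seed makes the split repeatable.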

Correlation Between Data
At this stage, the authors examine the correlation between the data as computed in Python, so that the results can serve as a benchmark for profit prediction. The correlation between data is shown in Figure 4.
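The text does not say which correlation measure was used. Assuming the common Pearson coefficient, the correlation between each cost variable and profit could be computed along these lines. The column values are placeholders and `pearson` is an illustrative helper, not the routine behind Figure 4.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Placeholder columns for illustration only; NOT the study's data.
columns = {
    "Material_Cost": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Labor_Cost": [2.0, 1.0, 4.0, 3.0, 5.0],
    "Utility_Cost": [5.0, 3.0, 4.0, 1.0, 2.0],
    "Profit": [2.0, 4.0, 6.0, 8.0, 10.0],
}

for name in ("Material_Cost", "Labor_Cost", "Utility_Cost"):
    print(name, round(pearson(columns[name], columns["Profit"]), 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship with profit, while a value near 0 indicates a weak one.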

Statsmodels Regression Results
At this stage, the authors calculate the multiple linear regression by entering the independent and dependent variables into the model. The results of the Python statsmodels regression are shown in Figure 5.

Figure 5. Python Statsmodels Regression Results
The multiple linear regression with three independent variables takes the following general form:

Y = a + b1 X1 + b2 X2 + b3 X3

Information: Y = Profit, a = constant, X1 = Material_Cost, X2 = Labor_Cost, X3 = Utility_Cost, and b1, b2, and b3 are the regression coefficients of X1, X2, and X3. Variable Y is the profit of a planned project, variable X1 is material cost, variable X2 is labor cost, and variable X3 is utility cost.
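To make the estimation concrete, a model of the form Y = a + b1*X1 + b2*X2 + b3*X3 can be fitted by ordinary least squares. The sketch below solves the normal equations in plain Python. It is a stand-in for the statsmodels routine used in the study, not a reproduction of it, and the check data are synthetic values generated from known coefficients.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear variables?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ols(X, y):
    """Return [a, b1, ..., bk] for the least-squares fit of y = a + sum(bj * xj)."""
    design = [[1.0] + list(row) for row in X]  # the leading 1.0 yields the constant a
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic check data generated from y = 5 + 2*x1 - 1*x2 + 3*x3 (not the study's data).
X = [[1, 2, 0], [2, 0, 1], [0, 1, 3], [4, 1, 1], [3, 3, 2], [1, 0, 4]]
y = [5 + 2 * x1 - x2 + 3 * x3 for x1, x2, x3 in X]
a, b1, b2, b3 = fit_ols(X, y)
print(round(a, 6), round(b1, 6), round(b2, 6), round(b3, 6))
```

Because the check data contain no noise, the fit should recover the generating coefficients up to floating-point error. Real budget data would leave residuals, which is what the coefficient of determination then measures.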

Multiple Linear Regression Validation With SPSS
In this study, the authors validate the multiple linear regression results with SPSS software. The SPSS results serve as a reference for checking the correctness of the profit prediction system. The validation of multiple linear regression with SPSS is described below.

SPSS Multiple Linear Regression Results
In this study, hypothesis testing was carried out using multiple linear regression analysis, which is used to analyze the effect of several independent variables on the dependent variable. The results obtained can be seen in Table 2 and are interpreted as follows:
1. The constant is 154,103.504, which means that if all independent variables are zero (0), the dependent variable profit is 154,103.504.
2. The Material_Cost regression coefficient is 0.197. It has a positive sign, which means that if the Material_Cost variable increases by 1 unit, the dependent variable, namely profit, will also increase by 0.197, and vice versa.
3. The Labor_Cost regression coefficient is -0.088. It has a negative sign, which means that if the Labor_Cost variable increases by 1 unit, the dependent variable, namely profit, will decrease by 0.088, and vice versa.
4. The Utility_Cost regression coefficient is 0.231. It has a positive sign, which means that if the Utility_Cost variable increases by 1 unit, the dependent variable, namely profit, will also increase by 0.231, and vice versa.
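The constant and coefficients reported above can be combined into a prediction function. The sketch below does so, treating the Labor_Cost coefficient as negative as stated in the text. The function name and the cost figures in the example call are hypothetical and were chosen only to show the arithmetic.

```python
# Constant and coefficients as reported for the SPSS regression in this section.
CONSTANT = 154_103.504
B_MATERIAL = 0.197
B_LABOR = -0.088  # reported as negative
B_UTILITY = 0.231


def predict_profit_spss(material_cost, labor_cost, utility_cost):
    """Evaluate Y = a + b1*X1 + b2*X2 + b3*X3 with the reported SPSS values."""
    return (CONSTANT
            + B_MATERIAL * material_cost
            + B_LABOR * labor_cost
            + B_UTILITY * utility_cost)


# Hypothetical cost figures, used only to demonstrate the calculation.
print(predict_profit_spss(1_000_000, 400_000, 150_000))
```

For these inputs the terms are 154,103.504 + 197,000 - 35,200 + 34,650, which gives a predicted profit of about 350,553.504.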

SPSS Determination Coefficient Test
The coefficient of determination test measures the model's ability to explain variation in the dependent variable. A small R2 value means that the independent variables have a very limited ability to explain the variation in the dependent variable. A weakness of R2 is its bias toward the number of independent variables in the model: R2 does not decrease when a variable is added, whether or not that variable is relevant. It is therefore recommended to use the Adjusted R2 value when evaluating which regression model is best. The results of the coefficient of determination test are shown in Table 3.
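For reference, the two quantities discussed above can be computed with the standard textbook formulas: R2 = 1 - SS_res / SS_tot, and Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The sketch below is illustrative; the numbers are toy values, not the study's data.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Toy numbers for illustration (not the study's data): 4 observations, 1 predictor.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(actual, predicted)
print(round(r2, 4), round(adjusted_r_squared(r2, n_samples=4, n_predictors=1), 4))
```

With the 30 budget designs and three independent variables of this study, the adjustment would use n = 30 and k = 3.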

Table 3. Test Results for the Coefficient of Determination
Based on Table 3, the Adjusted R square (Adjusted R2) is 0.999. This value indicates that 99.9% of the variation in the dependent variable Profit is explained by Utility_Cost, Material_Cost, and Labor_Cost, while the remaining 0.1% is influenced by other variables outside of this study.

Analysis of Multiple Linear Regression Results
This section analyzes the results of the multiple linear regression used to predict profit with Python and with SPSS. The analysis is described below.

Regression Result Equation Test
The regression equations obtained with Python and with SPSS were each tested against the research data. The results of the multiple linear regression equation test using Python are tabulated in Table 4, and the results of the equation test using SPSS are tabulated in Table 5.

Percentage of Regression Error Results
At this stage, the authors calculate the percentage error of the multiple linear regression results by comparing the profit values predicted by Python and by SPSS with the actual profit values. The percentage of error can be calculated using the following equation.
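The equation itself is not reproduced in this text. Assuming the usual absolute percentage error, that is |actual - predicted| / |actual| x 100%, averaged over all projects, the calculation could be sketched as follows. Both the formula and the function names are assumptions, and the values are toy numbers rather than the study's data.

```python
def percentage_error(actual, predicted):
    """Absolute error of one prediction as a percentage of the actual value."""
    if actual == 0:
        raise ValueError("percentage error is undefined when the actual value is 0")
    return abs(actual - predicted) / abs(actual) * 100.0


def mean_percentage_error(actuals, predictions):
    """Average of the per-project absolute percentage errors."""
    errors = [percentage_error(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


# Toy values for illustration (not the study's data).
actual_profit = [100.0, 200.0, 400.0]
predicted_profit = [98.0, 204.0, 400.0]
print(mean_percentage_error(actual_profit, predicted_profit))
```

In the toy example the individual errors are 2%, 2%, and 0%, so the average is about 1.33%. The same averaging applied to the Python and SPSS predictions would give one summary figure per method.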

Comparison of Analysis Results
This section compares the analysis results, namely the profit value and the error percentage obtained with each prediction method. The comparison is described below.

Profit Value Comparison
The authors compare the profit value, or variable y, obtained with each profit prediction method. The comparison is summarized as a chart in Figure 6.

Error Percentage Comparison
The authors compare the percentage of profit prediction errors for each method. The comparison is summarized as a graph in Figure 7.

CONCLUSION
The conclusions of this study are as follows.
1) The research results show an R square of 0.997. This value indicates that 99.7% of the variation in profit is explained by material cost, labor cost, and utility cost, while other variables outside this study account for the remaining 0.3%. Based on the t-test carried out in Python, material cost has the most significant influence on profit among the three variables, with a significance value below 0.05.
2) Based on the analysis of prediction error percentages, carried out by comparing the SPSS and Python profit predictions with the actual profits for the 30 data points, the average Python prediction error is 0.97%, which is smaller than the average SPSS prediction error of 2.04%.