STARDOM is the SofTwARe Developer cOMpetency profiler, which builds a developer’s competence model within a project. A profiler is a tool “which builds a profile as the structured representation of a user’s need through which a retrieval system should act upon”. STARDOM uses information from structured sources, such as source control management (SCM) systems and issue tracking systems (ITS), and from unstructured sources, such as mailing lists and forums. It monitors changes to the Knowledge Base by subscribing to the Petals ESB in order to exchange information with the rest of the system.
In FLOSS communities, the profile information of a developer differs between sources of information. For this reason, STARDOM implements an algorithm that identifies developers across information sources using four basic profile properties: the first name, the last name, the username and the e-mail address. We found that, in order to improve the matching algorithm, some of these properties, such as the e-mail address, needed to be normalized into a consistent format across the information sources.
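The normalization step can be illustrated with a minimal sketch. The source does not state which normalization rules STARDOM applies, so the rules below (trimming, lower-casing, undoing common "at"/"dot" obfuscation) and the function name `normalize_email` are assumptions chosen for illustration only.

```python
def normalize_email(raw: str) -> str:
    """Return a canonical form of an e-mail address for comparison.

    The rule set is hypothetical; the source only says that e-mail
    addresses are normalized into a consistent format.
    """
    # Drop surrounding whitespace and angle brackets, then lower-case.
    email = raw.strip().strip("<>").lower()
    # Undo obfuscations often seen in mailing-list archives (assumed).
    for token in (" at ", "[at]", "(at)"):
        email = email.replace(token, "@")
    for token in (" dot ", "[dot]", "(dot)"):
        email = email.replace(token, ".")
    # Remove any whitespace left over after the substitutions.
    return email.replace(" ", "")


a = normalize_email("  <Fotisp@Mail.NTUA.gr> ")
b = normalize_email("fotisp at mail dot ntua dot gr")
same_address = a == b
```

With a canonical form in place, two profiles that spell the same address differently can be compared with a plain string equality check.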
Furthermore, we have created a framework that handles the extraction of activity metrics from the information sources through pluggable analyzers. An analyzer is a self-contained library responsible for the extraction of a single metric.
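One way to picture the analyzer framework is a small plug-in contract in which each analyzer produces exactly one metric. The sketch below is illustrative only: the class names `Analyzer` and `CommitCountAnalyzer`, the `run_analyzers` helper and the event dictionary layout are all hypothetical and are not taken from the STARDOM code base.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class Analyzer(ABC):
    """Hypothetical plug-in contract: one analyzer extracts one metric."""

    metric_name: str

    @abstractmethod
    def analyze(self, events: Iterable[dict]) -> Dict[str, float]:
        """Return the metric value per developer identifier."""


class CommitCountAnalyzer(Analyzer):
    """Example analyzer: counts commit events per author."""

    metric_name = "commit_count"

    def analyze(self, events: Iterable[dict]) -> Dict[str, float]:
        counts: Dict[str, float] = {}
        for event in events:
            # The "type" and "author" keys are an assumed event layout.
            if event.get("type") == "commit":
                author = event["author"]
                counts[author] = counts.get(author, 0) + 1
        return counts


def run_analyzers(analyzers: List[Analyzer],
                  events: List[dict]) -> Dict[str, Dict[str, float]]:
    """Run every registered analyzer and collect results by metric name."""
    return {a.metric_name: a.analyze(events) for a in analyzers}


events = [
    {"type": "commit", "author": "fotisp"},
    {"type": "comment", "author": "fotisp"},
    {"type": "commit", "author": "fotisp"},
]
results = run_analyzers([CommitCountAnalyzer()], events)
```

Keeping each analyzer limited to one metric means a new metric can be added by registering another class, without touching the existing ones.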
##Overview of STARDOM
STARDOM is the component responsible for creating and maintaining a developer’s profile. More specifically, we focus on enriching the profile with information about the expertise of the developer. The developer’s expertise is modelled and stored in a competency model. A profile is created in a three-step process: the first step identifies the developer across information sources, the second gathers metrics from the developer’s activity in the community, and the third represents this information in a competency model.
Figure 1 STARDOM Process Diagram
###Developer Identification
In STARDOM we based our approach on existing work on cross-system identification for user-adaptive systems. In the context of ALERT we have different information sources instead of separate systems, so a system in that approach corresponds to an information source here. The properties considered for identification purposes in STARDOM are the username, first name, last name and e-mail address.
Given that the values of these properties might be insufficient, erroneous, or purposely provided with a non-valid value, three levels are considered for each property and used to calculate its priority for identification. The levels attached to each property are the Univocity level (UL), the Values per User (VpU) and the Misleading level (ML) [2]. UL represents the number of times a property can have the same value across the different information sources. VpU is the likelihood that a property is provided with different values across information sources, and ML is the likelihood that a property is provided with a non-valid value.
For each property, the UL, VpU and ML levels are weighted: $W_u$ for UL, $W_p$ for VpU and $W_m$ for ML. When two profiles are compared, an importance factor (IF) is calculated for each of the matching properties. The importance factors are then combined and compared against a threshold ThD. If the combined importance factor is above the threshold, the two profiles are considered a match.

The importance factor for a property $p$ is calculated as follows:

$$IF(p) = Value(UL(p)) \cdot W_u + Value(1 - VpU(p)) \cdot W_p + Value(1 - ML(p)) \cdot W_m \tag{1}$$

When two or more properties of a developer match, the combined IF is calculated as follows:

$$IF(p, q) = IF(p) + (1 - IF(p)) \cdot IF(q) \tag{2}$$

where $IF(p)$ and $IF(q)$ are the importance factors of each property. The weights $W_u$, $W_p$, $W_m$ and the threshold ThD are to be calculated on a per-project basis, using manual evaluation to determine the best setting possible.

To better understand the whole process, let's assume that two profiles are about to be matched.
| Property | Profile 1 | Profile 2 |
|---|---|---|
| First Name | Fotis | Fotis |
| Last Name | Paraskevopoulos | Paraskevopoulos |
| Username | fotakis | fotisp |
| E-Mail | fotisp@mail.ntua.gr | fotisp@mail.ntua.g |
Table 1 Identification Developer Example
In Table 1 we see that three properties match: the first name, the last name and the e-mail address. The identification process first calculates the IF of each matching property using formula (1): IF(First Name), then IF(Last Name), and finally IF(E-Mail). Since more than one property matches in this scenario, and there are therefore multiple IFs, we use formula (2) to combine them. First we combine IF(First Name) with IF(Last Name), where IF(p) is IF(First Name) and IF(q) is IF(Last Name), resulting in IF(First Name, Last Name). The result is then combined with the remaining IF(E-Mail), where IF(p) is IF(First Name, Last Name) and IF(q) is IF(E-Mail). This yields a combined IF(First Name, Last Name, E-Mail), which is compared to the threshold ThD to decide whether the two profiles are a match.
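The calculation walked through above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the STARDOM implementation: `Value(x)` is taken to be `x` itself, the weights are assumed to sum to at most 1 so that each IF stays in the range 0 to 1, and every level, weight and threshold in the example is a made-up illustrative number, since the source leaves them to per-project evaluation.

```python
from functools import reduce
from typing import Iterable


def importance_factor(ul: float, vpu: float, ml: float,
                      w_u: float, w_p: float, w_m: float) -> float:
    """Formula (1), with Value(x) assumed to be the identity."""
    return ul * w_u + (1 - vpu) * w_p + (1 - ml) * w_m


def combined_if(factors: Iterable[float]) -> float:
    """Formula (2), folded left to right over all matching properties.

    Starting from 0.0 is safe because 0 + (1 - 0) * q == q.
    """
    return reduce(lambda p, q: p + (1 - p) * q, factors, 0.0)


def is_match(factors: Iterable[float], threshold: float) -> bool:
    """Two profiles match when the combined IF is above the threshold."""
    return combined_if(factors) > threshold


# Hypothetical weights and (UL, VpU, ML) levels for the three
# properties that match in Table 1.
weights = dict(w_u=0.5, w_p=0.25, w_m=0.25)
levels = {
    "first_name": (0.2, 0.4, 0.4),
    "last_name": (0.5, 0.2, 0.2),
    "email": (0.9, 0.3, 0.1),
}
factors = [importance_factor(*lvl, **weights) for lvl in levels.values()]
matched = is_match(factors, threshold=0.9)  # threshold is illustrative
```

Because formula (2) is folded pairwise, the same function handles two, three or four matching properties without special cases.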
###Developer Profile Construction
In STARDOM we propose a developer competency model that combines different quantitative and qualitative measures taken from several information sources. We will use the information extracted by WP2 and WP3 to calculate the values of the metrics that make up the competency model. The competency model created in STARDOM is made up of four main attributes: Fluency, Contribution, Effectiveness and Recency.
Each of the levels of these attributes is calculated using several weighted metrics. Figure 2 shows the metrics that are gathered along with a visual representation of the attribute that each is part of.
Figure 2 Competency Model
The metrics gathered for each developer are stored in an internal database, which is used to more efficiently calculate the competency index of the developer in real time.
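How an attribute level might be derived from its weighted metrics can be sketched as follows. The source only says that each attribute is calculated from several weighted metrics; the weighted average used here, the assumption that metric values are already normalized to the range 0 to 1, and the function name `attribute_level` are all illustrative choices.

```python
from typing import Dict


def attribute_level(metrics: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted average of normalized metric values (assumed in [0, 1]).

    A weighted average is an assumption; the source does not give the
    exact aggregation used in STARDOM.
    """
    total_weight = sum(weights[name] for name in metrics)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(metrics[name] * weights[name] for name in metrics)
    return weighted_sum / total_weight


# Hypothetical normalized metrics and weights for one attribute.
contribution = attribute_level(
    metrics={"mailing_list": 1.0, "its_comments": 0.0},
    weights={"mailing_list": 3.0, "its_comments": 1.0},
)
```

Storing the per-metric values and applying the weights only when the level is requested keeps the weights adjustable without recomputing the metrics.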
####Fluency
Fluency indicates that a developer has general knowledge about the different aspects of the project, such as its developers, the different modules that make up the system, and the internal tools and languages used for development. Fluency results from the calculation of the following metrics:

- The number of API usages.
- The number of APIs introduced.
- Static analysis of the source code.
####Contribution
Contribution is the level of work the developer does for a specific project, in terms of source code contributions and patches committed. The activity of the developer in mailing lists, forums and the ITS is also considered a contribution to the project and is taken into account. Contribution results from the calculation of the following metrics:

- The mailing list activity.
- The ITS activity (in terms of comments).
- The number of lines of code (LOC) introduced.
####Effectiveness
Effectiveness is a measurement of the effect, whether positive or negative, of the developer’s contributions to the project. Effectiveness results from the calculation of the following metrics:

- The number of issues fixed in the ITS.
- The number of commits that have caused the resolution of an issue.
- The number of commits that have introduced an issue.
  - This metric has an inverse effect on the previous metric (commits that have caused the resolution of an issue).
####Recency
Recency is a measurement based on the time elapsed since the last activity recorded in the Knowledge Base for a developer. The value of this attribute is calculated at query time, by comparing the current timestamp with the timestamps recorded in the metrics that make up the Recency of the developer. The metrics that affect the Recency of the developer are:

- The time of the last mailing list contribution.
- The time of the last ITS action related to this developer, which can be a comment, an assigned issue or a resolved issue.
- The time of the last commit action in the SCM.
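A query-time Recency calculation could look like the sketch below. The source states only that the current timestamp is compared with the recorded ones; taking the most recent of the three timestamps, the exponential decay, the 30-day half-life and the function name `recency` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def recency(last_activities: Iterable[datetime],
            now: Optional[datetime] = None,
            half_life_days: float = 30.0) -> float:
    """Return a score in [0, 1]; 1.0 means activity at query time.

    Exponential decay with a half-life is an assumed scoring rule,
    not one described in the source.
    """
    now = now or datetime.now(timezone.utc)
    timestamps = list(last_activities)
    if not timestamps:
        return 0.0  # no recorded activity at all
    # Age of the most recent action, clamped so future stamps score 1.0.
    age_days = max((now - max(timestamps)).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


# Hypothetical timestamps for the mailing list, ITS and SCM metrics.
query_time = datetime(2012, 1, 31, tzinfo=timezone.utc)
score = recency(
    [
        query_time - timedelta(days=90),  # last mailing list post
        query_time - timedelta(days=45),  # last ITS action
        query_time - timedelta(days=30),  # last commit
    ],
    now=query_time,
)
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test than a version that always reads the system clock.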
####Developer Representation
STARDOM will model the competency of the developer in a Competency Ontology. A graphical representation can be seen in Figure 3.
Figure 3 Competency Ontology